Pull requests: vllm-project/vllm
Pull requests list
[V1] APC + prompt logprobs unsupported (PR 2/N for v1 sample and prompt logprobs support)
#11910
opened Jan 10, 2025 by
afeldman-nm
•
Draft
[FP8][Kernel] Dynamic kv cache scaling factors computation
documentation
Improvements or additions to documentation
#11906
opened Jan 9, 2025 by
gshtras
[Doc] Show default pooling method in a table
documentation
#11904
opened Jan 9, 2025 by
DarkLight1337
[VLM] Enable tokenized inputs for merged multi-modal processor
ready
ONLY add when PR is ready to merge/full CI is needed
#11900
opened Jan 9, 2025 by
DarkLight1337
[Bugfix] support to run partially 2:4 model with CompressedTensors24 scheme
#11889
opened Jan 9, 2025 by
jiangjiadi
Add `device` as parameter to TP and rotary_embedding functions
#11888
opened Jan 9, 2025 by
chunyuan-w
•
Draft
[optimization] remove python function call for custom activation op
#11885
opened Jan 9, 2025 by
cennn
[CI] Add auto update workflow for Dockerfile graph
ci/build
#11879
opened Jan 9, 2025 by
WineChord
Updating the high performance vllm docker for AMD Rocm.
documentation
#11877
opened Jan 9, 2025 by
haic0
[Hardware][Gaudi] Support loading checkpoints quantized using Autofp8
#11869
opened Jan 9, 2025 by
zhenwei-intel
•
Draft
[WIP][Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling
ci/build
#11868
opened Jan 8, 2025 by
LucasWilkinson
•
Draft
3 tasks
[Spec Decode] Add Script for converting HF Eagle checkpoint to vLLM compatible checkpoint
documentation
#11866
opened Jan 8, 2025 by
sroy745
[CI/Build] Add markdown linter
ci/build
documentation
#11857
opened Jan 8, 2025 by
rafvasq
[Bugfix] Fix start_idx for computing slot mapping to avoid uninitiali…
#11851
opened Jan 8, 2025 by
ShawnD200
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support
ci/build
#11844
opened Jan 8, 2025 by
sighingnow
[Hardware][CPU] Support MOE models on x86 CPU
documentation
ready
x86 CPU
#11831
opened Jan 8, 2025 by
bigPYJ1151
[Frontend] Disaggregate prefill decode with zmq
frontend
#11791
opened Jan 7, 2025 by
panf2333