Issues: vllm-project/llm-compressor
Is it possible to perform unstructured sparsification and then apply W8A8 quantization? · enhancement · #1051 · opened Jan 10, 2025 by zjnyly (see the sketch below)
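The question in #1051 maps to a two-stage recipe: prune first, then quantize the now-sparse weights. A minimal sketch, assuming llmcompressor's SparseGPTModifier and GPTQModifier APIs; the model, dataset, and hyperparameters are illustrative placeholders, not a confirmed answer from the maintainers.

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.modifiers.quantization import GPTQModifier

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype="auto"
)

recipe = [
    # mask_structure="0:0" requests unstructured sparsity (no N:M pattern)
    SparseGPTModifier(sparsity=0.5, mask_structure="0:0", ignore=["lm_head"]),
    # int8 W8A8 quantization applied to the pruned weights
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset="open_platypus",  # calibration data used by both modifiers
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
model.save_pretrained("tinyllama-sparse-w8a8", save_compressed=True)
```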
When I run the code at examples/multimodal_vision/pixtral_example.py, an error occurs: KeyError: 'pixtral' · bug · #1050 · opened Jan 9, 2025 by darrenzhang1007
get_state_dict_offloaded_model is called twice when saving a model · bug · #1047 · opened Jan 7, 2025 by kylesayrs
Does llmcompressor support hybrid sparsity? · enhancement · #1037 · opened Jan 6, 2025 by jiangjiadi
Quantization method for the KV cache · bug · #1024 · opened Jan 2, 2025 by sitabulaixizawaluduo (see the sketch below)
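For KV-cache quantization (#1024), llmcompressor recipes expose a kv_cache_scheme field; only the KV-cache portion of such a recipe is shown here. A minimal sketch modeled on the repository's quantization_kv_cache example, with placeholder model and dataset names; the exact recipe layout is an assumption from that example, not taken from this issue.

```python
from llmcompressor.transformers import oneshot

recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            # FP8 scales for the KV cache, computed statically per tensor
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="llama3-fp8-kv",
)
```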
Do 2:4 sparsity and W8A8 quantization support embedding models? · enhancement · #1020 · opened Dec 31, 2024 by neiltian-tencent
Quantizing glm-4v-9b with INT8 · bug · #1003 · opened Dec 20, 2024 by citrix123
A tutorial doc on how to use sparse pruning · documentation · #993 · opened Dec 18, 2024 by hafezmg48
The new version 0.3.0 takes a long time for quantization and eventually fails due to OOM · bug · #965 · opened Dec 10, 2024 by okwinds
Error when quantizing Llama 3.3 70B to FP8 · bug · #963 · opened Dec 6, 2024 by Syst3m1cAn0maly
How to resume the quantization stage after the finetuning stage fails · bug · #957 · opened Dec 5, 2024 by jiangjiadi
About LoRA finetuning of 2:4 sparse and sparse-quantized models · enhancement · #952 · opened Dec 4, 2024 by arunpatala
Finetuning in the 2:4 sparsity W4A16 example fails with multiple GPUs · bug · #911 · opened Nov 13, 2024 by arunpatala
Is it possible to quantize to FP8 W8A16 without calibration data? · enhancement · #858 · opened Oct 21, 2024 by us58 (see the sketch below)
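On #858: the data-free path llmcompressor documents is FP8 W8A8 with dynamic activation scales rather than W8A16; weight scales are computed from the weights themselves, so no calibration set is required. A minimal sketch under that assumption, with a placeholder model name.

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)

# FP8_DYNAMIC: static per-channel weight scales, dynamic per-token
# activation scales; nothing to calibrate, so no dataset is passed.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

oneshot(model=model, recipe=recipe)
model.save_pretrained("llama3-fp8-dynamic", save_compressed=True)
```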
Perplexity (PPL) calculation of a local sparse model: NaN issue · bug · #853 · opened Oct 19, 2024 by HengJayWang
[USAGE] FP8 W8A8 (+KV) with LoRA adapters · enhancement · #164 · opened Sep 11, 2024 by paulliwog