quant method about kv cache #1024

Open
sitabulaixizawaluduo opened this issue Jan 2, 2025 · 3 comments
Labels
bug Something isn't working
@sitabulaixizawaluduo

Describe the bug
How can I quantize the kv cache but not the weights?

I tried a recipe like this:

recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                observer: "minmax"
                strategy: token
                dynamic: true
                symmetric: true
"""

but the config.json produced after quantization contains no compressed-tensors information.
Does llm-compressor support quantizing only the kv cache? If so, what should I do?
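
For reference, here is a minimal sketch of how such a recipe is typically passed to llm-compressor's oneshot entrypoint; the model id, calibration dataset, and sample counts below are placeholders (assumptions), not values taken from this issue:

# Minimal sketch, assuming llm-compressor's oneshot entrypoint and the `recipe`
# string defined above. Model id, dataset name, and sample counts are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model id

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-KV-Cache-FP8"  # placeholder output path

oneshot(
    model=model,
    recipe=recipe,              # the kv_cache_scheme recipe shown above
    dataset="open_platypus",    # placeholder calibration dataset
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir=SAVE_DIR,
)
tokenizer.save_pretrained(SAVE_DIR)

If the kv cache quantization is applied, the saved config.json should contain a quantization_config entry with a kv_cache_scheme section from compressed-tensors.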

@sitabulaixizawaluduo sitabulaixizawaluduo added the bug Something isn't working label Jan 2, 2025
@Kha-Zix-1

Maybe you can quantize the kv cache when launching the vLLM server? For example:

python -m "vllm.entrypoints.openai.api_server" --model "{path}" --tensor-parallel-size {d} --gpu-memory-utilization 0.9 --served-model-name {model} --max-model-len 32768 --max-seq-len-to-capture 32768 --kv-cache-dtype fp8 --block-size 16 --port {port}

i.e., use --kv-cache-dtype fp8.

@sitabulaixizawaluduo
Author

Maybe you can quantize the kv cache when launching the vLLM server? For example:

python -m "vllm.entrypoints.openai.api_server" --model "{path}" --tensor-parallel-size {d} --gpu-memory-utilization 0.9 --served-model-name {model} --max-model-len 32768 --max-seq-len-to-capture 32768 --kv-cache-dtype fp8 --block-size 16 --port {port}

i.e., use --kv-cache-dtype fp8.

I thought llm-compressor could use per-token kv cache scales to achieve higher accuracy and avoid online quantization, but the quantization result is per-tensor.
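
For comparison, a static per-tensor kv cache scheme, the form used in llm-compressor's fp8 kv cache example (treat the exact values as illustrative), would look like:

# Per-tensor, statically calibrated variant of the kv_cache_scheme above (illustrative).
per_tensor_recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor    # one scale per k/v tensor, computed offline from calibration data
                dynamic: false      # static scales stored in the checkpoint
                symmetric: true
"""

The strategy and dynamic fields are where this differs from the per-token recipe above (strategy: token, dynamic: true).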

@dsikka
Collaborator

dsikka commented Jan 3, 2025

Hi @sitabulaixizawaluduo,
Can you share the exact code you ran to quantize your model?
Thanks

@dsikka dsikka self-assigned this Jan 3, 2025