Describe the bug

How can I quantize only the KV cache, not the weights? I thought llm-compressor could use a per-token KV cache scale to achieve higher accuracy and avoid online quantization, but the quantization result I get is per-tensor.
I tried the following recipe:
```python
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                observer: "minmax"
                strategy: token
                dynamic: true
                symmetric: true
"""
```
However, the config.json produced after quantization contains no compressed-tensors information. Does llm-compressor currently support quantizing only the KV cache? If so, what should I do?
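For context, here is a minimal sketch of how such a recipe can be applied end to end and how to check whether the KV-cache scheme was recorded in the saved config.json. The import path, `oneshot` arguments, model/dataset names, and `quantization_config` keys below are assumptions based on llm-compressor's published examples, not values confirmed for this setup:

```python
import json

from llmcompressor.transformers import oneshot

# Same recipe as above (mirrored verbatim for a self-contained example).
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                observer: "minmax"
                strategy: token
                dynamic: true
                symmetric: true
"""

# Apply the recipe in one shot; the model id, dataset name, and keyword
# arguments here are placeholders/assumptions, not verified values.
oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    dataset="open_platypus",                      # assumed calibration dataset
    recipe=recipe,
    output_dir="./model-kv-cache-fp8",
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Check whether the KV-cache scheme was written to the saved config.
# compressed-tensors typically records quantization metadata under
# "quantization_config"; the exact key layout is an assumption.
with open("./model-kv-cache-fp8/config.json") as f:
    cfg = json.load(f)
print(cfg.get("quantization_config", {}).get("kv_cache_scheme"))
```

If the last line prints `None`, the KV-cache scheme was not serialized, which matches the behavior described above.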