
[LPT] FP8 KV-Cache Static Quantization #31557


Open: wants to merge 6 commits into master

@v-Golubev (Contributor) commented Aug 1, 2025

Details:

This PR adds KV-cache static quantization support on the LPT side. The main changes are:

  • MoveFakeConvertUpThroughKVCacheConcat transformation: a prerequisite for the KV cache quantization optimization. It identifies patterns where FakeConvert operations are applied to the output of the KV cache concatenation and moves them to the individual concat inputs instead.
  • KVCacheConcat transformation, which identifies KV concatenation patterns in which both the cache and the new KV data undergo identical quantization (downconvert) followed by dequantization (upconvert). It optimizes the pattern by:
    1. Setting the cache variable precision to low precision (fp8);
    2. Removing the downconvert subgraphs from the cache branch;
    3. Connecting the Assign operation directly to the low-precision concat output (so the upconvert subgraph is also removed).
  • KVCacheStaticQuantization test class, which covers all the newly introduced transformations as a single pipeline.
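A small numeric model helps show why the two rewrites preserve results. The C++ below is a minimal, self-contained sketch and not OpenVINO code. `round_to_e4m3`, `downconvert`, `upconvert`, `fake_convert` and `pipelines_match` are hypothetical helpers.

The sketch rests on three assumptions:
- FakeConvert reduces to a per-tensor-scaled fp8 (e4m3) round trip with no shift.
- The cache variable holds values that already went through the same FakeConvert.
- The scale is a power of two, so dividing and re-multiplying by it is exact in float.

Under these assumptions, two properties follow. FakeConvert is elementwise, so applying it to the concat output equals applying it to each concat input. An fp8 round trip is idempotent, so keeping the cache in fp8 and upconverting once after the concat yields the same values as re-quantizing a high-precision cache at every step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Tensor = std::vector<float>;

// Simplified fp8 e4m3 rounding (assumed OCP E4M3: 3 mantissa bits, min normal
// exponent -6, max finite 448). Ties round to even via std::nearbyint in the default
// rounding mode; out-of-range values saturate. NaN/Inf handling is not modeled.
float round_to_e4m3(float x) {
    if (x == 0.0f) return 0.0f;
    const float a = std::fabs(x);
    if (a >= 448.0f) return std::copysign(448.0f, x);
    int e = 0;
    (void)std::frexp(a, &e);                           // a = m * 2^e, m in [0.5, 1)
    const int exponent = std::max(e - 1, -6);          // below 2^-6 use subnormal spacing
    const float step = std::ldexp(1.0f, exponent - 3); // gap between neighbouring fp8 values
    const float q = std::nearbyint(a / step) * step;
    return std::copysign(std::min(q, 448.0f), x);
}

// Downconvert: scale into the fp8 range and round (what an fp8 cache would store).
Tensor downconvert(const Tensor& t, float scale) {
    Tensor out;
    out.reserve(t.size());
    for (float x : t) out.push_back(round_to_e4m3(x * scale));
    return out;
}

// Upconvert: dequantize back to high precision.
Tensor upconvert(const Tensor& t, float scale) {
    Tensor out;
    out.reserve(t.size());
    for (float x : t) out.push_back(x / scale);
    return out;
}

// Hypothetical stand-in for FakeConvert: downconvert followed by upconvert.
Tensor fake_convert(const Tensor& t, float scale) {
    return upconvert(downconvert(t, scale), scale);
}

Tensor concat(const Tensor& a, const Tensor& b) {
    Tensor out(a);
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Runs both graph variants over a sequence of new KV tokens and reports whether
// the concat outputs consumed downstream are identical at every step.
bool pipelines_match(const std::vector<Tensor>& new_kv_steps, float scale) {
    Tensor cache_hp;  // before: high-precision cache variable
    Tensor cache_lp;  // after: low-precision (fp8-valued) cache variable
    for (const Tensor& kv : new_kv_steps) {
        // Before: FakeConvert on each concat input (after it was moved up through the concat).
        const Tensor out_hp = concat(fake_convert(cache_hp, scale), fake_convert(kv, scale));
        cache_hp = out_hp;  // Assign
        // After: the cache is already fp8, only the new KV is downconverted,
        // and a single upconvert follows the low-precision concat.
        cache_lp = concat(cache_lp, downconvert(kv, scale));  // Assign
        const Tensor out_lp = upconvert(cache_lp, scale);
        if (out_hp != out_lp) return false;
    }
    return true;
}
```

For example, `pipelines_match({{0.1f, -1.7f}, {250.0f, 3.3f}}, 4.0f)` returns true. With a non-power-of-two scale, `x / s * s` can differ from `x` in the last bit, so this particular argument no longer guarantees bit-exact equality.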

Tickets:

@v-Golubev requested review from a team as code owners on August 1, 2025 09:34
@v-Golubev requested review from itikhono and removed request for a team on August 1, 2025 09:35
@github-actions (bot) added labels on Aug 1, 2025: category: IE Tests (OpenVINO Test: plugins and common), category: CPU (OpenVINO CPU plugin), category: transformations (OpenVINO Runtime library - Transformations), category: LP transformations (OpenVINO Low Precision transformations)
Comment on lines +556 to +575
```cpp
match_multi_query_bcst(const std::shared_ptr<ov::Node>& kv) {
    using namespace ov::pass;
    using namespace ov::pass::pattern;
    using namespace ov::gen_pattern;

    auto reshape_kv = wrap_type<ov::op::v1::Reshape>({kv, any_input()});
    auto unsqueeze_kv = wrap_type<ov::op::v0::Unsqueeze>({kv, any_input()});

    auto constant_bcst = wrap_type<ov::op::v0::Constant>(value_matches("1.0"));

    auto computed_bcst = makePattern<ov::op::v1::Broadcast>(
        {wrap_type<ov::op::v0::Constant>(value_matches("1.0")), any_input(), any_input()},
        {{"mode", "numpy"}});

    auto multiply_kv = wrap_type<ov::op::v1::Multiply>({reshape_kv | unsqueeze_kv, constant_bcst | computed_bcst});
    auto computed_bcst3 = makePattern<ov::op::v3::Broadcast>({unsqueeze_kv, any_input()}, {{"mode", "bidirectional"}});

    auto result = wrap_type<ov::op::v1::Reshape>({multiply_kv | computed_bcst3, any_input()});
    return std::make_tuple(result, reshape_kv, unsqueeze_kv, computed_bcst, multiply_kv, computed_bcst3);
}
```
Contributor Author

TODO: rewrite after #31386 is merged into master
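`match_multi_query_bcst` appears to match the multi-query / grouped-query subgraph in which each KV head is replicated for several query heads. The KV tensor is reshaped or unsqueezed. It is then either multiplied by a 1.0 constant (a plain Constant or a numpy-mode v1::Broadcast of one) or expanded by a bidirectional v3::Broadcast, and finally reshaped.

The minimal C++ sketch below models what such a subgraph computes. It relies on assumptions the PR does not state:
- KV layout is `[B, H_kv, S, D]`.
- The new axis is inserted at position 2.
- The broadcast target is `[B, H_kv, G, S, D]`.
- The final Reshape yields `[B, H_kv * G, S, D]`.

`repeat_kv_heads`, `repeat_kv_heads_via_multiply` and `kv_head_for_query_head` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: kv is row-major [B, Hkv, S, D]; the result is [B, Hkv * G, S, D].
// Mirrors Unsqueeze(axis 2) -> Broadcast to [B, Hkv, G, S, D] -> Reshape.
std::vector<float> repeat_kv_heads(const std::vector<float>& kv, std::size_t B, std::size_t Hkv,
                                   std::size_t S, std::size_t D, std::size_t G) {
    assert(kv.size() == B * Hkv * S * D);
    std::vector<float> out;
    out.reserve(B * Hkv * G * S * D);
    // Walk the broadcast shape in row-major order; the G axis has size 1 in the
    // source, so it does not change which source element is read.
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t h = 0; h < Hkv; ++h)
            for (std::size_t g = 0; g < G; ++g)
                for (std::size_t s = 0; s < S; ++s)
                    for (std::size_t d = 0; d < D; ++d)
                        out.push_back(kv[((b * Hkv + h) * S + s) * D + d]);
    // Reshaping [B, Hkv, G, S, D] to [B, Hkv * G, S, D] moves no data in a row-major buffer.
    return out;
}

// Multiply-based variant: kv with a size-1 axis times a tensor of G ones,
// relying on numpy-style broadcasting to produce the same replication.
std::vector<float> repeat_kv_heads_via_multiply(const std::vector<float>& kv, std::size_t B,
                                                std::size_t Hkv, std::size_t S, std::size_t D,
                                                std::size_t G) {
    assert(kv.size() == B * Hkv * S * D);
    const std::vector<float> ones(G, 1.0f);  // the broadcast 1.0 operand, shape [1, 1, G, 1, 1]
    std::vector<float> out;
    out.reserve(B * Hkv * G * S * D);
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t h = 0; h < Hkv; ++h)
            for (std::size_t g = 0; g < G; ++g)
                for (std::size_t s = 0; s < S; ++s)
                    for (std::size_t d = 0; d < D; ++d)
                        out.push_back(kv[((b * Hkv + h) * S + s) * D + d] * ones[g]);
    return out;
}

// Under this layout, query head q reads KV head q / G.
std::size_t kv_head_for_query_head(std::size_t q, std::size_t G) { return q / G; }
```

For example, with `B = 1`, `H_kv = 2`, `S = 2`, `D = 1` and `G = 3`, the KV rows `{10, 11}` and `{20, 21}` each appear three times in a row, so query head 4 reads KV head 1.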

```cpp
auto fc_cache = fake_convert->clone_with_new_inputs(form_new_fc_inputs(concat->input_value(0)));
fc_cache->set_friendly_name(fake_convert->get_friendly_name() + "_1");
auto fc_kv = fake_convert->clone_with_new_inputs(form_new_fc_inputs(concat->input_value(1)));
fc_cache->set_friendly_name(fake_convert->get_friendly_name() + "_2");
```
Contributor

Did you mean this?

Suggested change:
```diff
-fc_cache->set_friendly_name(fake_convert->get_friendly_name() + "_2");
+fc_kv->set_friendly_name(fake_convert->get_friendly_name() + "_2");
```
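When one FakeConvert is split into per-input clones, each clone should receive its own friendly name. The sketch below shows a minimal way to make that hard to get wrong: build the clones in a loop and name each one in the same iteration that creates it.

It is written against a hypothetical `ToyNode` graph, not the OpenVINO `ov::Node` API. FakeConvert's scale/shift inputs are omitted, and `clone_with_new_input` and `move_fake_convert_up` are illustrative names.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy graph node (NOT ov::Node): a type tag, a friendly name, and data inputs.
struct ToyNode {
    std::string type;
    std::string friendly_name;
    std::vector<std::shared_ptr<ToyNode>> inputs;
};
using NodePtr = std::shared_ptr<ToyNode>;

// Copies a node and rewires its single data input.
NodePtr clone_with_new_input(const NodePtr& node, const NodePtr& input) {
    auto copy = std::make_shared<ToyNode>(*node);
    copy->inputs = {input};
    return copy;
}

// Rewrites FakeConvert(Concat(x_1, ..., x_n)) into Concat(FakeConvert_1(x_1), ..., FakeConvert_n(x_n)).
// Each name is set on `clone` in the iteration that creates it, so no clone can be named twice.
NodePtr move_fake_convert_up(const NodePtr& fake_convert) {
    const NodePtr& concat = fake_convert->inputs.at(0);
    auto new_concat = std::make_shared<ToyNode>(*concat);
    new_concat->inputs.clear();
    for (std::size_t i = 0; i < concat->inputs.size(); ++i) {
        NodePtr clone = clone_with_new_input(fake_convert, concat->inputs[i]);
        clone->friendly_name = fake_convert->friendly_name + "_" + std::to_string(i + 1);
        new_concat->inputs.push_back(clone);
    }
    return new_concat;
}
```

For a two-input concat fed by the cache and the new KV, the clones of a FakeConvert named `fc` come out as `fc_1` and `fc_2`, matching the suffixes used in the reviewed snippet.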
