
[NPUW] Enabled model with multiple outputs in LLMInferRequest #31520


Open

AsyaPronina wants to merge 1 commit into master from fix_multi_outputs_issue

Conversation

AsyaPronina (Contributor)

Details:

  • Added the possibility for the passed LLM model to have multiple outputs

Tickets:

@AsyaPronina AsyaPronina requested review from a team as code owners July 30, 2025 01:26
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Jul 30, 2025
@AsyaPronina AsyaPronina changed the title Added possibility for passed model to have multiple outputs [NPUW] Added possibility for passed model to have multiple outputs Jul 30, 2025
@AsyaPronina AsyaPronina changed the title [NPUW] Added possibility for passed model to have multiple outputs [NPUW] Enabled model with multiple outputs in LLMInferRequest Jul 30, 2025
@AsyaPronina AsyaPronina force-pushed the fix_multi_outputs_issue branch from 28d3e37 to b5a0806 Compare July 30, 2025 01:37
@AsyaPronina AsyaPronina force-pushed the fix_multi_outputs_issue branch from b5a0806 to 11ab2cb Compare July 30, 2025 01:44
// second and third and combine them using XOR
// and bit shifting:

return ((hash<std::size_t>()(port.get_index()) ^ (hash<const ov::Node*>()(port.get_node()) << 1)) >> 1) ^
Contributor:

Wouldn't it be enough just to have a hash of port.get_node()?

Contributor Author:

I am not sure, as an ov::Node can have multiple outputs (with different indices).
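The point above can be sketched in isolation. The snippet below uses hypothetical `Node`/`Port` stand-ins for `ov::Node` and `ov::Output<const ov::Node>` (the real types carry much more), and shows the XOR-and-bit-shift combine from the quoted diff: hashing only the node pointer would force all outputs of a multi-output node into the same bucket, so the output index is mixed in as well.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-ins for ov::Node and ov::Output<const ov::Node>.
struct Node {};

struct Port {
    const Node* node;
    std::size_t index;
    bool operator==(const Port& other) const {
        return node == other.node && index == other.index;
    }
};

// XOR-and-shift combine of the two sub-hashes, mirroring the quoted snippet:
// both the owning node and the output index contribute to the hash, so two
// outputs of the same node are not forced into the same bucket.
struct PortHash {
    std::size_t operator()(const Port& p) const {
        const std::size_t h1 = std::hash<std::size_t>()(p.index);
        const std::size_t h2 = std::hash<const Node*>()(p.node);
        return (h1 ^ (h2 << 1)) >> 1;
    }
};
```

Even on a hash collision, `operator==` keeps two outputs of one node distinct in an `std::unordered_set<Port, PortHash>`; the hash quality only affects bucket distribution, not correctness.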

LOG_DEBUG("Input name " << input_name << " doesn't contain kv cache. Skipping.");
continue;
}
NPUW_ASSERT(m_kvcache_in_ports.find(input_name) == m_kvcache_in_ports.end());
Contributor:

This assert looks very suspicious

// model's I/O are appended to original model's I/O at the end,
// thus it is safe to loop over KVCache I/O blocks just using some
// start offsets.
std::size_t start_idx_in_outputs = 0u;
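The comment above relies on the KV-cache I/O being appended after the original model's I/O. Under that assumption, a plain offset loop is enough to pick out the KV-cache block; the sketch below illustrates the idea with output names only (the function name and use of strings are illustrative, not the plugin's actual API).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: KV-cache outputs are appended after the original
// model's outputs, so once the boundary index is known, the KV-cache block
// can be walked with a simple offset loop.
std::vector<std::string> kvcache_outputs(const std::vector<std::string>& all_outputs,
                                         std::size_t start_idx_in_outputs) {
    std::vector<std::string> kv;
    for (std::size_t i = start_idx_in_outputs; i < all_outputs.size(); ++i) {
        kv.push_back(all_outputs[i]);  // everything past the offset is KV-cache
    }
    return kv;
}
```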
Contributor:

Where do we set this one to something different than 0?

Contributor Author (@AsyaPronina, Jul 30, 2025):

@@ -531,9 +530,9 @@ void ov::npuw::LLMInferRequest::infer_prefill(ov::SoPtr<ov::ITensor> input_ids,
if (m_lm_head_request) {
LOG_DEBUG("Calling inference for LM head model.");
m_lm_head_request->infer();
m_logits = m_lm_head_request->get_tensor(m_lm_head_logits_port);
update_out_tensors_from(m_lm_head_request);
Contributor Author:

Wrong. Only logits will be in the LM head; the other outputs should be gathered from prefill.
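The self-review above can be sketched with plain maps standing in for infer-request tensors (the `TensorMap` type, `gather_outputs` name, and `"logits"` key are illustrative assumptions, not the plugin's API): when an LM head model is present, only the logits tensor comes from it, while every other output still comes from the prefill request.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a name -> tensor mapping of request outputs.
using TensorMap = std::unordered_map<std::string, std::vector<float>>;

// Sketch of the fix discussed above: start from the prefill outputs, then
// take only "logits" from the LM head request, leaving the rest untouched.
TensorMap gather_outputs(const TensorMap& prefill_outputs, const TensorMap& lm_head_outputs) {
    TensorMap result = prefill_outputs;    // all non-logits outputs: from prefill
    auto it = lm_head_outputs.find("logits");
    if (it != lm_head_outputs.end()) {
        result["logits"] = it->second;     // logits: from the LM head request
    }
    return result;
}
```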

Labels
category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin
2 participants