enable feature score data collection in torchrec #3285

Open
wants to merge 2 commits into
base: main
from

Conversation

emlin
Contributor

@emlin emlin commented Aug 15, 2025

Summary:
Add an enable_feature_score_weight_accumulation flag to ShardedEmbeddingCollection. When this flag is true and EC index dedup is also enabled, we accumulate the KJT weight and the occurrence count for each ID and write the result back into the KJT weight, so that input dist can distribute the feature score.

This change is part of the ZCH v.Next feature score eviction story:

  • Collect a score for every feature ID in the model, e.g. 0.5 for a positive ID and 0.2 for a negative ID.
  • Set the score as the weight value of the input ID-list feature KJT.
  • In EC forward, if ID dedup is enabled, aggregate the score and the occurrence count of each ID.
  • Distribute the ID scores through the KJT weight.
  • In the KVZCH embedding kernel, call forward with the weight as an optional parameter.
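
The first three steps can be illustrated with a minimal sketch in plain Python. It is not the torchrec implementation, which operates on KJT tensors and is not shown in this description. The helper names `assign_scores` and `dedup_and_accumulate` are hypothetical, and summing the scores is an assumption, since the description only says the scores are aggregated.

```python
from typing import Dict, Iterable, List, Set, Tuple

POSITIVE_SCORE = 0.5  # example value taken from the description
NEGATIVE_SCORE = 0.2  # example value taken from the description


def assign_scores(ids: Iterable[int], positive_ids: Set[int]) -> List[float]:
    """Give each ID occurrence a score; these stand in for the KJT weights."""
    return [POSITIVE_SCORE if i in positive_ids else NEGATIVE_SCORE for i in ids]


def dedup_and_accumulate(
    ids: List[int], weights: List[float]
) -> Tuple[List[int], List[float], List[int]]:
    """Deduplicate IDs, summing the weights and counting occurrences per ID.

    Summation is an assumption; the description only says "aggregate".
    """
    if len(ids) != len(weights):
        raise ValueError("ids and weights must have the same length")
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for feature_id, weight in zip(ids, weights):
        totals[feature_id] = totals.get(feature_id, 0.0) + weight
        counts[feature_id] = counts.get(feature_id, 0) + 1
    unique_ids = list(totals)  # dicts preserve first-seen order
    return (
        unique_ids,
        [totals[i] for i in unique_ids],
        [counts[i] for i in unique_ids],
    )


ids = [7, 3, 7, 9, 3, 7]
weights = assign_scores(ids, positive_ids={7})
unique_ids, summed, counts = dedup_and_accumulate(ids, weights)
print(unique_ids, summed, counts)
```

Keeping the count next to the accumulated score lets a downstream consumer recover a per-occurrence average if it needs one.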

In the ZCH TBE backend (separate diffs):

  • Pass the feature score to the ZCH TBE backend.
  • Run eviction based on the ID score value.
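
The description does not say which eviction policy the backend uses. As one illustrative possibility only, the sketch below evicts the lowest-scoring IDs once a table exceeds a capacity. The function `select_ids_to_evict`, the `capacity` parameter, and the tie-breaking rule are all hypothetical and are not part of torchrec or the ZCH TBE backend.

```python
from typing import Dict, List


def select_ids_to_evict(scores: Dict[int, float], capacity: int) -> List[int]:
    """Return the lowest-scoring IDs that must be evicted to fit `capacity`.

    Hypothetical policy: evict by ascending score, breaking ties by ID.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    overflow = len(scores) - capacity
    if overflow <= 0:
        return []  # the table already fits, nothing to evict
    ranked = sorted(scores, key=lambda i: (scores[i], i))
    return ranked[:overflow]


table_scores = {7: 1.5, 3: 0.4, 9: 0.2, 11: 0.9}
evicted = select_ids_to_evict(table_scores, capacity=2)
print(evicted)
```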

For the whole story, see:
https://docs.google.com/document/d/1TJHKvO1m3-5tYAKZGhacXnGk7iCNAzz7wQlrFbX_LDI/edit?tab=t.0

Differential Revision: D79864431

@meta-cla meta-cla bot added the CLA Signed label Aug 15, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D79864431

emlin added a commit to emlin/torchrec that referenced this pull request Aug 15, 2025
@emlin emlin force-pushed the export-D79864431 branch from 999a168 to 7cfb966 Compare August 15, 2025 01:13
Differential Revision: D79591336
emlin added a commit to emlin/torchrec that referenced this pull request Aug 15, 2025
Reviewed By: duduyi2013

Differential Revision: D79864431
@emlin emlin force-pushed the export-D79864431 branch from 7cfb966 to 599350c Compare August 15, 2025 04:47
@emlin emlin force-pushed the export-D79864431 branch from 599350c to 18d7617 Compare August 15, 2025 04:53
Labels
CLA Signed, fb-exported