feat: summarize when tokenlimit exceeded or threshold reached #3227
Conversation
Converting to draft, as this PR hasn't been updated with the approach we discussed.
Thanks for the thorough PR @fengju0213. I wonder if the new implementation would cause inconsistencies and fragment the summarization logic already implemented in two places:

```python
summary_result = agent.summarize(filename="meeting_notes")
# Returns: {"summary": str, "file_path": str, "status": str}
```

This is mostly used to create `workflow.md` files of a session.
Thanks for reviewing, @hesamsheikh! We can probably unify the naming of role names later. For now, could you help review and summarize the current logic for handling the token limit?
Thanks @fengju0213!! Left some comments. Maybe we should also consider adding a test case for when SUMMARY_MAX_DEPTH is reached.
Force-pushed from b1885f3 to bc69be7 (compare)
Hey @fengju0213, just as a reference that might be helpful: I ran Claude Code until it summarized the full context, and here is how it looks. I think there are helpful patterns here we can learn from. Moreover, if you notice the tone of the summary (like "Continue with the last task..."), you'll see the role cannot be Assistant here. I think the best approach would be to append the summary to the system message and wipe the memory.
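The append-to-system-message idea above could be sketched roughly as follows; `Message`, `Memory`, and `compact_with_summary` are illustrative names, not CAMEL's actual API:

```python
# Hypothetical sketch: fold the summary into the system message and wipe
# the rest of the history, so the summary never masquerades as an
# Assistant turn. Names are illustrative, not CAMEL's actual API.
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str  # "system", "user", or "assistant"
    content: str


@dataclass
class Memory:
    messages: list = field(default_factory=list)

    def compact_with_summary(self, summary: str) -> None:
        """Append the summary to the system message and clear the rest."""
        system = next((m for m in self.messages if m.role == "system"), None)
        if system is None:
            system = Message(role="system", content="")
        system.content += (
            "\n\n[Conversation summary]\n"
            + summary
            + "\nContinue with the last task described above."
        )
        # Wipe everything except the (augmented) system message.
        self.messages = [system]
```

The key property is that after compaction the agent sees exactly one message, so no role confusion is possible.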
@hesamsheikh thanks for the helpful suggestion! I already made some improvements based on this; it helps clearly distinguish summarizations from actual user messages. From reviewing CC's summaries, it seems they also include aggregated user details within assistant-role messages, so their summaries might follow a similar approach. Since this is a new and experimental feature, it would be best to evaluate its effectiveness later through real usage channels like Eigent.
@hesamsheikh @MuggleJinx already updated, could you help review again?
| I will come back with the review ASAP @fengju0213 | 
Thank you so much!
Great work on the PR. I added one comment, plus a few questions:
- I noticed you changed the summary format a bit. Don't you think it's better to keep a list of the user messages in the summary as well (maybe a truncated one)? This could be added programmatically, without relying on the summarizer agent, which would reduce reliance on the LLM-provided summary.
- The summary may happen in the middle of executing a task. Shouldn't we keep the last unfinished task in the summary? Something like:
  "- Current Task: What is currently being worked on (if any)\n"
  "- Pending Tasks: What still needs to be done\n"
  I tested this, and the agent may be able to continue from where it left off, but that relies on the LLM's understanding rather than explicit prompt instructions and guidelines.
- You add a user message to continue from the last task. I thought of an alternative approach that might be more reliable for task continuity and avoid confusing the agent: we can summarize all the messages before the last user message and add that last message after the context summary. This way the agent knows what the current task is, and it is much easier to debug.
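The alternative in the last point could look roughly like this; `compact_keep_last_user`, the `summarize` callable, and the plain-dict message shape are assumptions for illustration, not the PR's actual code:

```python
# Sketch (assumed names): summarize everything before the final user
# message, then re-append that message verbatim after the summary, so
# the agent always sees the current task exactly as the user wrote it.
def compact_keep_last_user(messages, summarize):
    """Return: system messages + assistant summary + last user message."""
    # Index of the last user message, if any.
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=None,
    )
    if last_user is None:
        return messages  # nothing to anchor on; leave history untouched

    system = [m for m in messages if m["role"] == "system"]
    # Everything before the last user message (excluding system) gets summarized.
    head = [m for m in messages[:last_user] if m["role"] != "system"]
    summary = summarize(head)  # stand-in for the summarizer-agent call
    return system + [
        {"role": "assistant", "content": f"[Context summary]\n{summary}"},
        messages[last_user],
    ]
```

Because the last user message survives verbatim, a debugger can always see the exact task the agent is acting on.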
Thanks for the suggestion! I created a new prompt for the summary; we can adjust it in the future as we test. I also changed the role of the "continue" message for better distinction.
| Amazing @fengju0213. Everything looks good to me. I will open an issue after the PR is merged on some ideas and potential areas of improvement that can be tested by the devs. | 
Looking good! Thanks @fengju0213
@hesamsheikh @MuggleJinx thanks for the review!
Description
Context creator now preserves the system message and returns the remaining history strictly by timestamp, removing all per-message token bookkeeping.
Token-limit handling captures the full context, rolls back recent tool-call chains when necessary, swaps in an assistant summary, and records summary state (depth, new-record counts, last user input). This state blocks back-to-back summaries with negligible progress and caps retries at three, preventing summarize→retry loops, even when the immediate overflow comes from a tool call.
Users should control limits through model backend configuration.
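The depth/new-record guard described in the paragraph above might be sketched like this; `SummaryState` and its method names are hypothetical, and only `SUMMARY_MAX_DEPTH` and the retry cap of three come from the discussion:

```python
# Hypothetical sketch of the summary-state guard: track how deep we are
# in nested summaries and how many new records arrived since the last
# one, so back-to-back summaries with negligible progress stop at a
# fixed cap instead of looping summarize -> retry.
SUMMARY_MAX_DEPTH = 3  # retry cap mentioned in the description


class SummaryState:
    def __init__(self):
        self.depth = 0
        self.records_at_last_summary = 0

    def should_summarize(self, record_count: int, min_new_records: int = 2) -> bool:
        """Allow a summary only if we are under the cap and made progress."""
        if self.depth >= SUMMARY_MAX_DEPTH:
            return False  # cap reached: fail rather than loop forever
        new_records = record_count - self.records_at_last_summary
        if self.depth > 0 and new_records < min_new_records:
            return False  # negligible progress since the last summary
        return True

    def on_summarized(self, record_count: int) -> None:
        """Record that a summary just happened at this history size."""
        self.depth += 1
        self.records_at_last_summary = record_count
```

A caller would check `should_summarize` before invoking the summarizer and call `on_summarized` afterwards; the guard is what breaks the summarize→retry loop even when the overflow comes from a tool call.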
Pending work:
1. Clean up the modules related to token_limit and the token counter.
2. Add unit tests that cover input strings capable of triggering the token-limit path.
Checklist
Go over all the following points, and put an `x` in all the boxes that apply.
- [ ] Add `Fixes #issue-number` in the PR description (required)
- [ ] Update `pyproject.toml` and `uv lock`
If you are unsure about any of these, don't hesitate to ask. We are here to help!