-
Couldn't load subscription status.
- Fork 86
Support discarding the CST to preserve memory #1733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mark,
I just did a quick 'scan' of the code and asked questions.
When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?
Daily reminder (and note to myself): It would be easier to have code formatting in a separate commit ;)
Thanks
Daniel
As soon as the document is changed (either on disk, or via the |
Sounds all good! Thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see my last review)
| )]; | ||
| if (goToLink && goToLink.target.$segments) { | ||
| const name = this.nameProvider.getNameProperty(goToLink.target); | ||
| if (name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use the whole composite node as target in case a name is not available?
| // Whenever the user reopens the document, the CST will be rebuilt | ||
| discardCst(document.parseResult.value); | ||
| } | ||
| // Discard the diagnostics for the closed document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make this configurable (without method overriding)? It could be simply a public property on the service class to turn this behavior on or off?
I imagine there are users/language implementors who would like to keep all problem markers even for closed documents.
|
|
||
| async validateDocument(document: LangiumDocument, options: ValidationOptions = {}, cancelToken = CancellationToken.None): Promise<Diagnostic[]> { | ||
| const parseResult = document.parseResult; | ||
| if (!parseResult.value.$cstNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the early-exit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the validator requires the CST to identify where to place the validation markers. If I want to place a diagnostic on a keyword, the validator doesn't know where to place it, because all information about that keyword was lost when we discarded the CST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the lexer and parser errors don't need the CST, so we should at least report those.
| readonly locale: string; | ||
| } | ||
|
|
||
| export interface Environment extends EnvironmentInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this as a separate service? AFAICT the information is only used by the WorkspaceManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could benefit other services as well and is well suited as a stand-alone service.
|
@spoenemann I agree, this is something for a v4.0 of Langium. This PR isn't finished anyway - the formatter needs a refactoring as the new way of doing comments in the CST broke the formatter and I still haven't gotten to writing any tests for this. |
5104b23 to
60b5a57
Compare
|
I fixed a few issues with this PR and everything should now work as expected. Ready for review :) |
60b5a57 to
dbb2956
Compare
dbb2956 to
6d91794
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change improves the memory usage for large workspaces. But it actually worsens the situation when you open a large file because the parser attaches both $cstNode and $segments to the AST.
Could we change that so the parser attaches either $cstNode or $segments, but not both? Then the discardCst function can take care of gathering the necessary information from the CST to create $segments. Of course that's more effort then, but I think it's acceptable given that it's done only when a document is closed by the user, so it can happen in the background without the user noticing it.
The LSP services would have to support both ways to obtain range information, but that might be extracted into utility functions to keep it simple.
| private lexerResult?: LexerResult; | ||
| private stack: any[] = []; | ||
| private assignmentMap = new Map<AbstractElement, AssignmentElement | undefined>(); | ||
| private currentMode: CstParserMode = CstParserMode.Retain; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: name this currentCstMode to be more specific
| return obj; | ||
| obj.$cstNode = cstNode; | ||
| delete obj.$segments.comment; | ||
| obj.$segments.comment = this.commentProvider.getComment(obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete if it's immediately assigned?
| if (isNamed(node)) { | ||
| return node.name; | ||
| } | ||
| return undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return node[this.getNameProperty(node)]?
| const nameProperty = this.nameProvider.getNameProperty(node); | ||
| const nameSegment: DocumentSegment | undefined = nameProperty | ||
| ? node.$segments?.properties.get(nameProperty)[0] | ||
| : undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description providers are rather central services. I suggest to keep the CST-based segment resolution as fallback.
|
Converted to a draft PR again. We will not go ahead with this change for 4.0, but reserve the option to include it in a later release. |
This (surprisingly small) change enables us to discard the CST on closed documents (in LSP use cases) to reduce our memory footprint. I've seen several crashes of Langium based language servers on large (> 100mb) workspaces, and this change should help alleviate those issues.
This change includes:
Environmentservice that identifies whether we're currently running as a language server. Running in a CLI environment usually requires full access to the CST.CstParserModethat allows to discard parsed CSTs. The default isRetain, butDiscardwill be used when we encounter a closed document while running as a language server. This also goes in tandem with the newdiscardCstfunction that removes CSTs from ASTs that were parsed inRetainmode. Useful for documents that have been closed.$segmentsproperty has been added for caching of CST data. In particular, it contains the full range of any AST node and the ranges for all properties. Ranges for keywords are not stored (yet).$segmentsproperty.Specific tests for the discard behavior will follow soon. This PR just shows the feasibility and backwards compatibility of the changes at hand so far.