feat(WIP): python parser #21

Open
wants to merge 32 commits into
base: main

Hoblovski
Collaborator

@Hoblovski Hoblovski commented May 16, 2025

What type of PR is this?

Implements a Python parser, based on a modified python-lsp-server.
(Not to be merged for now.)

The modified python-lsp-server lives at https://github.com/Hoblovski/python-lsp-server/tree/abc
To install it, run `python -m pip install -e .` in that checkout, then check that `pylsp` is on your PATH.
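
A small sketch of the PATH check, using only the standard library. The helper name `find_executable` is hypothetical and not part of this PR.

```python
import shutil


def find_executable(name="pylsp"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_executable()
if path:
    print(f"pylsp found at {path}")
else:
    print("pylsp not found on PATH; try `python -m pip install -e .` again")
```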

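One problem with Python is that a lot of code has no type annotations, so function dependencies are not clear.
For example, `fn foo(x: Foobar)` clearly has a dependency foo -> Foobar, but `def foo(x)` does not, unless it is written as `def foo(x: Foobar)`.
One option is to add type annotations automatically with a type inference tool, but the benefit is limited and it seems overly complicated.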
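
To illustrate the point about annotations, here is a minimal sketch that reads dependencies off a function signature with the standard `inspect` module. The helper `annotation_deps` and the example functions are hypothetical; the parser in this PR works through LSP, not through `inspect`.

```python
import inspect


class Foobar:
    pass


def annotated(x: Foobar) -> None:
    pass


def unannotated(x):
    pass


def annotation_deps(fn):
    """Return sorted names of classes used in parameter/return annotations."""
    sig = inspect.signature(fn)
    candidates = [p.annotation for p in sig.parameters.values()]
    candidates.append(sig.return_annotation)
    names = {
        a.__name__
        for a in candidates
        # Missing annotations show up as the sentinel class Signature.empty.
        if inspect.isclass(a) and a is not inspect.Signature.empty
    }
    return sorted(names)


print(annotation_deps(annotated))    # the annotated version exposes Foobar
print(annotation_deps(unannotated))  # nothing can be recovered here
```

Without the annotation, the dependency is only visible by analysing the function body or its call sites.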

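Another problem is that it is slow. I did not change the collect logic, so parsing astropy is extremely slow (it does not finish even after several hours).
Performance tuning is in progress…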

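Finally, about the implementation:

  • Modules that the LLM does not use for now are not processed (only the current one); packages are processed normally.

    (In the Python ecosystem, module and package have the opposite meaning from ours. Unless noted otherwise, I use the abcoder uniast terminology, i.e. a module is larger than a package.)

  • The handling of impl headers is not very idiomatic; it ends up as `class Foo {\n def foobar... }`
    • Something more flexible than a token index would be better, because the semantic tokens of some LSP servers are quite limited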

Check the PR title.

  • This PR title matches the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
zh(optional):

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

Hoblovski and others added 21 commits April 30, 2025 17:45
1. Since clangd does not support semanticTokens/range method, use
  semanticTokens/full + filtering to emulate.
2. Since the concept of package and module does not apply to C/C++,
  treat the whole repo as a single package/module.
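
A minimal sketch of the emulation described in item 1, written in Python for illustration. It assumes the standard LSP encoding of five integers per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers). The names `Token`, `decode_tokens` and `tokens_in_range`, and the choice to filter on each token's start position, are assumptions rather than code from this PR.

```python
from typing import NamedTuple


class Token(NamedTuple):
    line: int
    start: int
    length: int
    token_type: int
    modifiers: int


def decode_tokens(data):
    """Turn the relative semanticTokens/full encoding into absolute tokens."""
    tokens = []
    line = start = 0
    for i in range(0, len(data), 5):
        d_line, d_start, length, token_type, modifiers = data[i:i + 5]
        line += d_line
        # deltaStartChar is relative only when the token stays on the same line.
        start = start + d_start if d_line == 0 else d_start
        tokens.append(Token(line, start, length, token_type, modifiers))
    return tokens


def tokens_in_range(data, start_pos, end_pos):
    """Keep tokens whose (line, char) start lies in [start_pos, end_pos)."""
    return [
        t for t in decode_tokens(data)
        if start_pos <= (t.line, t.start) < end_pos
    ]


# Three tokens at (2, 5), (2, 10) and (5, 2) in relative form.
data = [2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0]
print(tokens_in_range(data, (2, 6), (5, 0)))
```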
The custom pylsp is based on [python-lsp-server](https://github.com/python-lsp/python-lsp-server), plus the following pull requests:
1. semanticTokens/full: python-lsp/python-lsp-server#645
2. typeDefinition: python-lsp/python-lsp-server#533
Maybe also
3. implementation: python-lsp/python-lsp-server#644
Parses and generates json.
Tons of ad hoc decisions.
Python LSP does not report `def` or `(` as semantic tokens, but we need their
positions to determine which semantic tokens belong to parameters or return
values. So add manual parsing based on `sym.Text`.
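
A rough sketch of what such manual parsing could look like, written in Python for illustration. The function `signature_spans` is hypothetical and not taken from this PR; it scans the symbol text for `def`, the matching parentheses and the header-ending colon, and assumes no brackets or colons appear inside string literals in the header.

```python
def signature_spans(text):
    """Return ((p_start, p_end), (r_start, r_end)) offsets into `text`.

    The first span covers the parameter list between the parentheses; the
    second covers the return annotation and is empty when there is none.
    """
    open_at = text.index("(", text.index("def "))
    depth = 0
    close_at = None
    colon_at = None
    for i in range(open_at, len(text)):
        ch = text[i]
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
            if depth == 0 and close_at is None:
                close_at = i  # parenthesis that closes the parameter list
        elif ch == ":" and depth == 0 and close_at is not None:
            colon_at = i      # colon that ends the function header
            break
    if close_at is None or colon_at is None:
        raise ValueError("not a complete function header")
    arrow_at = text.find("->", close_at, colon_at)
    if arrow_at == -1:
        returns = (colon_at, colon_at)
    else:
        returns = (arrow_at + 2, colon_at)
    return (open_at + 1, close_at), returns


text = "def foo(x: Foobar, y=(1, 2)) -> dict[str, int]:\n    pass"
(p_start, p_end), (r_start, r_end) = signature_spans(text)
print(text[p_start:p_end])
print(text[r_start:r_end].strip())
```

A semantic token whose offset falls in the first span would then be treated as part of the parameters, and one in the second span as part of the return type.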
…mbols

For example, a class `Foo` has a method `bar`. When trying to infer the
symbol from a location within `bar`, we follow getSymbolByToken ->
getSymbolByLocation -> filterEntitySymbols. `getSymbolByLocation`
presents two candidates, `Foo` and `bar`. We should accept `bar`
because it is the most specific.

The existing Rust implementation avoids the problem because `Foo` will be an
impl symbol, which is not an entity symbol. However, in Python, `Foo` is
a class and thus has to be an entity symbol.
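
The "most specific candidate" rule can be sketched as picking the innermost enclosing range. This is a Python illustration only: the `Symbol` type and the `most_specific` helper are hypothetical, and the tie-breaking rule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    start: tuple  # (line, column), inclusive
    end: tuple    # (line, column), exclusive


def most_specific(symbols, pos):
    """Return the innermost symbol whose range contains `pos`, or None."""
    enclosing = [s for s in symbols if s.start <= pos < s.end]
    if not enclosing:
        return None
    # The innermost symbol starts latest; on a tie, prefer the earliest end.
    return max(enclosing, key=lambda s: (s.start, (-s.end[0], -s.end[1])))


foo = Symbol("Foo", (0, 0), (10, 0))
bar = Symbol("bar", (1, 4), (3, 0))
print(most_specific([foo, bar], (2, 8)).name)  # inside bar, which is inside Foo
```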
@Hoblovski Hoblovski force-pushed the feat/python-parser branch from 42d5d6a to 537a2ed Compare May 16, 2025 04:35
@Hoblovski
Collaborator Author

Some example outputs:

testdata/pysimpleobj: single file, a simple Python class
https://gist.github.com/Hoblovski/f0e676d33e94117b6462ee505fb13f8f

testdata/pythonsingle: single file, more complex
https://gist.github.com/Hoblovski/0cb8572e0a5546965352ce11a7cf7126

testdata/pythonsimple: multiple files and modules, external dependencies, plus OOP methods
https://gist.github.com/Hoblovski/f824db079d766cf4e35927a81f493943

2 participants