refactor: systematize and expose the parsing options and tokenizer configuration #3423
base: develop
Conversation
(force-pushed from 7ee5387 to 6dbd02f)
Amusingly, I just noticed that GitHub automatically put a color dot after the example notation I used, supporting the idea that it's good to enable such a syntax extension. I do think there might be some value in our project publishing our Chroma and number-theory extensions to mathjs as add-on packages, if you have any thoughts/ideas about where/how to most effectively publish such things. (And in fact, if we were to set up such a collection of add-ons, we might well want to break out some of the existing groups, like statistics or sets, etc., as separate add-ons, to reduce the "monolithicness" of mathjs.)
Nice job @gwhitney! I like your […]
Great, glad to hear. As to handling/controlling precedence, I would plan to do it exactly the same way it is currently done: there would be a `parse.nodes` parameter for parsing (say; maybe there's a better name, like `parse.expressions`?), analogous to `parse.tokens`, and the order of entries in that parameter would determine the precedence.
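To make the precedence-by-order idea concrete, here is a minimal, self-contained sketch (purely illustrative; none of these names are the actual mathjs API): an ordered table of binary-operator groups drives a recursive evaluator, and splicing a new entry between existing ones introduces a new precedence level without touching any other list.

```javascript
// Ordered table of left-associative binary operators: earlier entries
// bind less tightly. Precedence is implied by position, not stored.
const table = [
  { '+': (a, b) => a + b, '-': (a, b) => a - b },
  { '*': (a, b) => a * b, '/': (a, b) => a / b }
]

// Evaluate a pre-tokenized expression, one table entry per level;
// past the last level, a token is taken as a number literal.
function evalLevel (tokens, table, level = 0) {
  if (level === table.length) return Number(tokens.shift())
  let value = evalLevel(tokens, table, level + 1)
  while (tokens[0] in table[level]) {
    const op = table[level][tokens.shift()]
    value = op(value, evalLevel(tokens, table, level + 1))
  }
  return value
}

console.log(evalLevel(['2', '+', '3', '*', '4'], table)) // → 14

// Splicing in a new entry adds a '//' operator at a brand-new
// precedence level, above +/- but below */ (again, illustrative):
const extended = [table[0], { '//': (a, b) => Math.floor(a / b) }, table[1]]
console.log(evalLevel(['1', '+', '7', '//', '2'], extended)) // → 4
```

The point of the sketch is that the entry's index *is* its precedence, so neither the new operator nor any existing one needs a separately maintained precedence number or delimiter list.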
Ah, work on this is paused at the moment per #3420 (comment).
Including a test that implements a custom token type `#HEXHEX` that could be used for color constants.
(force-pushed from 0424ee0 to 90af323)
OK, have rebased on current develop. My plan is to adapt the existing configuration settings per the decisions above, and then call it a day on this PR: it will have exposed the tokenizer configuration and systematized the config settings. That seems like enough for one PR; a further, similar refactor of the parsing configuration can be done in a separate PR, and should wait for #3497 to land anyway, because otherwise that PR would basically have to be re-done from scratch.
Hmm, I have run across a pragmatic point with the plans here. Right now, if you get the "library instance" […], then the config object is read-only (e.g. […] returns a ResultSet with the square root of seven). So my concern is that moving parser configuration items like `parse.isAlpha` to `config.parse.isAlpha` will suddenly make them configurable only in a `create()`-ed instance, which would to my mind be an undesirable breaking change (a loss of functionality). I see three options: […]
There may be other options as well. In any case, I'd best hold off once again until @josdejong illuminates the difficulties with mutating config in the library instance and/or picks an option for how to handle this. Thanks for getting back on this.
I think ideally it should not be possible to alter the global/default mathjs instance like […]. It will indeed be a breaking change, so we'll need to document this and ideally throw an explanatory error when you try to use […]. So that is your option (2). Agree?
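The pattern being agreed on here can be sketched generically (this is illustrative plumbing, not mathjs source; the factory and option names are made up for the example): instances you create yourself carry their own mutable config, while the shared default instance rejects mutation with an explanatory error.

```javascript
// Hypothetical sketch of option (2): config is mutable only on
// instances you create yourself; the shared default instance throws.
function create (initialConfig = {}, { readOnly = false } = {}) {
  let config = { predictable: false, ...initialConfig }
  return {
    // Called with no arguments: return a copy of the current config.
    // Called with updates: merge them in, unless this instance is read-only.
    config (updates) {
      if (updates === undefined) return { ...config }
      if (readOnly) {
        throw new Error(
          'The default instance is read-only; use create() to get a configurable instance')
      }
      config = { ...config, ...updates }
      return { ...config }
    }
  }
}

const defaultInstance = create({}, { readOnly: true })
const myMath = create({ predictable: true })

console.log(myMath.config().predictable) // → true
try {
  defaultInstance.config({ predictable: true })
} catch (err) {
  console.log(err.message) // the error explains how to get a configurable instance
}
```

Keeping the default instance frozen this way preserves tree-shakability and avoids one import silently reconfiguring another's shared instance, which is presumably the motivation behind the breaking change.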
OK, got it; I will modify this PR to be on top of v15, and to put the parsing configuration into config and make it not modifiable in the global instance. And since this is now already breaking, and plenty big enough especially with the moves into config, I will not try to subsume any other parser refactoring into this PR. |
I thought it was wisest for the sake of evaluating #3420 to provide a taste of the refactor. So I am opening the corresponding pull request "early" and will mark it as a work in progress. So far, this is primarily just a refactor, to systematize and clarify the tokenizing code. But it does also expose the list of token scanners as a configuration property on the `parse` function, so it already enables additional functionality -- for example, adding a token type like `#FF0080` for color constants (our use case at the moment, and I've put a whole demo of this facility in parse.test.js).

The remaining idea is to do the same for the parse table -- in particular, the list of delimiters would come from the big table in operators.js, rather than being redundantly repeated, and the table would become decorated with the parsing functions for the corresponding level of precedence. That would easily allow extension by addition of new operators, at either existing or new levels of precedence (either by adding to one of the entries in the operators table, or by splicing in a new entry between existing ones). If you added a `//` operator, to take an example I am working on, you would just need to add to the operators table, and not have to remember to add it to some list of delimiters as well, etc.

So that's the idea. I will take a pause so you can have a chance to look at the PR so far and provide feedback and/or your encouragement or discouragement of continuing in this vein. I think you will find, for example, that the tokenization code is now much more transparent, since it has been factored into several much smaller scanner functions which are just run in sequence, each looking for one particular type of token.
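To illustrate the scanners-run-in-sequence shape (a self-contained sketch, not the actual mathjs code; all function and token-type names here are illustrative), here is a tokenizer built as an ordered list of scanner functions, each looking for one particular type of token, with a custom scanner for `#FF0080`-style color constants spliced in:

```javascript
// Each scanner inspects the input at `pos` and either returns a token
// ({ type, value, end }) or null, letting the next scanner in line try.
const scanWhitespace = (str, pos) => {
  const m = /^\s+/.exec(str.slice(pos))
  return m ? { type: 'whitespace', value: m[0], end: pos + m[0].length } : null
}
const scanNumber = (str, pos) => {
  const m = /^\d+(\.\d+)?/.exec(str.slice(pos))
  return m ? { type: 'number', value: m[0], end: pos + m[0].length } : null
}
const scanSymbol = (str, pos) => {
  const m = /^[A-Za-z_]\w*/.exec(str.slice(pos))
  return m ? { type: 'symbol', value: m[0], end: pos + m[0].length } : null
}
// A user-supplied scanner for hex color constants like #FF0080:
const scanHexColor = (str, pos) => {
  const m = /^#[0-9A-Fa-f]{6}/.exec(str.slice(pos))
  return m ? { type: 'hexColor', value: m[0], end: pos + m[0].length } : null
}

// The tokenizer just runs the scanners in order at each position,
// taking the first token produced and skipping whitespace tokens.
function tokenize (str, scanners) {
  const tokens = []
  let pos = 0
  while (pos < str.length) {
    const token = scanners.reduce((t, scan) => t || scan(str, pos), null)
    if (!token) throw new SyntaxError(`Unexpected character at position ${pos}`)
    if (token.type !== 'whitespace') tokens.push(token)
    pos = token.end
  }
  return tokens
}

const scanners = [scanWhitespace, scanHexColor, scanNumber, scanSymbol]
console.log(tokenize('mix #FF0080 2', scanners).map(t => t.type))
// → [ 'symbol', 'hexColor', 'number' ]
```

Because the scanner list is just data, extending the token grammar means inserting a function at the right position in the array, with no changes to the tokenizer loop itself.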
(I guess what I would really like to do is rewrite the parser in some popular parser grammar package, but that would be beyond the level of attention I can devote, for now anyway.)
Resolves #3422.