Skip to content

Conversation

@dankessler
Copy link

As discussed in #1163 and #1142, there are some issues with R sessions invoked via ess-remote as discussed in the manual (i.e., launch a shell within emacs using M-x shell, ssh to a server, run R, and then do M-x ess-remote to get things setup).

The most pressing issue is that some literal control codes in files that are injected during configuration can get intercepted by the shell which screws up subsequent parsing and can have weird consequences. A simple fix for this is to use octal codes rather than literals in the associated R files, and this is all that 2ce1846 does. However, that alone is not sufficient to get the associated functionality working correctly, but if the rest of this PR requires further discussion and a quick fix is desired, I'm happy to submit a PR that includes* only this small commit.

The subsequent commits in this PR are aimed at improving MPI support for R sessions started with ess-remote as discussed above. I enumerate the changes/commits below, since in general they are all severable.

  • In d944300, I switch to different control codes and in particular use those associated with separators to reduce the chances of a shell ascribing them special meaning. Note: I use literals in the elisp code so in Github it renders like an empty string, but it's easier to see in emacs.

  • In a545141, change the message handler to override the read-only property which prevents it from cleaning up after itself when it runs in a *shell* buffer.

  • In f5f54f2, I (conditionally) invoke ess-tracebug when setting things up in a manner analogous to that done by R-initialize-on-start (which ess-remote doesn't call for what I assume are good reasons unknown to me). This is necessary to set off a chain of activity which ultimately results in ess-mpi-handle-message getting run automatically in R sessions started with M-x ess-remote in the same way that it does in R sessions started with M-x R.

This avoids a bug when the file is "injected" where the control characters are
potentially intercepted and consumed by the shell on their way to R.

This should fix the original bug reported in emacs-ess#1163, although it doesn't fully
address the issues with MPI support for sessions initiated by
`ess-remote` (which require some additional work as outlined [here](emacs-ess#1163 (comment))
Literal ESC is tricky because it can be consumed by the shell or otherwise
misinterpreted. Use FS to mark the beginning, GS to mark the end, and RS (which
was previously used) to mark the end of the header. Hopefully the separator
control codes are unlikely to cause conflicts.

Also simplify the delimiters to use *solely* the control code and not rely on
other adjacent ASCII characters.

Make corresponding updates to elisp variables
ess-mpi-message-{start,end}-delimiter and update the docstring for
`ess-mpi-handle-messages` (which curiously already used GS as the end
separator).
in `*shell*` buffers, the MPI message we are trying to extract may likely have
the special `read-only` text property, which prevents us from deleting it after
we handle it.

Overcome this by wrapping the deletion command in a let form that binds
`inhibit-read-only` to `t`, as suggested by the elisp manual's "Special
Properties" page
This is necessary to configure the MPI handler to run regularly
@vspinu
Copy link
Member

vspinu commented Feb 1, 2022

Thanks! Looks good. But why these characters and in that order? Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.

Though the GS, RS and US codes were designed to send tabular data https://stackoverflow.com/a/18782271/453735

How about this protocol GS RS head US payload RS GS? This way we could add more records in the future if needed.

Could you also better document the protocol and include the links to relevant info?

https://stackoverflow.com/a/18782271/453735
https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

@dankessler
Copy link
Author

Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.

I hadn't tried because I was nervous about the shell being confused by \003 specifically, since it's what is sent by C-c to interrupt a process. However, it was lazy of me to not test, so I went ahead and tried it and it seems to work fine (tested with M-x R and M-x shell using both bash and zsh on a MacBook Pro).

So, I propose we go with the protocol suggested by @lionel- , specifically SOH head STX payload ETX

On the other hand, we could go further down the road proposed by @vspinu and consider the more extensible protocol of GS RS head US payload RS GS, although I don't know if there's value in being able to pack multiple messages into a single group that doesn't have its own header as opposed to just sending them atomically (although perhaps the overhead is different if there are many, many messages to send). My hesitation with this protocol (and the reason I had done something semantically weird to do something like ETX without actually using it) was concern about using the same delimiter for both start and end and a fear that the handler might get confused and bad things could happen if it thinks an end is a start and vice versa, but maybe that anxiety is misplaced and/or can be addressed.

In my opinion, the SOH head STX payload ETX approach seems clearer (and can of course be revisited down the road if MPI starts seeing a lot of use), but I'm happy to do whatever is preferred.

Could you also better document the protocol and include the links to relevant info?
https://stackoverflow.com/a/18782271/453735 https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages be a good place to put this documentation, or are you thinking something more formal like the manual?

Also, while I'm at it, I'm planning to use octal codes in the elisp too where I can, just so that people looking at the code on e.g., GitHub are less confused by the non-printing characters.

@lionel-
Copy link
Member

lionel- commented Feb 1, 2022

Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages be a good place to put this documentation, or are you thinking something more formal like the manual?

Just a note that I think all of this should be treated as internal for now, so the manual would not be a good place. Probably comments would be fine for this doc. (For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)

@dankessler
Copy link
Author

(For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)

Do you mean just for display purposes, or would you like the regex to actually include a literal slash? I.e., do you mean like this?

(defvar ess-mpi-message-start-delimiter "\001")
(defvar ess-mpi-message-field-separator "\002")
(defvar ess-mpi-message-end-delimiter "\003")

or like this?

(defvar ess-mpi-message-start-delimiter "\\\001")
(defvar ess-mpi-message-field-separator "\\\002")
(defvar ess-mpi-message-end-delimiter "\\\003")

@lionel-
Copy link
Member

lionel- commented Feb 1, 2022

Sorry, I meant double-dashed, e.g. ess--mpi-message-start-delimiter.

@dankessler
Copy link
Author

Ah, that makes more sense and I can make that change while I'm touching this code anyway (I presume the convention is that the ess-- prefix indicates things that are "internal").

As suggested in discussion of emacs-ess#1182, adopt the following protocol for MPI
`SOH header STX payload ETX`

also use octal codes in elisp for better readability
As requested in emacs-ess#1182, rename any variables/functions that I've touched to begin
with `ess--mpi` in order to follow the convention for "internal" features
remove link to ESC sequence documentation (since we no longer use literal ESC
for delimiters) and replace with link to control codes on wikipedia (slightly
redundant with existing link but with more detail).

Update docstring for ess--mpi-handle-messages to
1. Use octal's for literals for better readability on github
2. add a note calling attention to the fact that the message contains literal
ASCII control codes which hopefully gives something search-able if a user is
confused by the seemingly strange characters printed in the docstring
@dankessler
Copy link
Author

dankessler commented Feb 2, 2022

I've pushed some new commits that

  1. switch to the SOH header STX payload ETX protocol
  2. adhere to the double-dash convention (at least for any variables/functions I touched; I did not comprehensively refactor)
  3. update/add documentation in elisp comments and docstring for ess--mpi-handle-messages

I think this addresses all of the outstanding issues raised in the discussion of this PR and is perhaps good to go.

Feel free to go ahead and accept if this looks good on your end, but let me know if you prefer that I squash this all into a single commit or rebase to tidy things up and force push to update the PR.

@vspinu
Copy link
Member

vspinu commented Feb 3, 2022

I am still not very sure about the protocol. In the future we might want to extend it with a protocol version number, length of the message and maybe checksum. This is why fields separators might work better.

Also I cannot find what is the actual use case for SOH, STX, ETX on terminal emulators. Here they say

"ETX End of Text Often used as a "break" character (Ctrl-C) to interrupt or terminate a program or process."

So seems dangerous. Didn't you say that comint sends \003 on C-c?

The tabular data delimiters seem to be more appropriate for the task at hand:

"
Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.
"

as requested by @vspinu, but for now, leave the mpi delimiter variables names
double-dashed as explicitly requested by @lionel-
dankessler added a commit to dankessler/ESS that referenced this pull request Mar 21, 2022
As suggested in discussion of emacs-ess#1182, adopt the following protocol for MPI
`SOH header STX payload ETX`

also use octal codes in elisp for better readability
dankessler added a commit to dankessler/ESS that referenced this pull request Mar 21, 2022
As requested in emacs-ess#1182, rename any variables/functions that I've touched to begin
with `ess--mpi` in order to follow the convention for "internal" features
@dankessler
Copy link
Author

dankessler commented Jun 29, 2023

I just realized we never closed the loop on this. How does this protocol (which @vspinu proposed earlier) sound?

GS RS head US payload RS GS

This could later be extended to include multiple key/value pairs like

GS RS head1 US payload1 RS head2 US payload2 RS GS

So long as we stick to the convention that there's a RS before the first pair and after the last pair, then the start/end delimiters are distinct (GS RS and RS GS, respectively), which addresses my fear of the delimiters getting unbalanced.

I've updated the PR to implement this.

new protocol is GS RS head US payload RS GS

See further [discussion on github](emacs-ess#1182 (comment))
@aramirezreyes
Copy link

aramirezreyes commented Oct 11, 2023

It would be interesting to have this merged (or at least a partial pull request with the ess-tracebug loading). The loading of ess-tracebug would give a consistent experience across ess. See, for example #1265

@malcook malcook mentioned this pull request Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants