Skip to content

Fix default wstring length #862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

TonyWelte
Copy link

@TonyWelte TonyWelte commented Apr 27, 2025

This PR fixes strings containing unicode characters not being escaped properly and their size being inaccurate. For instance on WString it results in the following change

38c38
< static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2[] = "Hell\\xc3\\xb6 w\\xc3\\xb6rld!";
---
> static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2[] = "Hell\xc3\xb6 w\xc3\xb6rld!";
40c40
< static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3[] = "\\xe3\\x83\\x8f\\xe3\\x83\\xad\\xe3\\x83\\xbc\\xe3\\x83\\xaf\\xe3\\x83\\xbc\\xe3\\x83\\xab\\xe3\\x83\\x89";
---
> static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3[] = "\xe3\x83\x8f\xe3\x83\xad\xe3\x83\xbc\xe3\x83\xaf\xe3\x83\xbc\xe3\x83\xab\xe3\x83\x89";
74c74
<     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2, 12, 12},
---
>     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2, 14, 14},
84c84
<     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3, 7, 7},
---
>     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3, 21, 21},
140,141c140,141
<   "wstring wstring_value_default2 \"Hell\\xc3\\xb6 w\\xc3\\xb6rld!\"\n"
<   "wstring wstring_value_default3 \"\\xe3\\x83\\x8f\\xe3\\x83\\xad\\xe3\\x83\\xbc\\xe3\\x83\\xaf\\xe3\\x83\\xbc\\xe3\\x83\\xab\\xe3\\x83\\x89\"\n"
---
>   "wstring wstring_value_default2 \"Hell\xc3\xb6 w\xc3\xb6rld!\"\n"
>   "wstring wstring_value_default3 \"\xe3\x83\x8f\xe3\x83\xad\xe3\x83\xbc\xe3\x83\xaf\xe3\x83\xbc\xe3\x83\xab\xe3\x83\x89\"\n"
161c161
<     {toplevel_type_raw_source, 399, 399},
---
>     {toplevel_type_raw_source, 415, 415},

It also fixes " and ' being over escaped (and some other escaped characters). For instance on https://github.com/Tonywelte/test_interface_files/blob/rolling/msg/Strings.msg (see ros2/test_interface_files#25)

46c46
< static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default6[] = "Hello\\\\world!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default6[] = "Hello\\world!";
48c48
< static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default7[] = "Hello\\tworld!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default7[] = "Hello  world!";
61c61
< static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default6[] = "Hello\\\\world!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default6[] = "Hello\\world!";
63c63
< static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default7[] = "Hello\\tworld!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default7[] = "Hello  world!";
250,255c250,255
<   "string string_value_default2 \"Hello\\'world!\"\n"
<   "string string_value_default3 \\'Hello\"world!\\'\n"
<   "string string_value_default4 'Hello\\\\'world!'\n"
<   "string string_value_default5 \"Hello\\\\\"world!\"\n"
<   "string string_value_default6 \"Hello\\\\\\\\world!\"\n"
<   "string string_value_default7 \"Hello\\\\tworld!\"\n"
---
>   "string string_value_default2 \"Hello'world!\"\n"
>   "string string_value_default3 'Hello\"world!'\n"
>   "string string_value_default4 'Hello\\'world!'\n"
>   "string string_value_default5 \"Hello\\\"world!\"\n"
>   "string string_value_default6 \"Hello\\\\world!\"\n"
>   "string string_value_default7 \"Hello\\tworld!\"\n"
260,265c260,265
<   "string<=22 bounded_string_value_default2 \"Hello\\'world!\"\n"
<   "string<=22 bounded_string_value_default3 \\'Hello\"world!\\'\n"
<   "string<=22 bounded_string_value_default4 'Hello\\\\'world!'\n"
<   "string<=22 bounded_string_value_default5 \"Hello\\\\\"world!\"\n"
<   "string<=22 bounded_string_value_default6 \"Hello\\\\\\\\world!\"\n"
<   "string<=22 bounded_string_value_default7 \"Hello\\\\tworld!\"\n"
---
>   "string<=22 bounded_string_value_default2 \"Hello'world!\"\n"
>   "string<=22 bounded_string_value_default3 'Hello\"world!'\n"
>   "string<=22 bounded_string_value_default4 'Hello\\'world!'\n"
>   "string<=22 bounded_string_value_default5 \"Hello\\\"world!\"\n"
>   "string<=22 bounded_string_value_default6 \"Hello\\\\world!\"\n"
>   "string<=22 bounded_string_value_default7 \"Hello\\tworld!\"\n"

Additional notes

  • While working on this fix I noticed that rosidl_runtime_c__String states that the capacity should include the null byte but the size shouldn't. None of the string defined by rosidl generator do this. I haven't fixed it in this PR since I'm not sure what needs to be changed. It should be as simple as adding +1 in static_seq but for toplevel_type_raw_source it should be -1. I'm not sure what's the difference.
  • A think it should be clarified what escaped characters are supported in default strings. \' and \" are mentioned in https://design.ros2.org/articles/legacy_interface_definition.html but other (\\ and \t) worked in rosidl_generator_cpp but rosidl_generator_c didn't escape them properly. \n doesn't work at all.

Fixes #855

@TonyWelte TonyWelte force-pushed the bugfix/unicode-string-length branch from 2360fe6 to 6eba9f8 Compare April 27, 2025 11:12
@TonyWelte TonyWelte force-pushed the bugfix/unicode-string-length branch from 6eba9f8 to 8c50d41 Compare May 4, 2025 14:10
@TonyWelte TonyWelte marked this pull request as ready for review May 4, 2025 14:12
TonyWelte added a commit to TonyWelte/rosidlcpp that referenced this pull request May 4, 2025
Comment on lines 235 to 240
# ASCII characters are encoded as-is
# Non-ASCII characters are encoded as UTF-8
if c < 0x80:
result += chr(c)
else:
result += '\\x%02x' % c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. comments stays where the code is implemented?

Suggested change
# ASCII characters are encoded as-is
# Non-ASCII characters are encoded as UTF-8
if c < 0x80:
result += chr(c)
else:
result += '\\x%02x' % c
if c < 0x80:
# ASCII characters are encoded as-is
result += chr(c)
else:
# Non-ASCII characters are encoded as UTF-8
result += '\\x%02x' % c

Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, just a minor nitpick.

i would like to have an another review for this PR before merge.

@fujitatomoya
Copy link
Contributor

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/88476ec104d8b6970d44034c3f4a6fac/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/15920

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@TonyWelte TonyWelte force-pushed the bugfix/unicode-string-length branch from 8c50d41 to 0c29c98 Compare May 6, 2025 18:33
@fujitatomoya
Copy link
Contributor

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/9157bd03771275331596df97b01c4e91/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16050

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Signed-off-by: Anthony Welte <[email protected]>
@TonyWelte TonyWelte force-pushed the bugfix/unicode-string-length branch from 0c29c98 to 52876f4 Compare May 21, 2025 19:37
@TonyWelte
Copy link
Author

TonyWelte commented May 21, 2025

Looks like the RHEL build uses an older Python version (before 3.12) which doesn't like strings within fstrings. I've pushed the following modification:

   """Statically define a runtime Sequence or String type."""
   if values:
     if isinstance(values, str):
-      return f'{{{varname}, {len(values.encode('utf-8'))}, {len(values.encode('utf-8'))}}}'
+      utf8_values = values.encode('utf-8')
+      return f'{{{varname}, {len(utf8_values)}, {len(utf8_values)}}}'
     else:
       return f'{{{varname}, {len(values)}, {len(values)}}}'
   return '{NULL, 0, 0}'

@fujitatomoya
Copy link
Contributor

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/56af7eafdd1a192e3fc92c458054b5f5/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16062

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default string containing unicode characters are not handled properly
2 participants