Default strings containing Unicode characters are not handled properly #855

Open
TonyWelte opened this issue Mar 16, 2025 · 1 comment · May be fixed by #862
Labels
bug Something isn't working

@TonyWelte

For WString fields, something is wrong in the generated __description.c file. Relevant part:

// Define type names, field names, and default values
static char test_msgs__msg__WStrings__FIELD_NAME__wstring_value[] = "wstring_value";
static char test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default1[] = "wstring_value_default1";
static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default1[] = "Hello world!";
static char test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default2[] = "wstring_value_default2";
static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2[] = "Hell\\xc3\\xb6 w\\xc3\\xb6rld!";
static char test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default3[] = "wstring_value_default3";
static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3[] = "\\xe3\\x83\\x8f\\xe3\\x83\\xad\\xe3\\x83\\xbc\\xe3\\x83\\xaf\\xe3\\x83\\xbc\\xe3\\x83\\xab\\xe3\\x83\\x89";
static char test_msgs__msg__WStrings__FIELD_NAME__array_of_wstrings[] = "array_of_wstrings";
static char test_msgs__msg__WStrings__FIELD_NAME__bounded_sequence_of_wstrings[] = "bounded_sequence_of_wstrings";
static char test_msgs__msg__WStrings__FIELD_NAME__unbounded_sequence_of_wstrings[] = "unbounded_sequence_of_wstrings";

static rosidl_runtime_c__type_description__Field test_msgs__msg__WStrings__FIELDS[] = {
  {
    {test_msgs__msg__WStrings__FIELD_NAME__wstring_value, 13, 13},
    {
      rosidl_runtime_c__type_description__FieldType__FIELD_TYPE_WSTRING,
      0,
      0,
      {NULL, 0, 0},
    },
    {NULL, 0, 0},
  },
  {
    {test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default1, 22, 22},
    {
      rosidl_runtime_c__type_description__FieldType__FIELD_TYPE_WSTRING,
      0,
      0,
      {NULL, 0, 0},
    },
    {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default1, 12, 12},
  },
  {
    {test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default2, 22, 22},
    {
      rosidl_runtime_c__type_description__FieldType__FIELD_TYPE_WSTRING,
      0,
      0,
      {NULL, 0, 0},
    },
    {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2, 12, 12},
  },
  {
    {test_msgs__msg__WStrings__FIELD_NAME__wstring_value_default3, 22, 22},
    {
      rosidl_runtime_c__type_description__FieldType__FIELD_TYPE_WSTRING,
      0,
      0,
      {NULL, 0, 0},
    },
    {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3, 7, 7},
  },

Both test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2 and test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3 are registered in FIELDS with a size and capacity (12 and 7, respectively) much smaller than the actual size of their arrays. This is because len() returns the number of code points rather than the number of bytes.

Also, should the \ in \x really be escaped in the constant definition? If it is escaped, \\xc3\\xb6 is just the eight-character string "\xc3\xb6" and not "ö".

I can create a PR with a fix, but I'd like to confirm my suspicions first.

fujitatomoya self-assigned this Mar 27, 2025
@fujitatomoya (Contributor)

@TonyWelte thanks for creating this issue.

> Both test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2 and test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3 are registered in the FIELDS with a size and capacity much smaller than the actual size of their array.

Obviously this is wrong, as you pointed out.

> If it's escaped, \\xc3\\xb6 is just the string "\xc3\xb6" and not "ö".

I think you are right about this. It does not make sense to me either.

> I can create a PR with a fix but I'd like to confirm my suspicions

Thank you very much; I am happy to review the PR 👍 Much appreciated.
