
String as Features #14

@jan-gerling

Our training data contains a small collection of string-valued features (listed under Features below).
How should we handle these features? For now, I have disabled them, see here.

These features might be interesting for some refactoring types, especially the "Rename" refactoring types, so it would be good if we could use them.

Most common ways to handle strings as features for machine learning are not applicable in our case; only the last option below looks workable:

  1. One-hot encoding / categorical variables: identifiers are difficult to map into categories that still preserve the properties we are interested in.
  2. Convert to number: naively converting a name to a byte array will probably not yield good results, as the data is very messy and the name lengths vary widely.
  3. Extract properties: we could extract properties from the names that we deem potentially relevant, e.g. number of characters, number of special characters, etc.
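To make option 3 more concrete, here is a minimal sketch of what extracting properties could look like. The function name `name_properties`, the chosen properties, and the example identifiers are assumptions for illustration only, not something that exists in our code base. Every name maps to the same fixed set of numeric columns, regardless of its length.

```python
import re


def name_properties(name: str) -> dict:
    """Derive simple numeric properties from an identifier (hypothetical helper)."""
    # Split camelCase, snake_case and digit runs into word-like tokens,
    # e.g. "parseXMLFile2" -> ["parse", "XML", "File", "2"].
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {
        "length": len(name),
        "num_digits": sum(c.isdigit() for c in name),
        # Anything that is neither a letter nor a digit counts as special.
        "num_special": sum(not c.isalnum() for c in name),
        "num_uppercase": sum(c.isupper() for c in name),
        "num_tokens": len(tokens),
    }


if __name__ == "__main__":
    # Made-up example names, not taken from our data set.
    for example in ("shortMethodName", "my_field_name", "parseXMLFile2"):
        print(example, name_properties(example))
```

Which properties are actually useful for the "Rename" refactorings is an open question; the set above is only a starting point.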

Features

Method Level:

  • fullMethodName
  • shortMethodName

Field Level:

  • fieldName

Variable Level:

  • variableName

Labels: bug, enhancement, question
