b0noI edited this page Dec 20, 2014 · 14 revisions

WARNING: this wiki is deprecated; the new wiki is here and on our [new site](http://aif.io/).

Definitions List

Tokens separator

The character used in text to split tokens (most commonly, the space character).
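As an illustration, splitting on the token separator can be sketched in Java as below. This is a minimal sketch that assumes a single space as the separator. The class name `TokenSplitter` is hypothetical and is not part of the AIF API. Consecutive separators are collapsed so that no empty tokens appear.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSplitter {
    // Hypothetical default: a single space is used as the token separator.
    public static final char TOKEN_SEPARATOR = ' ';

    // Splits text into tokens on the given separator, skipping the empty
    // tokens that consecutive separators would otherwise produce.
    public static List<String> split(String text, char separator) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == separator) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("dear  Mr. Max", TOKEN_SEPARATOR));
    }
}
```

The dot stays attached to "Mr." in the output. The token separator alone does not separate text separators from tokens.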

Text separators

Characters used to separate text logically. Some of them split text into sentences, while others split sentences into logical parts. Examples of such characters: , . ! / ? : ; " ' - ( ) [ ]
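Text separators are often attached directly to tokens, as in "Hi," or "Max!". A minimal Java sketch of detaching them is shown below. It assumes the example separators listed above and a space as the token separator. The class `SeparatorExtractor` is hypothetical. Each text separator is emitted as a piece of its own, so later steps can decide what role it plays.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorExtractor {
    // The example text separators listed above: , . ! / ? : ; " ' - ( ) [ ]
    private static final String TEXT_SEPARATORS = ",.!/?:;\"'-()[]";

    public static boolean isTextSeparator(char c) {
        return TEXT_SEPARATORS.indexOf(c) >= 0;
    }

    // Splits space-separated text into pieces, turning every text separator
    // into a separate piece, e.g. "Hi, Max!" -> [Hi, ",", Max, "!"].
    public static List<String> splitKeepingSeparators(String text) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == ' ' || isTextSeparator(c)) {
                if (current.length() > 0) {
                    pieces.add(current.toString());
                    current.setLength(0);
                }
                if (c != ' ') {
                    pieces.add(String.valueOf(c)); // keep the separator itself
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            pieces.add(current.toString());
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingSeparators("Hi, Max! (see below)"));
    }
}
```

Because the check is purely character-based, this sketch also breaks "well-known" and "don't" apart. A real tokenizer would need extra rules for hyphens and apostrophes inside words.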

Text separators groups

All text separators are divided into two groups: Group1 and Group2.

Group1

This group contains separators that are usually used to split text into sentences. The most common characters in this group are: . ! ? Separators of this group usually, but not always, split text into sentences. For example, in "dear Mr. Max" the dot does not end a sentence; in this particular case it acts as a separator from Group2.
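A minimal Java sketch of splitting text into sentences at Group1 separators follows. It handles the "Mr." case with an abbreviation list. The class `SentenceSplitter` and its three-entry `ABBREVIATIONS` set are hypothetical. A real system would need a much larger, language-specific list or a trained model. This is not necessarily how AIF resolves the ambiguity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SentenceSplitter {
    // Most common Group1 separators, as listed above.
    private static final String GROUP1 = ".!?";
    // Hypothetical, deliberately tiny abbreviation list (lower-case, no dot).
    private static final Set<String> ABBREVIATIONS = Set.of("mr", "mrs", "dr");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            // A dot after a known abbreviation is not treated as a boundary.
            if (GROUP1.indexOf(c) >= 0 && !(c == '.' && endsWithAbbreviation(current))) {
                addIfNotBlank(sentences, current);
            }
        }
        addIfNotBlank(sentences, current);
        return sentences;
    }

    // True if the letters right before the trailing dot form a known
    // abbreviation, as in "dear Mr." where the dot is not a sentence end.
    private static boolean endsWithAbbreviation(StringBuilder sb) {
        int end = sb.length() - 1; // index of the dot
        int start = end;
        while (start > 0 && Character.isLetter(sb.charAt(start - 1))) {
            start--;
        }
        return ABBREVIATIONS.contains(sb.substring(start, end).toLowerCase());
    }

    // Adds the trimmed buffer as a sentence (if non-empty) and clears it.
    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
        sb.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(split("Hello, dear Mr. Max! How are you? Fine."));
    }
}
```

Each resulting sentence ends with its Group1 separator and may still contain Group2 separators such as the comma. This matches the usual shape of a sentence described in this list.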

Group2

This group contains separators that are usually used to split sentences into logical sub-sentences. The most common characters in this group are: ; : , - " ' ( ) Separators of this group usually, but not always, split sentences into sub-sentences. For example, in "- Hi - Hello" the dashes separate two utterances rather than parts of a single sentence.
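A naive Java sketch of splitting one sentence into sub-sentences at the most common Group2 separators is shown below. The class `SubSentenceSplitter` is hypothetical. The sketch drops the separators themselves and assumes every occurrence is a real sub-sentence boundary.

```java
import java.util.ArrayList;
import java.util.List;

public class SubSentenceSplitter {
    // Most common Group2 separators, as listed above: ; : , - " ' ( )
    private static final String GROUP2 = ";:,-\"'()";

    // Splits one sentence into sub-sentences at Group2 separators,
    // dropping the separators and any empty parts.
    public static List<String> split(String sentence) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : sentence.toCharArray()) {
            if (GROUP2.indexOf(c) >= 0) {
                flush(parts, current);
            } else {
                current.append(c);
            }
        }
        flush(parts, current);
        return parts;
    }

    // Adds the trimmed buffer as a part (if non-empty) and clears it.
    private static void flush(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            parts.add(s);
        }
        sb.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(split("If it rains, we stay home; otherwise (weather permitting) we go out."));
        System.out.println(split("- Hi - Hello"));
    }
}
```

The second call returns "Hi" and "Hello" as if they were sub-sentences. This shows the ambiguity described above: a purely character-based rule cannot tell that these dashes separate two utterances. It would also split words such as "don't" at the apostrophe.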

Sentence

A sequence of tokens that carries a semantic piece of information and is surrounded by text separators. A sentence can itself contain text separators: usually it is bounded by separators from Group1 and contains separators from Group2.

Word

A word consists of an approximated root token and a list of tokens with similar forms (for example, "connect", "connected" and "connection").
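The sketch below shows one way to model this definition in Java. The `Word` record and `WordGrouper` class are hypothetical, and the grouping heuristic is an assumption, not a description of how AIF approximates roots. In the sketch, tokens that share their first `PREFIX_LENGTH` characters count as forms of one word, and the root is their longest common prefix. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordGrouper {
    // A word: an approximated root token plus the tokens judged to be its forms.
    public record Word(String root, List<String> forms) {}

    // Hypothetical heuristic: tokens sharing their first PREFIX_LENGTH
    // characters (case-insensitive) are treated as forms of the same word.
    private static final int PREFIX_LENGTH = 5;

    public static List<Word> group(List<String> tokens) {
        Map<String, List<String>> byPrefix = new LinkedHashMap<>();
        for (String token : tokens) {
            String t = token.toLowerCase();
            String key = t.length() <= PREFIX_LENGTH ? t : t.substring(0, PREFIX_LENGTH);
            List<String> forms = byPrefix.computeIfAbsent(key, k -> new ArrayList<>());
            if (!forms.contains(t)) {
                forms.add(t);
            }
        }
        List<Word> words = new ArrayList<>();
        for (List<String> forms : byPrefix.values()) {
            words.add(new Word(commonPrefix(forms), forms));
        }
        return words;
    }

    // The approximated root is the longest common prefix of all forms.
    private static String commonPrefix(List<String> forms) {
        String prefix = forms.get(0);
        for (String f : forms) {
            int i = 0;
            while (i < prefix.length() && i < f.length() && prefix.charAt(i) == f.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Connect", "connected", "connection", "max", "connects");
        for (Word w : group(tokens)) {
            System.out.println(w);
        }
    }
}
```

A fixed prefix length is a crude stand-in for real stemming or morphological analysis. It misses forms like "max" and "maximum" and can wrongly merge unrelated words that start the same way.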
