-
Notifications
You must be signed in to change notification settings - Fork 15
v.0.0.0 Work with sentences
WARNING this wiki is deprecated, new wiki is here and on our [new site](http://aif.io/
This page describes the functions that AIF2 can perform with sentences.
There are 2 main functions that can be used in AIF2 about tokens:
- extract sentence splitters characters from tokens list;
- split tokens list into sentences list.
This function gives you a possibility to extract a sentence splitters characters list from the input tokens list. This function should be used by using interface: ISentenceSeparatorExtractor (from package: com.aif.language.sentence). To create an instance of this interface you need to select the type of ISentenceSeparatorExtractor and get an instance like this:
ISentenceSeparatorExtractor.Type.PREDEFINED.getInstance()
For the difference between separators types see the section below
This extractor has predefined the characters that will be used for sentence splitting. It means that this extractor type will not parse tokens in any way and will just return the predefined characters.
This extractor will parse input tokens and will extract sentence splitters from tokens. This splitter should be used when the current text has sentence splitters that are not included in Type.PREDEFINED splitter. It's very useful when your text has non-standard encoding, or symbols using strange encoding.
To split sentences you need to:
- create SentenceSplitter instance(from package: com.aif.language.sentence);
- call split method.
You can initiate it in 2 ways:
- setting Sentence Separators Extractor Type;
- using default Sentence Separators Extractor Type.
ISentenceSeparatorExtractor sentenceSeparatorExtractor = ...
ISplitter<List<String>, List<String>> sentenceSplitter = new SentenceSplitter(sentenceSeparatorExtractor);
This will create sentenceSplitter that will use sentenceSeparatorExtractor to split sentences. You can also create SentenceSplitter with default ISentenceSeparatorExtractor like this:
ISplitter<List<String>, List<String>> sentenceSplitter = new SentenceSplitter();
By default it will use this ISentenceSeparatorExtractor.Type.STAT.getInstance() sentences separator extractor.
After you have SentenceSplitter instance, you can split tokens by calling split method like this:
ISplitter<List<String>, List<String>> sentenceSplitter = ...
List<String> tokens = ...
List<List<String>> sentences = sentenceSplitter.split(tokens);
example of real usage can be found here
Information about algorithm can be found here