
Update new language doc #126


Open: wants to merge 17 commits into base: main
6 changes: 4 additions & 2 deletions documents/how_to_add_new_language.md
@@ -11,6 +11,8 @@ NOTE: Take a look at [PR #40](https://github.com/unicode-org/inflection/pull/40)
In general, to bootstrap your work, look for a grammatically similar language that's already supported; e.g. if you are adding Serbian, look at the existing Russian implementation.
This will help you find most of the files you need to add or change, and will speed up implementation of the rules and lexicons.

Before you add support for a new language, follow the README.md in the inflection subfolder (inflection/inflection/README.md) to build the project, and make sure all the tests pass on your computer.
Member

Good suggestion!


## Mark your language as supported
* UPDATE: inflection/src/inflection/util/LocaleUtils.hpp
* UPDATE: inflection/src/inflection/util/LocaleUtils.cpp
@@ -29,13 +31,13 @@ TODO: We need to expand what each of these do.
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.hpp
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.cpp
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
- * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
+ * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.cpp
Member

Thanks for fixing this documentation. This is a good improvement.

* UPDATE: inflection/src/inflection/grammar/synthesis/GrammarSynthesizerFactory.cpp
* UPDATE: inflection/src/inflection/grammar/synthesis/fwd.hpp

## Add language specific properties for lists, quantities and related topics
* ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.hpp
- * ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.hpp
+ * ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.cpp
* UPDATE: inflection/src/inflection/dialog/language/fwd.hpp

## Define and create lexicon
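The "Mark your language as supported" step mostly means exposing a locale constant; the factory code later in this PR calls `LocaleUtils::MALAYALAM()`. Below is a minimal sketch of that accessor pattern. `Locale` and `LocaleUtilsSketch` are hypothetical stand-ins, not the project's `ULocale` or `LocaleUtils` API, and the supported-language list is an illustrative subset.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Hypothetical stand-in for ::inflection::util::ULocale.
struct Locale {
    std::string language;
    std::string region;
};

// Hypothetical stand-in for LocaleUtils: one static accessor per language.
class LocaleUtilsSketch {
public:
    // Function-local static: constructed once on first call, and every later
    // call returns a reference to the same object.
    static const Locale& MALAYALAM() {
        static const Locale locale{"ml", ""};
        return locale;
    }

    // Illustrative subset only (languages visible in this PR's factory map);
    // the real list is assumed to live in LocaleUtils.cpp.
    static bool isSupportedLanguage(std::string_view language) {
        static constexpr std::array<std::string_view, 4> supported{"hi", "it", "ko", "ml"};
        return std::find(supported.begin(), supported.end(), language) != supported.end();
    }
};
```

Returning a reference to a function-local static avoids static-initialization-order problems when other static tables, such as the synthesizer map, refer to the locale.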
@@ -1,8 +1,10 @@
dictionary_da.lst filter=lfs diff=lfs merge=lfs -text
dictionary_en.lst filter=lfs diff=lfs merge=lfs -text
dictionary_es.lst filter=lfs diff=lfs merge=lfs -text
dictionary_ml.lst filter=lfs diff=lfs merge=lfs -text
inflectional_da.xml filter=lfs diff=lfs merge=lfs -text
inflectional_en.xml filter=lfs diff=lfs merge=lfs -text
inflectional_es.xml filter=lfs diff=lfs merge=lfs -text
inflectional_ml.xml filter=lfs diff=lfs merge=lfs -text
inflectional_sv.xml filter=lfs diff=lfs merge=lfs -text
dictionary_sv.lst filter=lfs diff=lfs merge=lfs -text
748,739 changes: 748,739 additions & 0 deletions inflection/resources/org/unicode/inflection/dictionary/dictionary_ml.lst

Large diffs are not rendered by default.

7,714 changes: 7,714 additions & 0 deletions inflection/resources/org/unicode/inflection/dictionary/inflectional_ml.xml

Large diffs are not rendered by default.

83 changes: 83 additions & 0 deletions inflection/resources/org/unicode/inflection/features/grammar.xml
@@ -1624,6 +1624,89 @@
</category>
</grammar>
</language>
<language id="ml">
<grammar>
<category name="case">
<grammeme name="nominative"/> <!-- no explicit marker; subject form -->
<grammeme name="accusative"/> <!-- -യെ, -ായെ, marks direct object -->
<grammeme name="genitive"/> <!-- -ന്റെ, -യുടെ (possessive) -->
<grammeme name="dative"/> <!-- -ക്ക്, -ന് (to/for) -->
<grammeme name="instrumental"/> <!-- -ആല് (by means of) -->
<grammeme name="locative"/> <!-- -യില് (in/at) -->
<grammeme name="ablative"/> <!-- -യില് നിന്നു് (from) -->
<grammeme name="vocative"/> <!-- used in direct address -->
</category>
<category name="number">
<grammeme name="singular"/>
<grammeme name="plural"/>
</category>
<category name="person">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="first"/>
<grammeme name="second"/>
<grammeme name="third"/>
</category>
<category name="gender">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="masculine"/>
<grammeme name="feminine"/>
<grammeme name="neuter"/> <!-- e.g. for objects or animals -->
</category>
<category name="tense">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="past"/>
<grammeme name="present"/>
<grammeme name="future"/>
</category>
<category name="mood">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="indicative"/>
<grammeme name="imperative"/>
<grammeme name="subjunctive"/>
</category>
<category name="voice">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="active"/>
<grammeme name="passive"/>
</category>
<category name="formality">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="intimate"/>
<grammeme name="casual"/>
<grammeme name="formal"/>
<grammeme name="honorific"/>
</category>
<category name="aspect">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="perfective"/>
<grammeme name="imperfective"/>
</category>
<category name="negation">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="affirmative"/>
<grammeme name="negative"/>
</category>
</grammar>
</language>
<language id="ms">
<grammar>
<category name="clusivity">
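In the `ml` block, some categories carry `<restrictions>` on part of speech (for example `person` for pronouns and verbs, `tense` for verbs only), while `case` and `number` have none. The sketch below assumes that reading: an unrestricted category applies to every part of speech, and a restricted one applies only to the listed `pos` values. `categoryApplies` and the table are hypothetical illustrations, not the library's loader.

```cpp
#include <map>
#include <set>
#include <string>

// category -> allowed parts of speech; an empty set means "no restrictions".
using Restrictions = std::map<std::string, std::set<std::string>>;

// Transcribed from the ml <grammar> block in this diff.
Restrictions malayalamRestrictions() {
    return {
        {"case", {}},
        {"number", {}},
        {"person", {"pronoun", "verb"}},
        {"gender", {"pronoun", "verb"}},
        {"tense", {"verb"}},
        {"mood", {"verb"}},
        {"voice", {"verb"}},
        {"formality", {"pronoun", "verb"}},
        {"aspect", {"verb"}},
        {"negation", {"verb"}},
    };
}

// Assumed semantics: true if the category may be used with this part of speech.
bool categoryApplies(const Restrictions& restrictions,
                     const std::string& category,
                     const std::string& pos) {
    auto it = restrictions.find(category);
    if (it == restrictions.end()) {
        return false;  // category not defined for this language
    }
    return it->second.empty() || it->second.count(pos) > 0;
}
```

Under this reading, nouns receive no `gender` grammeme from this block, which may be worth confirming against the intended Malayalam lexicon.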
@@ -0,0 +1,39 @@
എനിക്ക്,first,singular,dative
ഞാൻ,first,singular,nominative
എന്നെ,first,singular,accusative
എൻ്റെ,first,singular,genitive,dependency=dependent
എൻ്റെത്,first,singular,genitive,dependency=independent
നമുക്ക്,first,plural,dative
ഞങ്ങൾ,first,plural,nominative
ഞങ്ങളെ,first,plural,accusative
ഞങ്ങളുടെ,first,plural,genitive,dependency=dependent
ഞങ്ങളുടേതു്,first,plural,genitive,dependency=independent
നമ്മുടെ,first,plural,genitive,dependency=dependent
നമ്മുടേതു്,first,plural,genitive,dependency=independent
നിനക്ക്,second,singular,dative,dependency=nonhonorific
നീ,second,singular,nominative,dependency=nonhonorific
നിനെ,second,singular,accusative,dependency=nonhonorific
നിന്റെ,second,singular,genitive,dependency=dependent,dependency=nonhonorific
നിന്റേതു്,second,singular,genitive,dependency=independent,dependency=nonhonorific
Comment on lines +13 to +17
Member

This is confusing. Typically gender, number, or animacy would be used for dependency. Dependency normally describes the word that the pronoun is combined with. For example, if a language had "mio" and "mia", and the choice depended on the gender of the possessed object instead of the gender of the person being referenced, then I'd use dependency.

നിങ്ങൾക്ക്,second,plural,dative,dependency=honorific
നിങ്ങൾ,second,plural,nominative,dependency=honorific
നിങ്ങളെ,second,plural,accusative,dependency=honorific
നിങ്ങളുടെ,second,plural,genitive,dependency=dependent,dependency=honorific
നിങ്ങളുടേതു്,second,plural,genitive,dependency=independent,dependency=honorific
അവൻ,third,singular,nominative,masculine
അവനെ,third,singular,accusative,masculine
അവൻ്റെ,third,singular,genitive,dependency=dependent,masculine
അവൻ്റെത്,third,singular,genitive,dependency=independent,masculine
അവൾ,third,singular,nominative,feminine
അവളെ,third,singular,accusative,feminine
അവളുടെ,third,singular,genitive,dependency=dependent,feminine
അവളുടേതു്,third,singular,genitive,dependency=independent,feminine
അത്,third,singular,nominative,neuter
അതിനെ,third,singular,accusative,neuter
അതിന്റെ,third,singular,genitive,dependency=dependent,neuter
അതിന്റേതു്,third,singular,genitive,dependency=independent,neuter
അവർ,third,plural,nominative
അവരെ,third,plural,accusative
അവരുടെ,third,plural,genitive,dependency=dependent
അവരുടേതു്,third,plural,genitive,dependency=independent
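Each row reads as a surface form followed by comma-separated grammemes, some written as `key=value`. A minimal parsing sketch, assuming exactly that layout (`PronounRow`, `parseRow`, and `duplicatedKeys` are hypothetical, not the project's loader), makes the reviewer's point concrete: the second-person rows carry two `dependency` values, one of which (`nonhonorific`/`honorific`) describes formality rather than dependency.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct PronounRow {
    std::string form;                               // first column: surface form
    std::vector<std::string> grammemes;             // bare values, e.g. "first", "dative"
    std::multimap<std::string, std::string> keyed;  // key=value columns
};

PronounRow parseRow(const std::string& line) {
    PronounRow row;
    std::stringstream in(line);
    std::string field;
    // UTF-8 continuation bytes are >= 0x80, so splitting on ',' is safe.
    std::getline(in, row.form, ',');
    while (std::getline(in, field, ',')) {
        auto eq = field.find('=');
        if (eq == std::string::npos) {
            row.grammemes.push_back(field);
        } else {
            row.keyed.emplace(field.substr(0, eq), field.substr(eq + 1));
        }
    }
    return row;
}

// Keys that occur more than once in a single row.
std::vector<std::string> duplicatedKeys(const PronounRow& row) {
    std::vector<std::string> dups;
    for (auto it = row.keyed.begin(); it != row.keyed.end();
         it = row.keyed.upper_bound(it->first)) {
        if (row.keyed.count(it->first) > 1) {
            dups.push_back(it->first);
        }
    }
    return dups;
}
```

Running `duplicatedKeys` over the file would flag every row that mixes dependency and formality under one key.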

@@ -15,6 +15,7 @@ locale.group.it=it_IT,it_CH
locale.group.ja=ja_JP
locale.group.ko=ko_KR
locale.group.ms=ms_MY
locale.group.ml=ml_IN
locale.group.nb=nb_NO
locale.group.nl=nl_NL,nl_BE
locale.group.pt=pt_BR,pt_PT
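The new entry maps the language key `ml` to the single locale `ml_IN`, just as `nl` maps to `nl_NL,nl_BE`. A minimal sketch of reading such a line, assuming the `locale.group.<lang>=<locale>[,<locale>...]` shape seen in this hunk; `parseLocaleGroup` is a hypothetical helper, not the project's properties loader.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns {language key, locales}; both are empty if the line is not a
// locale.group entry.
std::pair<std::string, std::vector<std::string>> parseLocaleGroup(const std::string& line) {
    const std::string prefix = "locale.group.";
    std::pair<std::string, std::vector<std::string>> result;
    if (line.rfind(prefix, 0) != 0) {
        return result;  // does not start with the prefix
    }
    auto eq = line.find('=');
    if (eq == std::string::npos) {
        return result;
    }
    result.first = line.substr(prefix.size(), eq - prefix.size());
    const std::string value = line.substr(eq + 1);
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t comma = value.find(',', start);
        if (comma == std::string::npos) {
            comma = value.size();
        }
        if (comma > start) {  // skip empty items such as "a,,b"
            result.second.push_back(value.substr(start, comma - start));
        }
        start = comma + 1;
    }
    return result;
}
```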
@@ -0,0 +1,5 @@
#
# Copyright 2025 Unicode Incorporated and others. All rights reserved.
#
tokenizer.implementation.class=DefaultTokenizer

@@ -0,0 +1,17 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#include <inflection/dialog/language/MlCommonConceptFactory.hpp>

namespace inflection::dialog::language {

MlCommonConceptFactory::MlCommonConceptFactory(const ::inflection::util::ULocale& language)
: super(language)
{
}

MlCommonConceptFactory::~MlCommonConceptFactory()
{
}
Comment on lines +2 to +15
Member

In all of the new files, please make sure that the copyright is not Apple, and it's not Unicode.

I'm assuming that this class is trying to emulate the English behavior, where a quantified noun changes only between singular and plural. For example, 1 man, 2 men, 1 woman, 2 women, and so forth.

Are such quantities affected by grammatical case? If so, this class likely needs more customization in the quantify method. For Slavic languages the rules can get fairly complicated, and Malayalam seems to have more grammatical cases than Russian, so I'm wondering how this would work. On the other hand, I see that the pronunciation of numbers doesn't vary the way it does in many European languages, so Malayalam may be simpler to support.


} // namespace inflection::dialog::language
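The English-like behavior the reviewer assumes can be stated as a two-way rule: a count of exactly 1 takes the singular form, everything else the plural. A minimal sketch with hypothetical names (`NounForms`, `quantify`); it is not `CommonConceptFactoryImpl`'s API, and whether Malayalam also needs case here is the open question in the comment.

```cpp
#include <cstdint>
#include <string>

// Hypothetical pair of lexicon forms for one noun. If quantities in Malayalam
// also depend on grammatical case, this would need to be indexed by
// (number, case) instead of number alone.
struct NounForms {
    std::string singular;
    std::string plural;
};

// English-like rule: only 1 selects the singular ("1 man", "2 men", "0 men").
std::string quantify(std::int64_t count, const NounForms& forms) {
    const std::string& noun = (count == 1) ? forms.singular : forms.plural;
    return std::to_string(count) + " " + noun;
}
```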
@@ -0,0 +1,17 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#pragma once

#include <inflection/dialog/language/fwd.hpp>
#include <inflection/dialog/CommonConceptFactoryImpl.hpp>

class inflection::dialog::language::MlCommonConceptFactory
: public CommonConceptFactoryImpl
{
public:
typedef CommonConceptFactoryImpl super;
public:
explicit MlCommonConceptFactory(const ::inflection::util::ULocale& language);
~MlCommonConceptFactory() override;
};
3 changes: 2 additions & 1 deletion inflection/src/inflection/dialog/language/fwd.hpp
@@ -1,5 +1,5 @@
/*
- * Copyright 2017-2024 Apple Inc. All rights reserved.
+ * Copyright 2017-2025 Apple Inc. All rights reserved.
*/
// Forward declarations for inflection.dialog.language
#pragma once
@@ -28,6 +28,7 @@ namespace inflection
class JaCommonConceptFactory;
class KoCommonConceptFactory;
class KoCommonConceptFactory_KoAndList;
class MlCommonConceptFactory;
class MsCommonConceptFactory;
class NbCommonConceptFactory;
class NlCommonConceptFactory;
@@ -1,5 +1,5 @@
/*
- * Copyright 2017-2024 Apple Inc. All rights reserved.
+ * Copyright 2017-2025 Apple Inc. All rights reserved.
*/
#include <inflection/grammar/synthesis/GrammarSynthesizerFactory.hpp>

@@ -13,6 +13,7 @@
#include <inflection/grammar/synthesis/HiGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/ItGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/KoGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/NbGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/NlGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/PtGrammarSynthesizer.hpp>
@@ -41,6 +42,7 @@ static const ::std::map<::inflection::util::ULocale, addSemanticFeatures>& GRAMM
{::inflection::util::LocaleUtils::HINDI(), &HiGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::ITALIAN(), &ItGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::KOREAN(), &KoGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::MALAYALAM(), &MlGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::NORWEGIAN(), &NbGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::DUTCH(), &NlGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::PORTUGUESE(), &PtGrammarSynthesizer::addSemanticFeatures},
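The factory keeps a static map from locale to each language's `addSemanticFeatures` function, so wiring in Malayalam takes one include and one map entry. Below is a reduced sketch of that lookup-and-dispatch pattern. `Locale`, `FeatureModel`, `addMalayalamFeatures`, and `addSemanticFeaturesFor` are hypothetical stand-ins for `ULocale`, `SemanticFeatureModel`, and the real synthesizers.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for ULocale and SemanticFeatureModel.
using Locale = std::string;
struct FeatureModel {
    std::vector<std::string> registered;
};

using AddSemanticFeatures = void (*)(FeatureModel&);

// Mirrors MlGrammarSynthesizer::addSemanticFeatures, which registers
// number, gender and case lookup functions.
void addMalayalamFeatures(FeatureModel& model) {
    model.registered = {"number", "gender", "case"};
}

// Function-local static table, built once on first use.
const std::map<Locale, AddSemanticFeatures>& grammarSynthesizers() {
    static const std::map<Locale, AddSemanticFeatures> table{
        {"ml", &addMalayalamFeatures},
    };
    return table;
}

// Returns false when no synthesizer is registered for the language.
bool addSemanticFeaturesFor(const Locale& language, FeatureModel& model) {
    const auto& table = grammarSynthesizers();
    auto it = table.find(language);
    if (it == table.end()) {
        return false;
    }
    it->second(model);
    return true;
}
```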
@@ -0,0 +1,25 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#include <inflection/grammar/synthesis/MlGrammarSynthesizer.hpp>

#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer_CountLookupFunction.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer_GenderLookupFunction.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer_CaseLookupFunction.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.hpp>
#include <inflection/grammar/synthesis/GrammemeConstants.hpp>

namespace inflection::grammar::synthesis {

void MlGrammarSynthesizer::addSemanticFeatures(::inflection::dialog::SemanticFeatureModel& featureModel)
{
featureModel.putDefaultFeatureFunctionByName(GrammemeConstants::NUMBER, new MlGrammarSynthesizer_CountLookupFunction());
featureModel.putDefaultFeatureFunctionByName(GrammemeConstants::GENDER, new MlGrammarSynthesizer_GenderLookupFunction());
featureModel.putDefaultFeatureFunctionByName(GrammemeConstants::CASE, new MlGrammarSynthesizer_CaseLookupFunction());

featureModel.setDefaultDisplayFunction(new MlGrammarSynthesizer_MlDisplayFunction(featureModel));
}

} // namespace inflection::grammar::synthesis

@@ -0,0 +1,17 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#pragma once

#include <inflection/dialog/fwd.hpp>
#include <inflection/grammar/synthesis/fwd.hpp>
#include <string>

class inflection::grammar::synthesis::MlGrammarSynthesizer final
{
public:
static void addSemanticFeatures(::inflection::dialog::SemanticFeatureModel& featureModel);
private:
MlGrammarSynthesizer() = delete;
};

@@ -0,0 +1,40 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#include <inflection/grammar/synthesis/MlGrammarSynthesizer_CaseLookupFunction.hpp>

#include <inflection/grammar/synthesis/GrammemeConstants.hpp>
#include <inflection/dialog/SemanticFeature.hpp>
#include <inflection/dialog/DisplayValue.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/LocaleUtils.hpp>
#include <inflection/util/StringViewUtils.hpp>

namespace inflection::grammar::synthesis {

MlGrammarSynthesizer_CaseLookupFunction::MlGrammarSynthesizer_CaseLookupFunction()
: super()
{
// No file needed
}

inflection::dialog::SpeakableString* MlGrammarSynthesizer_CaseLookupFunction::getFeatureValue(const ::inflection::dialog::DisplayValue& displayValue, const ::std::map<::inflection::dialog::SemanticFeature, ::std::u16string>& /*constraints*/) const
{
std::u16string displayString;
::inflection::util::StringViewUtils::lowercase(&displayString, displayValue.getDisplayString(), ::inflection::util::LocaleUtils::MALAYALAM());

if (displayString.length() >= 3) {
// Genitive-indicative suffixes in Malayalam
if (displayString.ends_with(u"ഉടെ") || // uṭe
displayString.ends_with(u"യുടെ") || // yude (my, your, his, her...)
displayString.ends_with(u"ന്റെ") || // ente (mine), avante, etc.
displayString.ends_with(u"ആയുടെ")) // āyuṭe (fem. 3rd person possessive)
Member

Is this logic correct? Are these all suffixes? Should the displayString be longer than the string being matched? If so, perhaps enumerating these strings in a loop and doing a length comparison would be more accurate.

{
return new ::inflection::dialog::SpeakableString(GrammemeConstants::CASE_GENITIVE());
}
}
return nullptr;
}

} // namespace inflection::grammar::synthesis
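One way to act on the reviewer's suggestion is to keep the suffixes in a table, loop over it, and require the word to be strictly longer than the matched suffix. The sketch also assumes two encoding details worth checking in the PR. First, `u"ഉടെ"` starts with the independent vowel ഉ (U+0D09), which does not follow a consonant inside a word; there the vowel sign ു (U+0D41) is used, and `ുടെ` also covers the `യുടെ` and `ആയുടെ` endings. Second, as rendered here, the pronoun data spells `എൻ്റെ` with the atomic chillu ൻ (U+0D7B), which `ends_with(u"ന്റെ")` does not match. The suffix list and the names `kGenitiveSuffixes` and `looksGenitive` are illustrative, not a verified inventory of Malayalam genitive endings.

```cpp
#include <array>
#include <string_view>

// Illustrative genitive endings, written as escapes to make the encoding explicit.
constexpr std::array<std::u16string_view, 3> kGenitiveSuffixes{
    u"\u0D41\u0D1F\u0D46",        // ുടെ : vowel sign u + tta + vowel sign e
    u"\u0D28\u0D4D\u0D31\u0D46",  // ന്റെ : na + virama + rra + vowel sign e
    u"\u0D7B\u0D4D\u0D31\u0D46",  // ൻ്റെ : chillu n + virama + rra + vowel sign e
};

bool endsWith(std::u16string_view word, std::u16string_view suffix) {
    return word.size() >= suffix.size() &&
           word.substr(word.size() - suffix.size()) == suffix;
}

// True if the word is strictly longer than some listed suffix and ends with it,
// so a bare suffix on its own is not treated as a genitive form.
bool looksGenitive(std::u16string_view word) {
    for (std::u16string_view suffix : kGenitiveSuffixes) {
        if (word.size() > suffix.size() && endsWith(word, suffix)) {
            return true;
        }
    }
    return false;
}
```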

@@ -0,0 +1,25 @@
/*
* Copyright 2025 Apple Inc. All rights reserved.
*/
#pragma once

#include <inflection/dialog/fwd.hpp>
#include <inflection/grammar/synthesis/fwd.hpp>
#include <inflection/dialog/DefaultFeatureFunction.hpp>
#include <set>
#include <string>

class inflection::grammar::synthesis::MlGrammarSynthesizer_CaseLookupFunction
: public ::inflection::dialog::DefaultFeatureFunction
{
public:
typedef ::inflection::dialog::DefaultFeatureFunction super;

public:
::inflection::dialog::SpeakableString* getFeatureValue(const ::inflection::dialog::DisplayValue& displayValue, const ::std::map<::inflection::dialog::SemanticFeature, ::std::u16string>& constraints) const override;

MlGrammarSynthesizer_CaseLookupFunction();
MlGrammarSynthesizer_CaseLookupFunction(const MlGrammarSynthesizer_CaseLookupFunction&) = delete;
MlGrammarSynthesizer_CaseLookupFunction& operator=(const MlGrammarSynthesizer_CaseLookupFunction&) = delete;
};
