Update new language doc #126

deonajulary06 · 2025-06-10T19:48:19Z

Updated document with additional instructions and fixed errors

grhoten · 2025-06-12T05:40:33Z

inflection/resources/org/unicode/inflection/inflection/pronoun_ml.csv

+നിനക്ക്,second,singular,dative,dependency=nonhonorific
+നീ,second,singular,nominative,dependency=nonhonorific
+നിനെ,second,singular,accusative,dependency=nonhonorific
+നിന്റെ,second,singular,genitive,dependency=dependent,dependency=nonhonorific
+നിന്റേതു്,second,singular,genitive,dependency=independent,dependency=nonhonorific


This is confusing. Typically gender, number, or animacy would be used for dependency. The dependency is typically being used for the word being used in combination to the pronoun. For example, if I were to say "mio" or "mia" in a language, and the gender depended on the gender of the object being possessed instead of the gender of the person being referenced, then I'd use dependency.

grhoten · 2025-06-12T05:42:16Z

inflection/src/inflection/dialog/language/MlCommonConceptFactory.cpp

+ * Copyright 2025 Apple Inc. All rights reserved.
+ */
+#include <inflection/dialog/language/MlCommonConceptFactory.hpp>
+
+namespace inflection::dialog::language {
+
+MlCommonConceptFactory::MlCommonConceptFactory(const ::inflection::util::ULocale& language)
+    : super(language)
+{
+}
+
+MlCommonConceptFactory::~MlCommonConceptFactory()
+{
+}


In all of the new files, please make sure that the copyright is not Apple, and it's not Unicode.

I'm assuming that this class is trying to emulate the English behavior, where a quantity changes by only it being singular or plural for the noun in a quantity. For example, 1 man, 2 men, 1 woman, 2 women, and so forth.

Are such quantities affected by grammatical case? If so, then this class likely needs a little more customization for the quantify method. For Slavic languages, the rules can get fairly complicated, and Malayalam seems to have more grammatical cases than Russian. So I'm wondering how this would work. I also see that the number pronunciation doesn't vary like many other European languages. So Malayalam language may be simpler to support.

grhoten · 2025-06-12T05:53:30Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_CaseLookupFunction.cpp

+        if (displayString.ends_with(u"ഉടെ") ||   // uṭe
+            displayString.ends_with(u"യുടെ") || // yude (my, your, his, her...)
+            displayString.ends_with(u"ന്റെ") || // ente (mine), avante, etc.
+            displayString.ends_with(u"ആയുടെ"))  // āyuṭe (fem. 3rd person possessive)


Is this logic correct? Are these all suffixes? Should the displayString be longer than the string being matched? If so, perhaps enumerating these strings in a loop and doing a length comparison would be more accurate.

grhoten · 2025-06-12T05:54:59Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_CountLookupFunction.cpp

+MlGrammarSynthesizer_CountLookupFunction::MlGrammarSynthesizer_CountLookupFunction()
+    : super(::inflection::util::LocaleUtils::MALAYALAM(),
+            {GrammemeConstants::NUMBER_SINGULAR(), GrammemeConstants::NUMBER_PLURAL()},
+            {GrammemeConstants::POS_NOUN(), GrammemeConstants::POS_VERB()})


Do adjectives inflect in Malayalam?

Also it's probably better to rename this to MlGrammarSynthesizer_NumberLookupFunction. That would be more accurate. Unfortunately some of the classes use the term Count, but it's more accurate to say Number. There is a reason for this historical inaccuracy in this code that I don't need to elaborate here.

grhoten · 2025-06-12T06:01:03Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.cpp

+    , caseFeature(*npc(model.getFeature(GrammemeConstants::CASE)))
+    , definitenessFeature(*npc(model.getFeature(GrammemeConstants::DEFINITENESS)))
+    , adjectivalFeature(*npc(model.getFeature(ADJECTIVAL)))
+    , particleMap({
+        {GrammemeConstants::CASE_ACCUSATIVE(),      {MlGrammarSynthesizer::FINAL_VOWELS(), u"യെ", u"നെ"}},
+        {GrammemeConstants::CASE_DATIVE(),          {MlGrammarSynthesizer::FINAL_VOWELS(), u"ക്ക്", u"നു"}},
+        {GrammemeConstants::CASE_GENITIVE(),        {MlGrammarSynthesizer::EMPTY_SET(),     u"",    u"ന്റെ"}},
+        {GrammemeConstants::CASE_LOCATIVE(),        {MlGrammarSynthesizer::FINAL_VOWELS(), u"യിൽ", u"ൽ"}},
+        {GrammemeConstants::CASE_ABLATIVE(),        {MlGrammarSynthesizer::FINAL_VOWELS(), u"ഇൽനിന്ന്", u"ൽനിന്ന്"}},
+        {GrammemeConstants::CASE_INSTRUMENTAL(),    {MlGrammarSynthesizer::FINAL_VOWELS(), u"ഉപയോഗിച്ച്", u"കൊണ്ട്"}},
+        {GrammemeConstants::CASE_VOCATIVE(),        {MlGrammarSynthesizer::FINAL_VOWELS(), u"ആ", u"േ"}},
+        {PREDICATIVE,                               {MlGrammarSynthesizer::FINAL_VOWELS(), u"ആണ്", u"ഇാണ്"}}
+    }) {}


This is surprising. It looks like you took some inspiration from Korean for the particle map. Is Malayalam that simple and reproducible? I'd figure that it would have some way that mixes grammatical gender and grammatical number in with the case.

I'm a little surprised by the usage of DEFINITENESS, but grammar.xml doesn't define that. I'd expect that the test code would crash here during construction. I recommend reviewing this DEFINITENESS usage.

grhoten · 2025-06-12T06:02:18Z

inflection/test/resources/inflection/dialog/inflection/ml.xml

+<!-- Malayalam Singular Plural Inflection Tests -->
+<test><source number="plural">കുട്ടി</source><result>കുട്ടികൾ</result></test>
+<test><source number="singular">പുസ്തകങ്ങൾ</source><result>പുസ്തകം</result></test>
+<test><source number="plural">മരം</source><result>മരങ്ങൾ</result></test>
+<test><source number="singular">കഥകൾ</source><result>കഥ</result></test>
+
+<!-- Malayalam Gender Inflection Tests (for adjectives) -->
+<test><source gender="feminine">നല്ല</source><result exists="true">നല്ല</result></test>
+<test><source gender="masculine">നല്ല</source><result exists="true">നല്ല</result></test>


It looks like MlGrammarSynthesizer_MlDisplayFunction doesn't support these operations yet.

grhoten · 2025-06-12T06:04:53Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.hpp

+
+#include <inflection/dialog/DefaultDisplayFunction.hpp>
+#include <inflection/grammar/synthesis/fwd.hpp>
+#include <inflection/grammar/synthesis/MlGrammarSynthesizer_ParticleResolver.hpp>


The compilation fails in this pull request due to this missing header. Since it's being referenced in this class, perhaps you forgot to include it in your commits.

grhoten · 2025-06-12T06:13:27Z

documents/how_to_add_new_language.md

@@ -29,13 +31,13 @@ TODO: We need to expand what each of these do.
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.hpp
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.cpp
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
-* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
+* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.cpp


Thanks for fixing this documentation. This is a good improvement.

grhoten · 2025-06-12T06:13:43Z

documents/how_to_add_new_language.md

@@ -11,6 +11,8 @@ NOTE: Take a look at [PR #40](https://github.com/unicode-org/inflection/pull/40)
 In general, to bootstrap your progress look for grammatically similar language that's already supported, e.g. if you are adding Serbian look for existing Russian implementation.
 This will help you find most of the files you need to add/change and will speed up implementation of the rules and lexicons.

+Before you add new language support, go to the README.md in the inflection subfolder (inflection/inflection/README.md), build the project, and make sure all the tests run on your computer.


Good suggestion!

grhoten · 2025-06-16T17:37:05Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.hpp

+#include <vector>
+#include <map>
+
+class inflection::grammar::synthesis::MlGrammarSynthesizer_MlDisplayFunction : public virtual ::inflection::dialog::DefaultDisplayFunction {


I recommend looking at EnGrammarSynthesizer_EnDisplayFunction, EsGrammarSynthesizer_EsDisplayFunction or DeGrammarSynthesizer_DeDisplayFunction for how to use DictionaryLookupInflector/dictionaryInflector. The German one uses grammatical cases, which will need to be extended to what Malayalam uses. The English and Spanish versions also include some examples of how to guess the singular and plural forms of nouns.

deonajulary06 added 17 commits May 27, 2025 10:16

Add Malayalam dictionary support

e359163

Add Malayalam language support in LocaleUtils.hpp

a109f4e

Add Malayalam locale support to LocaleUtils

9fd9e01

Add Malayalam language to the tests

72e7c28

Add Malayalam locale group ml_IN

95b483d

ADD: Malayalam tokenizer configuration file

681b568

Inflection-85: Add Git LFS config for Malayalam dictionary and XML files

bfa7ae6

Add Malayalam inflection and pronoun tests

5aa7ec4

Updated copyright line

f20350f

Updated copyright message

9cc0a44

Updated copright message

4bb9964

Updated language grammar to include Malayalam

a46c301

Added pronouns for Malayalam

df82a8b

Add ll GrammarSynthesizer files

8fcfb16

Add Malayalam grammar synthesizer

0c50978

Add Malayalam-specific CommonConceptFactory with lists and quantities

9523646

Update document on how to add a new language, fixed errors

a8a7f2d

grhoten reviewed Jun 12, 2025

View reviewed changes

grhoten reviewed Jun 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Update new language doc #126

Update new language doc #126

Uh oh!

deonajulary06 commented Jun 10, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 12, 2025

Uh oh!

grhoten Jun 16, 2025

Uh oh!

Uh oh!

Uh oh!

Update new language doc #126

Are you sure you want to change the base?

Update new language doc #126

Uh oh!

Conversation

deonajulary06 commented Jun 10, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!