The commondata package

Share structured common data in a pythonic way.

The source code of the modules in this package is generated by the make_code.py script, which queries miscellaneous sources.

The library provides pure data only; it does not feature any querying or rendering functionality. The data is meant to be imported into existing systems that have their own ways of rendering and querying data. This is a deliberate design choice.

The online version of this document is at https://github.com/lsaffre/commondata

>>> from commondata.countries import COUNTRIES, FIELDS
>>> len(COUNTRIES)
195

These are the countries of the world:

>>> lst = ["{} ({})".format(c.name['en'], c.isoCode2) for c in COUNTRIES]
>>> txt = ", ".join(lst)
>>> from textwrap import fill
>>> print(fill(txt, width=78))  #doctest: +REPORT_UDIFF +NORMALIZE_WHITESPACE
Andorra (AD), United Arab Emirates (AE), Afghanistan (AF), Antigua and Barbuda
(AG), Albania (AL), Armenia (AM), Angola (AO), Argentina (AR), Austria (AT),
Australia (AU), Azerbaijan (AZ), Bosnia and Herzegovina (BA), Barbados (BB),
Bangladesh (BD), Belgium (BE), Burkina Faso (BF), Bulgaria (BG), Bahrain (BH),
Burundi (BI), Benin (BJ), Brunei (BN), Bolivia (BO), Brazil (BR), The Bahamas
(BS), Bhutan (BT), Botswana (BW), Belarus (BY), Belize (BZ), Canada (CA),
Democratic Republic of the Congo (CD), Central African Republic (CF), Republic
of the Congo (CG), Switzerland (CH), Ivory Coast (CI), Chile (CL), Cameroon
(CM), People's Republic of China (CN), Colombia (CO), Costa Rica (CR), Cuba
(CU), Cape Verde (CV), Cyprus (CY), Czech Republic (CZ), Germany (DE),
Djibouti (DJ), Dominica (DM), Dominican Republic (DO), Algeria (DZ), Ecuador
(EC), Estonia (EE), Egypt (EG), Eritrea (ER), Spain (ES), Ethiopia (ET),
Finland (FI), Fiji (FJ), Federated States of Micronesia (FM), France (FR),
Gabon (GA), United Kingdom (GB), Grenada (GD), Georgia (GE), Ghana (GH), The
Gambia (GM), Guinea (GN), Equatorial Guinea (GQ), Greece (GR), Guatemala (GT),
Guinea-Bissau (GW), Guyana (GY), Honduras (HN), Croatia (HR), Haiti (HT),
Hungary (HU), Indonesia (ID), Ireland (IE), Israel (IL), India (IN), Iraq
(IQ), Iran (IR), Iceland (IS), Italy (IT), Jamaica (JM), Jordan (JO), Japan
(JP), Kenya (KE), Kyrgyzstan (KG), Cambodia (KH), Kiribati (KI), Comoros (KM),
Saint Kitts and Nevis (KN), North Korea (KP), South Korea (KR), Kuwait (KW),
Kazakhstan (KZ), Laos (LA), Lebanon (LB), Saint Lucia (LC), Liechtenstein
(LI), Sri Lanka (LK), Liberia (LR), Lesotho (LS), Lithuania (LT), Luxembourg
(LU), Latvia (LV), Libya (LY), Morocco (MA), Monaco (MC), Moldova (MD),
Montenegro (ME), Madagascar (MG), Marshall Islands (MH), North Macedonia (MK),
Mali (ML), Myanmar (MM), Mongolia (MN), Mauritania (MR), Malta (MT), Mauritius
(MU), Maldives (MV), Malawi (MW), Mexico (MX), Malaysia (MY), Mozambique (MZ),
Namibia (NA), Niger (NE), Nigeria (NG), Nicaragua (NI), Kingdom of the
Netherlands (NL), Norway (NO), Nepal (NP), Nauru (NR), New Zealand (NZ), Oman
(OM), Panama (PA), Peru (PE), Papua New Guinea (PG), Philippines (PH),
Pakistan (PK), Poland (PL), Palestine (PS), Portugal (PT), Palau (PW),
Paraguay (PY), Qatar (QA), Romania (RO), Serbia (RS), Russia (RU), Rwanda
(RW), Saudi Arabia (SA), Solomon Islands (SB), Seychelles (SC), Sudan (SD),
Sweden (SE), Singapore (SG), Slovenia (SI), Slovakia (SK), Sierra Leone (SL),
San Marino (SM), Senegal (SN), Somalia (SO), Suriname (SR), South Sudan (SS),
São Tomé and Príncipe (ST), El Salvador (SV), Syria (SY), Eswatini (SZ), Chad
(TD), Togo (TG), Thailand (TH), Tajikistan (TJ), Timor-Leste (TL),
Turkmenistan (TM), Tunisia (TN), Tonga (TO), Turkey (TR), Trinidad and Tobago
(TT), Tuvalu (TV), Taiwan (TW), Tanzania (TZ), Ukraine (UA), Uganda (UG),
United States (US), Uruguay (UY), Uzbekistan (UZ), Vatican City (VA), Saint
Vincent and the Grenadines (VC), Venezuela (VE), Vietnam (VN), Vanuatu (VU),
Samoa (WS), Yemen (YE), South Africa (ZA), Zambia (ZM), Zimbabwe (ZW)

This is what we know about each country:

>>> FIELDS
('entity', 'name', 'isoCode2', 'isoCode3', 'zipCode', 'population')

Example:

>>> COUNTRIES[0]
Country(entity='Q228', name={'en': 'Andorra', 'de': 'Andorra', 'fr': 'Andorre', 'nl': 'Andorra', 'et': 'Andorra', 'bn': 'অ্যান্ডোরা', 'es': 'Andorra'}, isoCode2='AD', isoCode3='AND', zipCode=None, population='87097')
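
Because the library offers no query functions, a consuming application builds its own lookups. The following is a minimal sketch, not part of commondata: it uses a stand-in namedtuple with the field names from FIELDS and two hand-copied sample records, and the helper country_name() is hypothetical. Real code would iterate over COUNTRIES instead of SAMPLE.

```python
from collections import namedtuple

# Stand-in for the records in commondata.countries; the field names are
# taken from FIELDS. Real code would iterate over COUNTRIES instead.
Country = namedtuple(
    "Country", "entity name isoCode2 isoCode3 zipCode population")

SAMPLE = [
    Country("Q228", {"en": "Andorra", "fr": "Andorre"},
            "AD", "AND", None, "87097"),
    # Only the English name and isoCode2 of Belgium are shown in this
    # document, so the other fields are left empty here.
    Country(None, {"en": "Belgium"}, "BE", None, None, None),
]

# Index the records by their two-letter ISO code.
by_iso2 = {c.isoCode2: c for c in SAMPLE}


def country_name(iso2, lang="en"):
    """Return the country name in `lang`, falling back to English."""
    names = by_iso2[iso2].name
    return names.get(lang, names["en"])


print(country_name("AD", "fr"))  # Andorre
print(country_name("BE", "fr"))  # Belgium (no French name in the sample)
```

Note that population is stored as a string in the record shown above, so convert it with int() before doing arithmetic.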

The COUNTRY2SCHEME dict in the commondata.peppolcodes module maps country codes to the Participant Identifier Scheme of their respective VAT office. The make_code.py script gets this data from https://docs.peppol.eu/edelivery/codelists

>>> from commondata.peppolcodes import COUNTRY2SCHEME
>>> COUNTRY2SCHEME['BE']
'9925'
>>> COUNTRY2SCHEME['EE']
'9931'

Not every country has an Electronic Address Scheme:

>>> COUNTRY2SCHEME['US']
Traceback (most recent call last):
...
KeyError: 'US'
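
When a missing scheme is an expected case rather than an error, the lookup can return None instead of raising. A minimal sketch, using a stand-in dict with the two values shown above; the helper scheme_for() is hypothetical and not part of commondata:

```python
# Stand-in for commondata.peppolcodes.COUNTRY2SCHEME, holding only the two
# values shown in the examples above.
COUNTRY2SCHEME = {"BE": "9925", "EE": "9931"}


def scheme_for(country_code):
    """Return the scheme code for a country, or None if it has none."""
    return COUNTRY2SCHEME.get(country_code)


print(scheme_for("BE"))  # 9925
print(scheme_for("US"))  # None
```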

Here is a list of the Peppol countries:

>>> " ".join(sorted(COUNTRY2SCHEME.keys()))
'AD AL AT BA BE BG CH CY CZ DE EE ES FI FR GB GR HR HU IE IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR VA international'

This is used by Lino, see https://dev.lino-framework.org/topics/peppol.html#electronic-address-scheme

The following snippet was used to generate the NAT2EAS.DBC file used by TIM:

>>> for k in sorted(COUNTRY2SCHEME.keys()):
...     print(f"{COUNTRY2SCHEME[k]}|{k}")
9922|AD
9923|AL
9914|AT
9924|BA
9925|BE
9926|BG
9927|CH
9928|CY
9929|CZ
9930|DE
9931|EE
9920|ES
0213|FI
9957|FR
9932|GB
9933|GR
9934|HR
9910|HU
9935|IE
9906|IT
9936|LI
9937|LT
9938|LU
9939|LV
9940|MC
9941|ME
9942|MK
9943|MT
9944|NL
9909|NO
9945|PL
9946|PT
9947|RO
9948|RS
9955|SE
9949|SI
9950|SK
9951|SM
9952|TR
9953|VA
9912|international

The DELIVERY_UNITS dict in the commondata.peppolcodes module contains the codes that are allowed in the unitCode attribute of an InvoicedQuantity element. These codes are specified by UNECERec20.

The make_code.py script gets this data from the OpenPEPPOL repository.

>>> from commondata.peppolcodes import DELIVERY_UNITS

The DELIVERY_UNITS dict contains many codes:

>>> len(DELIVERY_UNITS)
2162

And some of them are funny:

>>> DELIVERY_UNITS['14']
('shot', 'A unit of liquid measure, especially related to spirits.')

I wondered what's the code for "hour":

>>> for k, v in DELIVERY_UNITS.items():
...     if v[0].lower() == "hour":
...         print(k)
HUR
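
If such name lookups happen often, a reverse index avoids scanning the whole dict each time. A minimal sketch, using a stand-in dict with a few entries taken from this document (descriptions that are not shown here are replaced by empty strings) and assuming that several codes may share the same name:

```python
# Stand-in for commondata.peppolcodes.DELIVERY_UNITS. Each value is a tuple
# whose first item is the unit name; unknown descriptions are left empty.
DELIVERY_UNITS = {
    "14": ("shot", "A unit of liquid measure, especially related to spirits."),
    "HUR": ("hour", ""),
    "MON": ("month", ""),
    "XBX": ("Box", ""),
}

# Map each lower-cased unit name to the list of codes that carry it.
codes_by_name = {}
for code, value in DELIVERY_UNITS.items():
    codes_by_name.setdefault(value[0].lower(), []).append(code)

print(codes_by_name["hour"])  # ['HUR']
print(codes_by_name["box"])   # ['XBX']
```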

Here are some of the more commonly used units:

>>> for i in "HUR MIN MON LTR CLT DLT KGM XPP XPK XBX MTR MTK MTQ B68".split():
...     print(i, DELIVERY_UNITS[i][0])
HUR hour
MIN minute [unit of time]
MON month
LTR litre
CLT centilitre
DLT decilitre
KGM kilogram
XPP Piece
XPK Package
XBX Box
MTR metre
MTK square metre
MTQ cubic metre
B68 gigabit

Estonian places are available from the commondata.places.estonia module:

>>> from commondata.places.estonia import PLACES, COUNTIES
>>> len(PLACES)
4564
>>> len(COUNTIES)
15
>>> for county in COUNTIES:
...    print(county.name, ":", ", ".join([p.name for p in county.children]))
Harju : Tallinn, Ääsmäe, Loksa, Vasalemma, Nissi, Saku, Saue, Viimsi, Raasiku, Jõelähtme, Maardu, Rae, Harku, Keila, Anija, Kehra, Kiili, Paldiski, Kose, Padise, Kõue, Kuusalu, Kernu, Aegviidu, Kaasiku, Kibuna, Vahastu, Vansi, Vikipalu, Jägala-Joa, Kersalu, Haapse, Jõesuu, Pohla, Andineeme
Pärnu : Pärnu, Halinga, Tootsi, Vändra, Tori, Tõstamaa, Tahkuranna, Sauga, Paikuse, Sindi, Audru, Häädemeeste, Kilingi-Nõmme, Are, Lavassaare, Varbla, Saarde, Surju, Kihnu, Koonga, Metsaääre, Aruvälja
Rapla : Vigala, Rapla, Kehtna, Märjamaa, Järvakandi, Juuru, Kaiu, Käru, Kohila, Raikküla
Hiiu : Kärdla, Käina, Kõrgessaare, Pühalepa, Emmaste
Ida-Viru : Lohusuu, Sonda, Toila, Tudulinna, Sillamäe, Püssi, Lüganuse, Vaivara, Narva, Avinurme, Narva-Jõesuu, Kohtla-Järve, Aseri, Jõhvi, Iisaku, Kiviõli, Alajõe, Kohtla-Nõmme, Maidla, Mäetaguse, Kohtla, Illuka
Jõgeva : Torma, Põltsamaa, Tabivere, Mustvee, Jõgeva, Palamuse, Puurmani, Saare, Kasepää, Pajusi, Pala, Vägeva
Järva : Türi, Roosna-Alliku, Paide, Väätsa, Ambla, Järva-Jaani, Koeru, Kareda, Albu, Imavere, Koigi, Kolu
Lääne : Lihula, Risti, Ridala, Haapsalu, Hanila, Taebla, Oru, Vormsi, Martna, Noarootsi, Nõva, Kullamaa
Lääne-Viru : Tapa, Rakvere, Vinni, Tamsalu, Rakke, Väike-Maarja, Sõmeru, Vihula, Haljala, Kunda, Kadrina, Laekvere, Viru-Nigula, Eisma
Põlva : Räpina, Põlva, Veriora, Kanepi, Ahja, Kõlleste, Vastse-Kuuste, Värska, Mikitamäe, Mooste, Orava, Valgjärve, Laheda
Saare : Leisi, Salme, Kaarma, Orissaare, Kärla, Kihelkonna, Kuressaare, Valjala, Lümanda, Pöide, Pihtla, Torgu, Mustjala, Laimjala, Muhu, Ruhnu
Tartu : Tartu, Luunja, Ülenurme, Haaslava, Rõngu, Kambja, Elva, Nõo, Kallaste, Puhja, Alatskivi, Mäksa, Tähtvere, Konguta, Rannu, Laeva, Võnnu, Peipsiääre, Meeksi, Vara, Piirissaare, Vehendi, Kriimani, Illi, Neemisküla
Valga : Valga, Tõrva, Otepää, Puka, Õru, Tõlliste, Sangaste, Karula, Helme, Taheva, Põdrala, Palupera, Hummuli
Viljandi : Suure-Jaani, Abja, Abja-Paluoja, Viljandi, Võhma, Mõisaküla, Viiratsi, Halliste, Karksi, Karksi-Nuia, Kolga-Jaani, Pärsti, Tarvastu, Saarepeedi, Paistu, Kõpu, Kõo, Soe
Võru : Vastseliina, Võru, Antsla, Varstu, Sõmerpalu, Rõuge, Mõniste, Haanja, Urvaste, Lasva, Misso, Meremäe, Kirumpää, Navi, Meegomäe

Note: The data about Estonian places is currently outdated by several years. We plan to maintain it in collaboration with https://maaamet.ee/ruumiandmed-ja-kaardid/aadressid-ja-kohanimed/kohanimeregister

Until March 2024 this was a namespace package, and country-specific data was contained in individual subpackages. Those subpackages are now obsolete.

How to uninstall the old commondata packages: find your site-packages directory (e.g. ~/env/lib/python3.10/site-packages) and manually remove all files matching commondata*-nspkg.pth
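
The leftover files can also be located with a short script. This is a minimal sketch: the helper find_nspkg_files() is hypothetical, and it assumes that site.getsitepackages() reports the directory where the old packages were installed, which may not hold for every environment. It only lists the files so that you can review them before deleting anything.

```python
import site
from pathlib import Path


def find_nspkg_files(directories):
    """Return the commondata*-nspkg.pth files found in `directories`."""
    found = []
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # skip directories that do not exist
        found.extend(directory.glob("commondata*-nspkg.pth"))
    return sorted(found)


if __name__ == "__main__":
    # site.getsitepackages() lists the site-packages directories of the
    # running interpreter.
    for path in find_nspkg_files(site.getsitepackages()):
        print(path)  # review the list, then remove each with path.unlink()
```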

The remaining part of this document is outdated, but the examples in it still work.

How to use the Place and PlaceGenerator classes.

You define a subclass of Place for each "type" of place:

>>> from commondata.utils import Place, PlaceGenerator
>>> class PlaceInFoo(Place):
...     def __str__(self):
...        return self.name
>>> class Kingdom(PlaceInFoo):
...     value = 1
>>> class County(PlaceInFoo):
...     value = 2
>>> class Borough(PlaceInFoo):
...     value = 3
>>> class Village(PlaceInFoo):
...     value = 3

The PlaceGenerator is used to instantiate places and to populate the hierarchy.

Part 1: configuration

>>> pg = PlaceGenerator()
>>> pg.install(Kingdom, County, Borough, Village)
>>> pg.set_args('name')

Part 2: filling data

>>> root = pg.kingdom("Kwargia")
>>> def fill(pg):
...    pg.county("Kwargia")
...    pg.borough("Kwargia")
...    pg.village("Virts")
...    pg.village("Vinks")
...    pg.county("Gorgia")
...    pg.village("Girts")
...    pg.village("Ginks")
>>> fill(pg)

Part 3: using the data

>>> [str(x) for x in root.children]
['Kwargia', 'Gorgia']
>>> kwargia = root.children[0]
>>> [str(x) for x in kwargia.children]
['Kwargia', 'Virts', 'Vinks']
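
Since every place exposes its children, the whole hierarchy can be traversed recursively. A minimal sketch with a stand-in Node class that assumes only the name and children attributes used above; the walk() helper is hypothetical, and the children of the second county are assumed by analogy because only those of the first county are shown:

```python
class Node:
    """Stand-in for a Place: only `name` and `children` are assumed."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def walk(place, depth=0):
    """Yield (depth, place) pairs in depth-first order."""
    yield depth, place
    for child in place.children:
        yield from walk(child, depth + 1)


# The hierarchy from the example above. The children of Gorgia are an
# assumption; the document only shows the children of the first county.
root = Node("Kwargia", [
    Node("Kwargia", [Node("Kwargia"), Node("Virts"), Node("Vinks")]),
    Node("Gorgia", [Node("Girts"), Node("Ginks")]),
])

# Print one place per line, indented by its depth in the tree.
for depth, place in walk(root):
    print("  " * depth + place.name)
```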

Multilingual place names

You use the commondata.utils.PlaceGenerator.set_args() method to specify the names of the fields of subsequent places.

>>> pg = PlaceGenerator()
>>> pg.install(Kingdom, County, Borough, Village)
>>> pg.set_args('name name_ar')
>>> root = pg.kingdom("Egypt", u'مصر')
>>> print(root.name_ar)
مصر
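
The idea behind set_args() can be illustrated without the library: a space-separated field specification is paired with the positional values of each subsequent call. This is only a sketch of that idea under stated assumptions, not the actual implementation in commondata.utils, and bind_args() is a hypothetical name:

```python
def bind_args(field_names, *values):
    """Pair a space-separated field specification with positional values."""
    names = field_names.split()
    if len(values) > len(names):
        raise TypeError(
            f"expected at most {len(names)} values, got {len(values)}")
    # zip() stops at the shorter sequence, so trailing fields may be omitted.
    return dict(zip(names, values))


print(bind_args("name name_ar", "Egypt", "مصر"))
# {'name': 'Egypt', 'name_ar': 'مصر'}
```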

2025-06-13: I wondered why Kosovo (XK) is not in our list. It seems that it is not marked as a sovereign_state in Wikidata. But after running make_docs.py I noticed that Bangladesh (BD) has vanished from the list. I don't know why. I don't plan to dig deeper into this because I believe we should rather deprecate this project and start using pycountries.

In passing, I also fixed a broken link for Peppol in make_docs.py.
