@@ -76,40 +76,31 @@ feature object ``X``. Specifically, instead of a `pandas.DataFrame`, ``X`` must
7676specifies the dataset schema in the following way::
7777
7878 X = {
79- "main_table": <name of the main table>,
80- "tables" : {
81- <name of the main table>: (<dataframe of the main table>, <key of the main table>),
82- <name of table 1>: (<dataframe of table 1>, <key of table 1>),
83- <name of table 2>: (<dataframe of table 2>, <key of table 2>),
79+ "main_table": (<dataframe of the main table>, <key of the main table>),
80+ "additional_data_tables" : {
81+ <data path to table 1>: (
82+ <dataframe of table 1>, [<key of table 1>], <optional entity flag>
83+ ),
84+ <data path to table 2>: (
85+ <dataframe of table 2>, [<key of table 2>], <optional entity flag>
86+ ),
8487 ...
8588 }
86- "relations" : [
87- (<name of the main table>, <name of a different table>, <entity flag>),
88- (<name of another table>, <name of yet another table>, <entity flag>),
89- ...
90- ],
9189 }
9290
9391The two fields of this dictionary are:
9492
95- - ``main_table``: The name of the main table.
96- - ``tables``: A dictionary indexed by the tables' names. Each table is associated to a 2-tuple
97- containing the following fields:
93+ - ``main_table``: A 2-tuple containing the following fields:
94+ - The `pandas.DataFrame` object of the main table.
95+ - The key columns' names: A list of strings.
96+
97+ - ``additional_data_tables``: A dictionary indexed by the data paths to the secondary
98+ tables. Each data path is associated with a 2-tuple or 3-tuple containing the following fields:
9899
99- - The `pandas.DataFrame` object of the table.
100- - The key columns' names: Either a list of strings or a single string.
101-
102- - ``relations``: An optional field containing a list of tuples describing the relations between
103- tables. The first two values (Strings) of each tuple correspond to names of both the parent and the child table
104- involved in the relation. A third value (Boolean) can be optionally added to the tuple to indicate if the relation is
105- either ``1:n`` or ``1:1`` (entity). For example, If the tuple ``(table1, table2, True)`` is contained in this
106- field, it means that:
107-
108- - ``table1`` and ``table2`` are in a ``1:1`` relationship
109- - The key of ``table1`` is contained in that of ``table2`` (ie. keys are hierarchical)
110-
111- If the ``relations`` field is not present then Khiops Python assumes that the tables are in a *star*
112- schema.
100+ - The `pandas.DataFrame` object of the secondary table.
101+ - The key columns' names: A list of strings.
102+ - Optionally, a Boolean flag indicating whether the secondary table is in
103+ a ``1:1`` relationship with its parent table.
113104
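As a rough illustration of the new layout described above, the sketch below walks an ``X`` dictionary and reports, for each data path, its key columns and entity flag. The helper ``summarize_tables`` is hypothetical and not part of Khiops Python. It assumes that an omitted flag means a ``1:n`` relationship, and it uses placeholder objects instead of real `pandas.DataFrame` instances so that it runs without pandas.

```python
def summarize_tables(X):
    """Return {data_path: (key_columns, is_entity)} for the secondary tables.

    Hypothetical helper for illustration only; not a Khiops Python function.
    """
    summary = {}
    for data_path, spec in X.get("additional_data_tables", {}).items():
        # Each entry is (dataframe, key_columns) or (dataframe, key_columns, flag)
        if len(spec) not in (2, 3):
            raise ValueError(f"'{data_path}': expected a 2-tuple or 3-tuple")
        key_columns = list(spec[1])
        # Assumption: a missing flag means the table is in a 1:n relationship
        is_entity = bool(spec[2]) if len(spec) == 3 else False
        summary[data_path] = (key_columns, is_entity)
    return summary


# Placeholders standing in for pandas.DataFrame objects
accidents_df, vehicles_df, places_df = object(), object(), object()

X = {
    "main_table": (accidents_df, ["AccidentId"]),
    "additional_data_tables": {
        "Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
        "Places": (places_df, ["AccidentId"], True),
    },
}

summary = summarize_tables(X)
for data_path, (key_columns, is_entity) in summary.items():
    print(data_path, key_columns, "1:1" if is_entity else "1:n")
```

A check of this kind can catch a malformed tuple before the dictionary is handed to an estimator, but it says nothing about how Khiops Python itself validates its input.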
114105.. note::
115106
@@ -138,9 +129,8 @@ We build the input ``X`` as follows::
138129 accidents_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Accidents.txt", sep="\t")
139130 vehicles_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Vehicles.txt", sep="\t")
140131 X = {
141- "main_table" : "Accident",
142- "tables": {
143- "Accident": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
132+ "main_table" : (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
133+ "additional_data_tables": {
144134 "Vehicle": (vehicles_df, ["AccidentId", "VehicleId"])
145135 }
146136 }
@@ -170,19 +160,12 @@ We build the input ``X`` as follows::
170160 places_df = pd.read_csv(f"{kh.get_samples_dir()}/Accidents/Places.txt", sep="\t")
171161
172162 X = {
173- "main_table": "Accidents",
174- "tables": {
175- "Accidents": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
163+ "main_table": (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
164+ "additional_data_tables": {
176165 "Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
177- "Users": (users_df, ["AccidentId", "VehicleId"]),
178- "Places": (places_df, "AccidentId"),
179-
166+ "Vehicles/Users": (users_df, ["AccidentId", "VehicleId"]),
167+ "Places": (places_df, ["AccidentId"], True),
180168 },
181- "relations": [
182- ("Accidents", "Vehicles"),
183- ("Vehicles", "Users"),
184- ("Accidents", "Places", True),
185- ],
186169 }
187170
188171Both datasets can be found in the Khiops samples directory.
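To see how the data paths in the second example encode what the removed ``relations`` list used to state explicitly, the following sketch rebuilds the parent/child triples from the paths alone. The function ``relations_from_data_paths`` is hypothetical, not a Khiops Python API. It assumes that path segments are separated by ``/``, that the last segment names the child table, and that a single-segment path hangs directly off the main table (called ``Accidents`` here, as in the removed lines).

```python
def relations_from_data_paths(data_paths, main_table_name):
    """Derive (parent, child, is_entity) triples from "/"-separated data paths.

    Hypothetical helper for illustration only; not a Khiops Python function.
    `data_paths` maps each data path to its entity flag.
    """
    relations = []
    for path, is_entity in data_paths.items():
        segments = path.split("/")
        child = segments[-1]
        # A single-segment path is assumed to be a direct child of the main table
        parent = segments[-2] if len(segments) > 1 else main_table_name
        relations.append((parent, child, is_entity))
    return relations


# Data paths and flags taken from the Accidents example above
paths = {"Vehicles": False, "Vehicles/Users": False, "Places": True}
relations = relations_from_data_paths(paths, main_table_name="Accidents")
print(relations)
```

Under these assumptions the result matches the three relations that the old format listed by hand, which is why the separate ``relations`` field is no longer needed.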