Export CSV To Influx: process CSV data and export it to InfluxDB.
Install the library with pip; the export_csv_to_influx binary is then ready to use.
pip install ExportCsvToInflux
- [Highlight 🌟🎉😍] Run the exporter with the export_csv_to_influx binary
- [Highlight 🌟🎉😍] Process dozens of CSV files in a folder
- [Highlight 🌟🎉😍🎊🍀🎈] Auto-convert CSV data to int/float/string in InfluxDB
- [Highlight 🌟🎉😍] Match or filter the data by string or regex
- [Highlight 🌟🎉😍] Count rows and generate a count measurement
- Limit string length in InfluxDB
- Detect whether the CSV has new data
- Use the latest file modification time as the time column
- Auto-create the database if it does not exist
- Drop the database before inserting data
- Drop measurements before inserting data
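The type auto-conversion highlight can be pictured as a try-int, then try-float, then fall-back-to-string cascade. This is a minimal sketch of the idea, not the library's actual implementation:

```python
def detect_type(value):
    """Convert a raw CSV cell to int, float, or str.

    Illustrative sketch only: try int first, then float,
    and fall back to keeping the value as a string.
    """
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


print([detect_type(v) for v in ['42', '1.434', 'https://jmeter.apache.org/']])
# → [42, 1.434, 'https://jmeter.apache.org/']
```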
Run export_csv_to_influx -h to see the help guide.
-c, --csv: Input CSV file path, or the folder path. Mandatory
-db, --dbname: InfluxDB Database name. Mandatory
-m, --measurement: Measurement name. Mandatory
-fc, --field_columns: List of csv columns to use as fields, separated by comma. Mandatory
-d, --delimiter: CSV delimiter. Default: ','.
-lt, --lineterminator: CSV lineterminator. Default: '\n'.
-s, --server: InfluxDB Server address. Default: localhost:8086.
-u, --user: InfluxDB User name. Default: admin
-p, --password: InfluxDB Password. Default: admin
-t, --time_column: Timestamp column name. Default: timestamp.
If there is no timestamp column, every row's timestamp is set to the file's last modification time.
Note: pure epoch timestamps, like 1517587275, are also supported and detected automatically.
-tf, --time_format: Timestamp format. Default: '%Y-%m-%d %H:%M:%S' e.g.: 1970-01-01 00:00:00.
-tz, --time_zone: Timezone of supplied data. Default: UTC.
-tc, --tag_columns: List of csv columns to use as tags, separated by comma. Default: None
-b, --batch_size: Batch size when inserting data to influx. Default: 500.
-lslc, --limit_string_length_columns: Limit string length column, separated by comma. Default: None.
-ls, --limit_length: Limit length. Default: 20.
-dd, --drop_database: Drop database before inserting data. Default: False.
-dm, --drop_measurement: Drop measurement before inserting data. Default: False.
-mc, --match_columns: Columns to match on, separated by comma. Match rule: a row is kept only if all listed columns match. Default: None.
-mbs, --match_by_string: Match by string, separated by comma. Default: None.
-mbr, --match_by_regex: Match by regex, separated by comma. Default: None.
-fic, --filter_columns: Columns to filter on, separated by comma. Filter rule: a row is filtered out if any listed column matches. Default: None.
-fibs, --filter_by_string: Filter by string, separated by comma. Default: None.
-fibr, --filter_by_regex: Filter by regex, separated by comma. Default: None.
-ecm, --enable_count_measurement: Enable count measurement. Default: False.
-fi, --force_insert_even_csv_no_update: Force insert data to influx, even csv no update. Default: False.
-fsc, --force_string_columns: Force columns to string type, separated by comma. Default: None.
-fintc, --force_int_columns: Force columns to int type, separated by comma. Default: None.
-ffc, --force_float_columns: Force columns to float type, separated by comma. Default: None.
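Because --time_column accepts both formatted strings and pure epoch timestamps, the auto-detection can be sketched roughly as below. This is an assumption about the behavior for illustration, not the exporter's code:

```python
from datetime import datetime, timezone

TIME_FORMAT = '%Y-%m-%d %H:%M:%S'  # the default --time_format


def parse_time(value, time_format=TIME_FORMAT):
    """Parse either a pure epoch timestamp (e.g. '1517587275') or a
    formatted timestamp string into a UTC datetime (sketch only)."""
    if value.isdigit():
        # Pure timestamp: seconds since the epoch
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, time_format).replace(tzinfo=timezone.utc)


print(parse_time('1517587275'))
print(parse_time('2019-07-11 02:04:05'))
```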
Note:
- You could pass * to --field_columns to match all the fields: --field_columns=* or --field_columns '*'
- CSV data won't be inserted into InfluxDB again if there is no update. To force insertion, use: --force_insert_even_csv_no_update=True or --force_insert_even_csv_no_update True
- If some CSV cells have no value, they are auto-filled in InfluxDB based on the column's data type: int: -999, float: -999.0, string: -
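The empty-cell filling rule in the note can be expressed as a small lookup table. Again, this is a sketch of the documented behavior, not the library's code:

```python
# Default fillers for empty CSV cells, per column type (from the note above)
FILL_DEFAULTS = {int: -999, float: -999.0, str: '-'}


def fill_empty(value, column_type):
    """Return the cell converted to column_type, or the type's
    default filler when the cell is empty (sketch only)."""
    if value == '':
        return FILL_DEFAULTS[column_type]
    return column_type(value)


print(fill_empty('', float))       # → -999.0
print(fill_empty('1.434', float))  # → 1.434
```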
You can also run the exporter programmatically:
from ExportCsvToInflux import ExporterObject
exporter = ExporterObject()
exporter.export_csv_to_influx(...)
# You could get the export_csv_to_influx parameter details by:
print(exporter.export_csv_to_influx.__doc__)
Here is demo.csv:
timestamp,url,response_time
2019-07-11 02:04:05,https://jmeter.apache.org/,1.434
2019-07-11 02:04:06,https://jmeter.apache.org/,2.434
2019-07-11 02:04:07,https://jmeter.apache.org/,1.200
2019-07-11 02:04:08,https://jmeter.apache.org/,1.675
2019-07-11 02:04:09,https://jmeter.apache.org/,2.265
2019-07-11 02:04:10,https://sample-demo.org/,1.430
2019-07-12 08:54:13,https://sample-show.org/,1.300
2019-07-12 14:06:00,https://sample-7.org/,1.289
2019-07-12 18:45:34,https://sample-8.org/,2.876
Command to export the whole CSV into InfluxDB:
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --user admin \
  --password admin \
  --force_insert_even_csv_no_update True \
  --server 127.0.0.1:8086
Command to export the whole CSV, dropping the database first:
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --user admin \
  --password admin \
  --server 127.0.0.1:8086 \
  --force_insert_even_csv_no_update True \
  --drop_database=True
Command to export only the rows where timestamp matches 2019-07-12 and url matches sample-\d+:
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --user admin \
  --password admin \
  --server 127.0.0.1:8086 \
  --drop_database=True \
  --force_insert_even_csv_no_update True \
  --match_columns=timestamp,url \
  --match_by_regex='2019-07-12,sample-\d+'
Command to filter out the rows where url matches sample, then export the rest:
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --user admin \
  --password admin \
  --server 127.0.0.1:8086 \
  --drop_database True \
  --force_insert_even_csv_no_update True \
  --filter_columns timestamp,url \
  --filter_by_regex 'sample'
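The match and filter semantics in the two demos above can be checked against a few demo.csv rows. The sketch below assumes, per the option descriptions, that match keeps a row only when all listed columns match and filter drops a row when any listed column matches; it is an illustration, not the exporter's code:

```python
import re

# Three rows from demo.csv as {column: value} dicts
rows = [
    {'timestamp': '2019-07-12 08:54:13', 'url': 'https://sample-show.org/'},
    {'timestamp': '2019-07-12 14:06:00', 'url': 'https://sample-7.org/'},
    {'timestamp': '2019-07-11 02:04:05', 'url': 'https://jmeter.apache.org/'},
]


def match_all(row, patterns):
    """Keep a row only when every (column, regex) pair matches."""
    return all(re.search(regex, row[col]) for col, regex in patterns.items())


def filter_any(row, patterns):
    """Drop a row when any (column, regex) pair matches."""
    return any(re.search(regex, row[col]) for col, regex in patterns.items())


kept = [r for r in rows
        if match_all(r, {'timestamp': '2019-07-12', 'url': r'sample-\d+'})]
dropped = [r for r in rows if filter_any(r, {'url': 'sample'})]
print(len(kept), len(dropped))  # → 1 2
```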
Command to enable the count measurement: a new measurement named demo.count is generated, counting the rows where timestamp matches 2019-07-12 and url matches sample-\d+:
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --user admin \
  --password admin \
  --server 127.0.0.1:8086 \
  --drop_database True \
  --force_insert_even_csv_no_update True \
  --match_columns timestamp,url \
  --match_by_regex '2019-07-12,sample-\d+' \
  --enable_count_measurement True

The count measurement is:

select * from "demo.count"

name: demo.count
time                match_timestamp match_url total
----                --------------- --------- -----
1562957134000000000 3               2         9
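The three counts can be reproduced from demo.csv by hand: match_timestamp is the number of rows whose timestamp matches 2019-07-12, match_url the number whose url matches sample-\d+, and total the overall row count. A quick sanity check (not the exporter's code):

```python
import re

# (timestamp, url) pairs from demo.csv above
rows = [
    ('2019-07-11 02:04:05', 'https://jmeter.apache.org/'),
    ('2019-07-11 02:04:06', 'https://jmeter.apache.org/'),
    ('2019-07-11 02:04:07', 'https://jmeter.apache.org/'),
    ('2019-07-11 02:04:08', 'https://jmeter.apache.org/'),
    ('2019-07-11 02:04:09', 'https://jmeter.apache.org/'),
    ('2019-07-11 02:04:10', 'https://sample-demo.org/'),
    ('2019-07-12 08:54:13', 'https://sample-show.org/'),
    ('2019-07-12 14:06:00', 'https://sample-7.org/'),
    ('2019-07-12 18:45:34', 'https://sample-8.org/'),
]

match_timestamp = sum(1 for ts, _ in rows if re.search('2019-07-12', ts))
match_url = sum(1 for _, url in rows if re.search(r'sample-\d+', url))
total = len(rows)
print(match_timestamp, match_url, total)  # → 3 2 9
```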
The lib is inspired by: https://github.com/fabio-miranda/csv-to-influxdb