A parallel DICOM crawler for extracting specific metadata (such as patient ID, image laterality, etc.) and writing the results to a CSV file
A CLI tool for crawling DICOM and Crystal-Eye files and extracting metadata to a CSV file
Usage: open-sight [OPTIONS] <FOLDER_PATHS>...
Arguments:
<FOLDER_PATHS>...
Options:
-c, --csv-out <CSV_OUT> [default: open_sight_results.csv]
-n, --num-jobs <NUM_JOBS> [default: 1]
-o, --overwrite
-b, --batch-size <BATCH_SIZE> [default: 50]
-h, --help Print help
-V, --version Print version

Copy DICOM files and Crystal-Eye files based on patient IDs
Usage: copy_src [OPTIONS] <PATIENT_ID_FILE> <OUTPUT_DIRECTORY>
Arguments:
<PATIENT_ID_FILE> File containing patient IDs
<OUTPUT_DIRECTORY> Directory to store copied files
Options:
-o, --overwrite Whether to overwrite existing files
-d, --database <DATABASE> Database file to use [default: open_sight.duckdb]
-h, --help Print help

Run in a terminal:
duckdb open_sight.duckdb

Then run these commands in the duckdb terminal, assuming all.csv was the file created by open-sight:
CREATE TABLE open_sight (
patient_id VARCHAR,
patient_name VARCHAR,
laterality VARCHAR,
sex VARCHAR,
dob DATE,
scan_date DATE,
modality VARCHAR,
manufacturer VARCHAR,
series_description VARCHAR,
modified TIMESTAMP,
file_size BIGINT,
file_path VARCHAR PRIMARY KEY
);
CREATE UNIQUE INDEX idx_file_path ON open_sight ("file_path");
INSERT INTO open_sight
SELECT DISTINCT *
FROM read_csv_auto('all.csv') AS csv
WHERE NOT EXISTS (
SELECT 1
FROM open_sight
WHERE open_sight.file_path = csv.file_path
);

If only updating the DB, run:
INSERT INTO open_sight
SELECT DISTINCT *
FROM read_csv_auto('all.csv') AS csv
WHERE NOT EXISTS (
SELECT 1
FROM open_sight
WHERE open_sight.file_path = csv.file_path
);
-- To get the new totals
SELECT count(*) FROM open_sight;
-- Some basic table analysis
SELECT * FROM information_schema.tables WHERE table_schema = 'main';
SELECT * FROM duckdb_indexes();
SELECT * FROM duckdb_constraints();
SELECT * FROM duckdb_tables();

Crawling DICOM files (or proprietary files, if crystal-eye is present) and saving the results to a CSV file
_input_folder_: a folder containing DICOM files in any folder structure, including subfolders.

_csv_file_: a CSV file where the results will be saved; if it was populated by a previous run, files already parsed are skipped.
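The skipping behaviour described for _csv_file_ can be pictured with the minimal Rust sketch below. It is not the open-sight implementation: the names load_seen_paths and pending_files are hypothetical, and it assumes that file_path is the last column of each CSV row and contains no commas or quotes (real code would use a proper CSV parser).

```rust
use std::collections::HashSet;

/// Collect the file_path values already present in a previous results CSV.
/// Assumption: the first line is a header and file_path is the last
/// comma-separated field of each row.
fn load_seen_paths(csv_text: &str) -> HashSet<String> {
    csv_text
        .lines()
        .skip(1) // skip the header row
        .filter_map(|line| line.rsplit(',').next())
        .map(|field| field.trim().to_string())
        .filter(|path| !path.is_empty())
        .collect()
}

/// Keep only the candidate files that were not parsed in a previous run.
fn pending_files<'a>(candidates: &[&'a str], seen: &HashSet<String>) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|path| !seen.contains(*path))
        .collect()
}

fn main() {
    // Hypothetical, shortened CSV content from an earlier run.
    let previous_csv = "patient_id,modality,file_path\n\
                        P001,OCT,/data/a.dcm\n\
                        P002,OCT,/data/b.dcm\n";
    let seen = load_seen_paths(previous_csv);
    let todo = pending_files(&["/data/a.dcm", "/data/c.dcm"], &seen);
    println!("{:?}", todo); // prints ["/data/c.dcm"]
}
```

Keying on file_path mirrors the database setup, where file_path is the primary key.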
open-sight _input_folder_/* -c _csv_file_ 2>&1 | tee output.log

patient_ids.txt: a simple text file containing one patient ID per row.

_output_folder_: the folder where the files will be copied.
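As a rough illustration of the expected patient_ids.txt format, the sketch below parses such a file and picks the matching entries from a list of (patient_id, file_path) pairs. This is not how copy_src is implemented (it works against the database file passed with -d); parse_patient_ids and files_for_patients are hypothetical names, and ignoring blank lines and surrounding whitespace is an assumption.

```rust
use std::collections::HashSet;

/// Parse the contents of a patient ID file: one ID per line.
/// Assumption: surrounding whitespace and blank lines are ignored.
fn parse_patient_ids(text: &str) -> HashSet<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Return the file paths whose patient_id is in the requested set.
/// `rows` stands in for (patient_id, file_path) pairs from the database.
fn files_for_patients<'a>(rows: &[(&str, &'a str)], ids: &HashSet<String>) -> Vec<&'a str> {
    rows.iter()
        .filter(|(patient_id, _)| ids.contains(*patient_id))
        .map(|(_, file_path)| *file_path)
        .collect()
}

fn main() {
    // Hypothetical file content and database rows.
    let ids = parse_patient_ids("P001\n\nP003\n");
    let rows = [
        ("P001", "/data/p001/scan.dcm"),
        ("P002", "/data/p002/scan.dcm"),
        ("P003", "/data/p003/scan.e2e"),
    ];
    for path in files_for_patients(&rows, &ids) {
        println!("would copy {path}");
    }
}
```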
copy_src patient_ids.txt /_output_folder_ -d open_sight.duckdb

Bump the version number by running cargo v [part], where [part] is major, minor, or patch, depending on which part of the version number you want to bump.
cargo install cargo-v
# commit
cargo v patch -y
# push
cargo build --release -j 10
git push origin --tags

- 0.3.4
  - Sanitised version: removed all unnecessary files and information to safeguard privacy
- 0.3.3
  - Renamed to copy_src and copy_src_csv
- 0.3.2
  - Updated copy_dcms to use the updated database format
- 0.3.1
  - Fixed a bug where DCM needs to be checked first, then use crystal-eye
- 0.3.0
  - Updated duckdb to v1.0.0
  - Ability to reuse the CSV to skip already processed files
- 0.2.1
  - Extended support to all extensions handled by crystal-eye: e2e, fda and sdb
- 0.2.0
  - Added E2E support via crystal-eye
- 0.1.6
  - Added copy_dcms to replace find_patid and copy_dcms_csv
- 0.1.5
  - Changed file_size to the u64 type, representing bytes
- 0.1.4
  - Renamed the table headers to lowercase with underscores instead of spaces
- 0.1.3
  - Introduced find_patid
  - Refactored code to use helpers.rs
- 0.1.2
  - Reverted path::absolute, keeping the Windows file path style
- 0.1.1
  - Able to use glob
  - Retry routine for DCM files that fail during parsing
  - Using experimental path::absolute to properly render Windows full path strings