joewandy (Member) commented Jun 10, 2025

This PR adds a demo that uses ViMMS to generate synthetic data, so we can test the performance of a typical untargeted LC-MS metabolomics data-processing pipeline.

We generate chemicals under various experimental setups and write out their mzML files together with the corresponding ground truth. The mzML files are then passed into a pipeline that tries to infer the chemical identities back from the spectral data. Because the data are synthetic, we can easily evaluate pipeline performance at each of these steps:

  1. Peak picking.
  2. Alignment.
  3. Grouping of related peaks, such as isotopes and adducts.
  4. Identification, by matching against standards or a spectral library.
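Since the ground truth is known, step 1 can be scored by matching picked peaks to true peaks within an m/z and RT tolerance and reporting precision, recall and F1. This is a minimal sketch only. The `Peak` record, the function names and the default tolerances (10 ppm, 10 s) are hypothetical choices for illustration and are not taken from the PR or from the ViMMS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """Hypothetical peak record: centroid m/z and retention time in seconds."""
    mz: float
    rt: float


def match_peaks(picked, truth, ppm_tol=10.0, rt_tol=10.0):
    """Greedy one-to-one matching of picked peaks to ground-truth peaks.

    A pair is a candidate when the m/z error is within ppm_tol (ppm) and the
    RT error is within rt_tol (seconds). Candidates are accepted in order of
    increasing ppm error, and each peak is used at most once.
    """
    candidates = []
    for i, p in enumerate(picked):
        for j, t in enumerate(truth):
            ppm = abs(p.mz - t.mz) / t.mz * 1e6
            if ppm <= ppm_tol and abs(p.rt - t.rt) <= rt_tol:
                candidates.append((ppm, i, j))
    candidates.sort()

    used_picked, used_truth, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_picked and j not in used_truth:
            used_picked.add(i)
            used_truth.add(j)
            matches.append((i, j))
    return matches


def precision_recall_f1(n_matched, n_picked, n_truth):
    """Standard detection metrics computed from match counts."""
    precision = n_matched / n_picked if n_picked else 0.0
    recall = n_matched / n_truth if n_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [Peak(180.0634, 300.0), Peak(256.2402, 612.5), Peak(301.1410, 845.0)]
    picked = [Peak(180.0640, 302.1), Peak(256.2399, 610.0), Peak(400.0000, 120.0)]
    matches = match_peaks(picked, truth)
    print(matches, precision_recall_f1(len(matches), len(picked), len(truth)))
```

Greedy matching keeps the sketch short. If peaks are dense enough that greedy choices conflict, an optimal one-to-one assignment (e.g. the Hungarian algorithm) would be a natural replacement.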

Still a work in progress; do not merge yet!

joewandy added 7 commits June 9, 2025 17:36
* Add skeleton untargeted pipeline directory with plan

* fix flake8 errors in generate chemicals script

* Add lint/test guidelines and minimal untargeted test

* Implement dataset setup script

* Add mzML generation step

* Tune chromatogram width

* Add ground truth table generation

* Add MGF export for simulated chemicals

* docs: mark simulation step complete

* Add metric reporting helper to pipeline

* Add linear column RT noise option

* Add RT drift models and use in demo pipeline

* Refactor drift models into main package

* Clean up drift and demo scripts

* Refactor output handling and add get_peak_data

* Refactor pipeline into classes
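The commits above mention a linear column RT noise option and RT drift models. As a hedged illustration only, the following sketch shows one way a linear drift could be simulated and then recovered by an alignment step using least squares on matched anchor peaks. The function names and the model `rt_obs = intercept + slope * rt + noise` are hypothetical. They are not the drift classes added in this PR or ViMMS's own API, which may differ.

```python
import random


def apply_linear_drift(rt, intercept, slope, noise_sd=0.0, rng=None):
    """Hypothetical linear RT drift: rt_obs = intercept + slope * rt + noise.

    noise is Gaussian with standard deviation noise_sd (seconds).
    """
    if rng is None:
        rng = random.Random()
    noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return intercept + slope * rt + noise


def fit_linear_drift(true_rts, observed_rts):
    """Ordinary least-squares fit: observed ~ intercept + slope * true."""
    n = len(true_rts)
    mean_x = sum(true_rts) / n
    mean_y = sum(observed_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in true_rts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_rts, observed_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def correct_rt(rt_obs, intercept, slope):
    """Invert the fitted drift to map an observed RT back to the reference."""
    return (rt_obs - intercept) / slope


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible simulation
    true_rts = [60.0 * k for k in range(1, 21)]  # anchor peaks at 1-20 min
    observed = [apply_linear_drift(rt, 5.0, 1.02, noise_sd=0.5, rng=rng) for rt in true_rts]
    a, b = fit_linear_drift(true_rts, observed)
    print(f"intercept={a:.2f} s, slope={b:.4f}")
```

Because the synthetic data come with the true RTs, the fitted intercept and slope can be compared directly with the simulated ones. This gives a simple check on the alignment step.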