TC_180502
Attendees: Viktoria Dorfer (VD), Pieter-Jan Volders (PV), Gerhard Duernberger (GD), Eric Deutsch (ED), Mathias Wilhelm (MW)
We will initially focus on generating a reference dataset to compare and evaluate FDR estimation approaches in the context of decoy generation methods in spectral library matching. We might later create more complex datasets containing e.g. synthetic decoys or isobaric permutations (for sensitivity/selectivity), or specific sets for evaluating site localization or mixed acquisition modes (either in SL or dataset). We aim to generate a dataset whose spectra follow a uniform distribution with regard to e.g. length and signal-to-noise (as well as missed cleavages and precursor charge), to avoid reproducing the (likely biased) distributions created by database searching strategies (database searches are known to have trouble identifying shorter peptides).

The created dataset consists of three parts: a synthetically generated “experimental” dataset (ER) containing one spectrum per precursor; a collection of many reference spectra per precursor (no overlap with the spectra in ER), which can be used to build spectral libraries (SR); and at least one reference spectral library (SL). These three datasets will enable the evaluation of a) spectral library building, b) decoy generation methods and c) scoring methods. Initially, we will create a reference dataset which contains ~10k precursors each in ER and SR, of which ~5k will be shared between them.
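The set design described above (uniform coverage over properties such as length and charge, then an ER/SR split with a fixed number of shared precursors) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `sample_uniform` and `split_precursors`, the tuple layout of a candidate, and the binning by (length, charge) are hypothetical and not part of any agreed script.

```python
import random
from collections import defaultdict


def sample_uniform(candidates, key, per_bin, seed=0):
    """Draw up to `per_bin` candidates from every bin defined by key(candidate).

    Hypothetical helper: taking the same number from each bin flattens the
    distribution over the binned properties (e.g. length, precursor charge).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for candidate in candidates:
        bins[key(candidate)].append(candidate)
    picked = []
    for bin_key in sorted(bins):
        members = bins[bin_key]
        picked.extend(rng.sample(members, min(per_bin, len(members))))
    return picked


def split_precursors(precursors, n_per_set=10_000, n_shared=5_000, seed=0):
    """Return (er, sr): two precursor lists of size n_per_set sharing n_shared.

    Hypothetical helper: the defaults mirror the ~10k / ~5k figures in the notes.
    """
    n_unique = n_per_set - n_shared
    needed = n_shared + 2 * n_unique
    pool = list(precursors)
    if len(pool) < needed:
        raise ValueError(f"need {needed} precursors, got {len(pool)}")
    picked = random.Random(seed).sample(pool, needed)
    shared = picked[:n_shared]
    er_only = picked[n_shared:n_shared + n_unique]
    sr_only = picked[n_shared + n_unique:]
    return shared + er_only, shared + sr_only


# Toy usage: candidates are (id, peptide_length, precursor_charge) tuples.
candidates = [(i, 7 + i % 5, 2 + i % 3) for i in range(3000)]
flat = sample_uniform(candidates, key=lambda c: (c[1], c[2]), per_bin=40)
er, sr = split_precursors(flat, n_per_set=200, n_shared=100)
```

Note that the split is at the precursor level only; ER and SR would still need to draw different spectra for the shared precursors.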
- ED will set up GitHub and FTP for code and data
- MW is responsible for reference set creation in the form of “synthetic” mzML files. Aims to provide scripts so we can generate sets at will (varying size, composition, acquisition modes). First set (HCD, CE30, OT-MS) available at the latest 3 weeks after ASMS.
- ED will use SpectraST for initial SL generation (share with all)
- ED will evaluate SpectraST
- VD will use internal pipeline with access to various search engines for evaluation of decoy generation
- PV will evaluate internal newly developed SL search engine
- MW will use internal spectrum prediction tool as alternative for decoy generation
- Can/Should we make use of “entrapment FDR” as utilized by Lukas Kaell?
- For mzML generation: a) Do we need corresponding MS1 scans? b) What spectrum metadata do we need? (i.e. are peak lists and precursor/scan information sufficient?)
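
On the entrapment question above, one formulation that could be discussed is to count accepted matches that hit entrapment entries and scale them by the relative size of the entrapment set. The sketch below is a hedged illustration, not a statement of any particular author's method: the function name `entrapment_fdp`, its parameters, and the choice of this specific estimator are assumptions.

```python
def entrapment_fdp(n_target, n_entrapment, r=1.0):
    """Estimate the false discovery proportion among accepted matches.

    n_target:     accepted matches to entries from the original library
    n_entrapment: accepted matches to entrapment entries (known to be wrong)
    r:            entrapment-to-original library size ratio (assumed known)

    Assumption: wrong matches land on original and entrapment entries in
    proportion to their sizes, so each entrapment hit stands for roughly
    1/r undetected wrong hits on the original entries.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    total = n_target + n_entrapment
    if total == 0:
        return 0.0
    return min(1.0, n_entrapment * (1.0 + 1.0 / r) / total)


# Toy usage: 990 original-library hits and 10 entrapment hits, equal sizes.
estimate = entrapment_fdp(990, 10, r=1.0)
```

Comparing such an estimate with the FDR reported by a decoy-based method at the same threshold would be one way to check whether the decoys are well calibrated.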