jwinter3
No description provided.

bscisel and others added 12 commits May 20, 2025 19:22
- Updated `quality_stats.py` to support FASTQ file format in base sequence quality function.
- Refactored the scanning and frame functions to use new UDAF for base sequence quality calculations.
- Implemented a new UDAF in `udaf.rs` to compute quality score statistics, including average, median, and quartiles.
- Modified `context.rs` to register the new UDAF for use in SQL queries.
- Adjusted `operation.rs` to execute the new SQL query for base sequence quality analysis.
- Added deregistration functionality for tables in `scan.rs`.
- Ensured compatibility with FASTQ format in input handling.
- Updated `base_sequence_quality` function to accept a quality scores column and output type.
- Introduced `BaseSequenceQualityProvider` and `BaseSequenceQualityExec` in Rust for efficient execution plans.
- Removed the custom UDAF for quality scores and replaced it with a DataFusion table provider.
- Simplified data handling by directly using DataFrames from DataFusion.
- Cleaned up unnecessary code and files related to UDAF implementation.
- Enhanced error handling and type checking for input data.
- Added `SequenceQualityHistogramProvider` and `SequenceQualityHistogramExec` to compute quality histograms from sequence data.
- Introduced `QuantileStatsTableProvider` and `QuantileStatsExec` for calculating quantile statistics based on histogram data.
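
The histogram-then-quantile approach named in the last two commits can be sketched in plain Python. This is only an illustration of the idea, not the Rust code in this PR: the function names are hypothetical, Phred+33 encoding is assumed for the FASTQ quality strings, and the nearest-rank quantile definition is an assumption, since the commits do not say which definition the providers use.

```python
import math
from collections import Counter


def phred_scores(quality, offset=33):
    """Decode one FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def position_histograms(quality_strings):
    """Build one score histogram (a Counter) per read position."""
    histograms = []
    for quality in quality_strings:
        for pos, score in enumerate(phred_scores(quality)):
            # Grow the list the first time a read reaches this position.
            if pos == len(histograms):
                histograms.append(Counter())
            histograms[pos][score] += 1
    return histograms


def histogram_mean(hist):
    """Mean score of a histogram mapping score -> count."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    return sum(score * count for score, count in hist.items()) / total


def histogram_quantile(hist, q):
    """Nearest-rank quantile (assumed definition): the smallest score whose
    cumulative count reaches ceil(q * total)."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    rank = max(1, math.ceil(q * total))
    cumulative = 0
    for score in sorted(hist):
        cumulative += hist[score]
        if cumulative >= rank:
            return score
    return max(hist)  # only reached if q > 1


if __name__ == "__main__":
    # Three made-up quality strings of different lengths.
    reads = ["I5!", "55", "!5I"]
    for pos, hist in enumerate(position_histograms(reads)):
        stats = [histogram_quantile(hist, q) for q in (0.25, 0.5, 0.75)]
        print(pos, histogram_mean(hist), stats)
```

Working from per-position histograms keeps memory bounded by the number of distinct scores rather than the number of reads, which is presumably why the quantile step is fed histogram data instead of raw scores.
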
jwinter3 changed the title from "Bese sequence quality Z6" to "Base sequence quality Z6" on Jun 15, 2025

2 participants