Open Call project, 2025 sci-rocket

Genome processing pipeline and dashboard

Total developer time: 3 months

Contact person: Jun.-Prof. Dr. Lauren Saunders, Centre for Organismal Studies

Outline

The group have developed an open-source Snakemake pipeline that successfully processes sci-RNA-seq data, but a number of bugs and workflow limitations reduce research productivity and data reliability. The goals of this open call project are to fix bugs and inconsistencies, improve the user experience, and extend the pipeline to support additional data sources and analysis modes.

Results

New Features

  • Sustainability
    • Added full CI test coverage (unit + integration workflow tests)
    • Added Playwright UI tests of the dashboard
    • Added a workflow to generate test data for CI
  • Sample sheet editor
    • Created an online sample sheet editor
  • Workflow
    • Unified sample sheet input to path_reads
    • Added AVITI read support
    • Added support for multiple path_reads entries
    • Added support for RNA velocity
    • Added support for specifying sequencer lanes
    • Added automatic determination of the --sjdbOverhang parameter from the data
    • Added conda env pin for Linux
  • Dashboard
    • Added “Copy barcodes” buttons
    • Added benchmark/job-run info to dashboard
    • Extended hashing summary in dashboard
    • Fixed reverse-complemented p5 barcodes so that the dashboard matches the input data
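
As an illustration of the automatic --sjdbOverhang determination listed above: the STAR manual recommends setting this parameter to the maximum read length minus one. The sketch below estimates that value from the first records of a FASTQ stream. It is a minimal example under stated assumptions, not the pipeline's actual implementation; the function name estimate_sjdb_overhang and the max_reads sampling limit are hypothetical.

```python
import io
from itertools import islice


def estimate_sjdb_overhang(fastq_handle, max_reads=10_000):
    """Return (longest read length - 1) over the first `max_reads` records.

    Hypothetical helper: assumes an uncompressed, four-line-per-record
    FASTQ text stream. The sampling limit is an illustrative choice.
    """
    max_len = 0
    # Each FASTQ record spans four lines; the sequence is the second one.
    for i, line in enumerate(islice(fastq_handle, max_reads * 4)):
        if i % 4 == 1:
            max_len = max(max_len, len(line.strip()))
    if max_len == 0:
        raise ValueError("no reads found in FASTQ input")
    return max_len - 1


# Usage with a small in-memory FASTQ (reads of length 8 and 6).
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n")
overhang = estimate_sjdb_overhang(example)
print(f"--sjdbOverhang {overhang}")
```

Sampling only the first records keeps the check cheap on large files, at the cost of missing longer reads that appear later in the file.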

Bug Fixes

  • Fixed double counting of successful paired reads
  • Made hash/uncorrectable ordering deterministic
  • Fixed uncorrectable barcode reduction bug
  • Minor dashboard fixes
  • Fixed empty discarded-barcode logs
  • Fixed zero upstream counts in hashing table
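
The report does not say what made the hash/uncorrectable ordering non-deterministic. One common cause in count tables is that entries with equal counts keep whatever order the underlying container produced. A minimal sketch of a deterministic ordering, assuming counts are held in a mapping from barcode to count (the function name ordered_counts is hypothetical):

```python
from collections import Counter


def ordered_counts(counts):
    """Sort by descending count, breaking ties alphabetically by key.

    Hypothetical helper: the explicit tie-break makes the result
    independent of insertion order.
    """
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Two barcodes share a count of 2; the tie is resolved alphabetically.
table = Counter({"TTT": 2, "AAA": 2, "CCC": 5})
print(ordered_counts(table))
# [('CCC', 5), ('AAA', 2), ('TTT', 2)]
```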

Refactoring

  • Explicit dir_output path handling (instead of relying on working dir)
  • Minor rule refactor to named inputs/outputs
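
To illustrate the explicit dir_output handling: building every output path from the configured output directory means results land in the same place regardless of where the workflow is launched. The sketch below shows the general idea only; the helper name output_path and the example paths are assumptions, not taken from the pipeline.

```python
from pathlib import Path


def output_path(dir_output, *parts):
    """Build an absolute path below `dir_output`.

    Hypothetical helper: a relative `dir_output` is anchored to the
    current working directory once, here, rather than implicitly at
    every place a file is written.
    """
    return Path(dir_output).absolute().joinpath(*parts)


# Usage: every rule derives its files from the same configured root.
log_file = output_path("/data/run1", "logs", "sample_a.log")
print(log_file)
```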
