Add low biomass pipelines and update standard metagenomics pipeline #197

Open
bnovak32 wants to merge 48 commits into master from DEV_Metagenomics_low_biomass

@bnovak32
Contributor

  • updated standard Illumina Metagenomics pipeline
    • replace bbduk with fastp
    • add steps that were missing from previous pipeline definition
  • two new metagenomics pipelines for low biomass (long-read (Nanopore) and short-read (Illumina))
    • includes additional pre-processing steps for read decontamination
    • includes decontamination steps in downstream analysis
  • add visualizations and downstream analysis to all pipelines
  • update all software packages to the latest versions that do not change results

bnovak32 and others added 30 commits September 8, 2025 09:43
- removed read-based processing from standard pipeline (no longer used,
  humann does not work with this data type)
First draft nanopore low biomass pipeline
* Update to latest draft
* Add read-based feature table decontamination steps
* Add taxonomy plots
* Regularize formatting
1. Fixed bugs with host tar command and the run_decontam R function.
2. Fixed bug with reading and writing species table
3. Fixed bug with running decontam and plotting non-contaminant features.
* Formatting updates
* Update step names
* Add missing steps
* Fix broken links
Fixed a typo in the fasta file name for Mycobacterium marinum.
Fixed typo in metaphlan version specification.
- Updated Long-read document to better match latest workflow
- Added first draft of Short-read document

TODO:
  - Fix Assembly-based decontamination and heatmaps
  - Finish Short-read document (only revised through pre-processing,
    which also requires some overhaul)
- Updated GL-DPPD-7117 to 1st draft status
- Fixed some links and formatting in GL-DPPD-7116

TODO: Add barplots and decontamination to read-based metaphlan
taxonomies
- started renaming fastq files to move assay suffix to end
- fixed typos in the table of contents
- Add missing steps (Read-based metaphlan taxonomies and assembly-based
  heatmaps)
- Fix documentation for kraken2-build (adding references to fasta
  acquisition)
- Update/fix table of contents
- Regularize formatting
- Fix typos
- remove Nanopore specific tools from software table
* Update GL-DPPD-7116.md through step 9c (set global variables).
* Fix numbering in GL-DPPD-7116.md
- sync changes between long and short read documents
asaravia-butler and others added 18 commits January 27, 2026 23:22
Sync with working implementation
- fixed spelling/typos across both documents
- Short-read specific updates
  - removed the human read removal step and added a reference to the human
    read removal pipeline
  - add Humann output downstream analysis steps
- Long-read specific updates
  - Update human read removal to sync with latest human read removal pipeline and add link to that pipeline
- update kraken2 database build steps to use k2 wrapper (as in latest RHR and EHR pipelines/workflows)
- renumbered steps and fixed internal links
- updated thresholds to match latest implementation
- add missing assay suffixes and fix incorrect suffixes
- remove unused samtools index step in Assembly-based processing
- update output file names
- change all csv output to tsv
- updated software tables to remove references to unused software in each pipeline.
- fix broken links
- add missing filtering steps in Assembly-based processing
- Updated header information
- Add top50 heatmaps
- added note about decontaminated plots and species tables only being
  present when 1 or more contaminants found
- reorganize R code
- updated Metagenomics READMEs
- added Metagenomics workflow submodule for low biomass pipelines
- updated low biomass pipeline docs table-of-contents
- updated all numbering
- fixed broken link
- removed incorrect references to gzipped fasta
- sync the standard metagenomics pipeline doc to the updates in the low
  biomass pipeline docs.
- also fixes some typos found in the low biomass docs
fix document number typo
- Updated software versions to latest possible
- Specify individual tidyverse packages instead of the tidyverse collection
  for more granular software versions
- Add final pipeline approval date
- changed NF_Metagenomics to NF_MetagenomeSeq
- updated READMEs
3 participants