WMS_TAXONOMY

This Nextflow script implements a workflow for taxonomic analysis of whole metagenome sequencing data using Kraken2 and Bracken.

Workflow Execution

WMS_TAXONOMY execution

# Activate the metafun conda environment
conda activate metafun
# Move to the metafun directory where you cloned metafun from GitHub
cd /path/to/metafun
# Run the workflow; replace the ${...} placeholders with your own values
nextflow run nf_scripts/WMS_TAXONOMY_apptainer.nf --inputDir ${inputDir} --metadata ${metadata} --sampleIDcolumn ${sampleID_column_number} --analysiscolumn ${analysis_column_number}
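The launch command above can be wrapped in a small Bash function so that the sample-ID and analysis column numbers stay in separate arguments and optional parameters from the parameter table can be appended. This is a minimal sketch: `run_wms_taxonomy` and the `DRY_RUN` switch are hypothetical helpers, not part of metafun.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the WMS_TAXONOMY launch command (not part of metafun).
# Run it from the metafun clone directory. Extra arguments are passed through
# unchanged, e.g. --cpus 8 or --kraken_method memory-mapping.
# Set DRY_RUN=1 to print the command instead of executing it.
run_wms_taxonomy() {
  local input_dir=$1 metadata=$2 sample_col=$3 analysis_col=$4
  shift 4
  local -a cmd
  cmd=(nextflow run nf_scripts/WMS_TAXONOMY_apptainer.nf
    --inputDir "$input_dir"
    --metadata "$metadata"
    --sampleIDcolumn "$sample_col"
    --analysiscolumn "$analysis_col"
    "$@")
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"   # space-joined, for inspection only
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_wms_taxonomy results/metagenome/RAWREAD_QC/read_filtered metadata.csv 1 2 --cpus 8` prints the full command without starting Nextflow; the column numbers here are placeholders for your own metadata layout.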

Workflow Overview

The workflow performs the following steps:

Kraken2 taxonomic classification
Bracken abundance estimation
Phyloseq object creation from Bracken outputs
Statistical analysis and visualization based on metadata

The workflow produces taxonomic profiles of the metagenomic samples, which can be used in downstream comparative analyses.

Inputs and Outputs

Inputs: quality-controlled paired-end metagenomic reads (output of the RAWREAD_QC workflow) and a metadata file.
Outputs: Kraken2 reports, Bracken abundance estimates, a Phyloseq object, and statistical analysis results and visualizations.

The default output directory is `${launchDir}/results/metagenome/WMS_TAXONOMY`.

Process	Input	Output directory	Note
kraken2_run	`${params.inputDir}`	`${params.outdir}/kraken2`	Kraken2 classification results
bracken_run	Output of kraken2_run	`${params.outdir}/bracken`	Bracken abundance estimates
phyloseq_creation	Output of bracken_run	`${params.outdir}/phyloseq`	Phyloseq object created from Bracken outputs and metadata
statistical_analysis	Phyloseq object from phyloseq_creation	`${params.outdir}/stats_analysis`	Statistical analysis and visualizations
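After a run, a quick way to confirm that every process wrote its results is to check for the four output subdirectories listed in the table. This is a sketch under the assumption that you run it from the launch directory and use the default `outdir`; `check_wms_outputs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check: report which per-process output directories exist.
# Subdirectory names follow the process table; the default path assumes the
# default outdir and that the current directory is the Nextflow launch directory.
check_wms_outputs() {
  local outdir=${1:-results/metagenome/WMS_TAXONOMY}
  local sub missing=0
  for sub in kraken2 bracken phyloseq stats_analysis; do
    if [[ -d "$outdir/$sub" ]]; then
      echo "found:   $outdir/$sub"
    else
      echo "missing: $outdir/$sub"
      missing=1
    fi
  done
  return "$missing"   # non-zero if any directory is absent
}
```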

Parameters in WMS_TAXONOMY Nextflow Script

Parameter	Description	Default Value	Note
`db_baseDir`	Base directory for databases	`/opt/database`	Mounted path in the Apptainer environment
`scripts_baseDir`	Base directory for scripts	`/scratch/tools/microbiome_analysis/scripts`	Mounted path in the Apptainer environment
`inputDir`	Input directory containing filtered reads	`"${launchDir}/results/metagenome/RAWREAD_QC/read_filtered"`	Output from RAWREAD_QC workflow
`outdir`	Output directory	`"${launchDir}/results/metagenome/WMS_TAXONOMY"`
`metadata`	Path to metadata file	Required input
`sampleIDcolumn`	Column number for sample IDs in metadata	`1`
`analysiscolumn`	Column number for analysis grouping	`0`
`relab_filter`	Relative abundance filter for Bracken results	`0.000001`
`cpus`	Number of CPUs to use	`15`
`kraken_method`	Kraken2 method (`'default'` or `'memory-mapping'`)	`'default'`
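Instead of passing every parameter on the command line, the values can be collected in a YAML file and handed to Nextflow with the `-params-file` option of `nextflow run`, which accepts YAML or JSON. The sketch below writes such a file; `write_wms_params` is a hypothetical helper and the values are examples, not recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write example WMS_TAXONOMY parameters to a YAML file.
# Keys match the parameter table; adjust the values to your data.
write_wms_params() {
  local file=$1 metadata=$2
  cat > "$file" <<EOF
metadata: "${metadata}"
sampleIDcolumn: 1
analysiscolumn: 2
relab_filter: 0.000001
cpus: 8
kraken_method: "memory-mapping"
EOF
}
```

Usage: `write_wms_params params.yaml metadata.csv`, then `nextflow run nf_scripts/WMS_TAXONOMY_apptainer.nf -params-file params.yaml`. Parameters omitted from the file, such as `inputDir`, keep their defaults.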

Descriptions of Processes in WMS_TAXONOMY Workflow

Kraken2 Run: Performs taxonomic classification on each sample’s paired-end reads using Kraken2.
Bracken Run: Estimates abundances at the species level using Bracken, based on Kraken2 results.
Phyloseq Creation: Creates a Phyloseq object from Bracken outputs and metadata for downstream analysis.
Statistical Analysis: Conducts statistical analyses and creates visualizations based on the Phyloseq object and metadata.
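Bracken writes a tab-separated table whose seventh column, `fraction_total_reads`, holds each taxon's relative abundance. The sketch below shows one way a `relab_filter`-style threshold could be applied to such a file. Where and how WMS_TAXONOMY applies `relab_filter` internally (for example `>=` versus `>`) is an assumption here, and `filter_bracken` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header plus Bracken rows whose
# fraction_total_reads (column 7) is at least the given threshold.
filter_bracken() {
  local bracken_file=$1 threshold=${2:-0.000001}
  # "+ 0" forces numeric comparison of the abundance and the threshold
  awk -F'\t' -v t="$threshold" 'NR == 1 || $7 + 0 >= t + 0' "$bracken_file"
}
```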

Tools Used in WMS_TAXONOMY

Tool	Purpose	Version	Default parameters	Parameters that can be selected
Kraken2	Taxonomic classification	2.1.2	`--confidence 0.25`, `--paired`	`--memory-mapping`
Bracken	Abundance estimation	2.7	`-l S` (species level)	`-r ${read_length}`
R (phyloseq)	Statistical analysis and visualization	4.3.2 (R)	N/A	N/A
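For orientation, the per-sample Kraken2 and Bracken calls implied by the defaults in this table might look like the sketch below. It is not the workflow's actual process code: `KRAKEN2_DB`, `CPUS`, `KRAKEN_METHOD`, the output file names, the 150 bp default read length, and `classify_sample` are all hypothetical, and the mapping of `kraken_method` to `--memory-mapping` is assumed from the table.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one sample's Kraken2 -> Bracken step using the default
# parameters listed in the table. KRAKEN2_DB, CPUS, KRAKEN_METHOD, the output
# file names, and classify_sample are illustrative, not taken from the script.
# Set DRY_RUN=1 to print the commands instead of running them.
maybe_run() {
  if [[ -n ${DRY_RUN:-} ]]; then printf '%s\n' "$*"; else "$@"; fi
}

classify_sample() {
  local sample=$1 r1=$2 r2=$3 read_len=${4:-150}
  if [[ -z ${KRAKEN2_DB:-} ]]; then
    echo "set KRAKEN2_DB to the Kraken2/Bracken database directory" >&2
    return 1
  fi
  local -a mm=()
  # Assumed mapping: kraken_method = 'memory-mapping' adds --memory-mapping
  if [[ ${KRAKEN_METHOD:-default} == memory-mapping ]]; then
    mm=(--memory-mapping)
  fi
  # Classify the paired-end reads; the report is the file Bracken consumes
  maybe_run kraken2 --db "$KRAKEN2_DB" --threads "${CPUS:-15}" --confidence 0.25 \
    --paired "${mm[@]}" --report "${sample}.kreport" --output "${sample}.kraken" \
    "$r1" "$r2" || return
  # Re-estimate species-level (-l S) abundances for the given read length
  maybe_run bracken -d "$KRAKEN2_DB" -i "${sample}.kreport" -o "${sample}.bracken" \
    -r "$read_len" -l S
}
```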

Usage Notes

The input directory should contain paired-end read files processed by the RAWREAD_QC workflow.
The metadata file should be in CSV format with at least two columns: sample IDs and the grouping variable used for analysis.
The script checks that the input directory exists and is not empty before proceeding.
The Kraken2 and Bracken databases must be set up under the database directory (`db_baseDir`).
Setting `kraken_method` to `'memory-mapping'` runs Kraken2 with `--memory-mapping`, so the database is accessed through memory mapping instead of being loaded into RAM; this can improve performance on systems with sufficient RAM.
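Before launching, the input requirements in these notes can be checked by hand. The sketch below is simpler than the checks inside the workflow and rests on assumptions: the metadata CSV has a header row and no quoted commas, and column numbers are 1-based. `check_wms_inputs` and `metadata_sample_ids` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks mirroring the usage notes.
# Assumes a comma-separated metadata file with a header row and no quoted commas.
check_wms_inputs() {
  local input_dir=$1 metadata=$2
  if [[ ! -d $input_dir ]] || [[ -z $(ls -A "$input_dir") ]]; then
    echo "input directory missing or empty: $input_dir" >&2
    return 1
  fi
  if [[ ! -s $metadata ]]; then
    echo "metadata file missing or empty: $metadata" >&2
    return 1
  fi
  local ncol
  ncol=$(head -n 1 "$metadata" | awk -F',' '{print NF}')
  if (( ncol < 2 )); then
    echo "metadata needs at least 2 columns, found $ncol" >&2
    return 1
  fi
}

# Print the values of one (assumed 1-based) metadata column, skipping the header
metadata_sample_ids() {
  local metadata=$1 col=${2:-1}
  tail -n +2 "$metadata" | cut -d',' -f "$col"
}
```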

Output file details and examples

The workflow generates the following outputs in the specified outdir:

Kraken2 classification reports for each sample
Bracken abundance estimates at the species level
Phyloseq object (RDS file) containing taxonomic profiles and metadata
Statistical analysis results and visualizations (e.g., alpha diversity plots, beta diversity ordinations, differential abundance analyses)
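To take a quick look at a single Bracken output, the most abundant species can be listed by sorting on the `fraction_total_reads` column (column 7 of Bracken's tab-separated output). This is a sketch; `top_species` is a hypothetical helper, and the per-sample file naming inside `${params.outdir}/bracken` is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N most abundant taxa (name and fraction)
# from one Bracken output file, sorted by fraction_total_reads (column 7).
top_species() {
  local bracken_file=$1 n=${2:-10}
  tail -n +2 "$bracken_file" \
    | sort -t $'\t' -k7,7gr \
    | head -n "$n" \
    | cut -f1,7
}
```

`sort -g` is used so that abundances written in scientific notation (for example `1e-06`) still sort numerically.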