WMS_TAXONOMY
This Nextflow script implements a workflow for taxonomic analysis of whole metagenome sequencing data using Kraken2 and Bracken.
Workflow Execution
# Activate metafun conda environment
conda activate metafun
# Move to metafun directory where you cloned metafun from GitHub
(metafun) nextflow run nf_scripts/WMS_TAXONOMY_apptainer.nf --inputDir ${inputDir} --metadata ${metadata} --sampleIDcolumn ${column_number} --analysiscolumn ${column_number}
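As an illustration of how the placeholders in the command above might be filled in, the sketch below assembles the command from shell variables and prints it for review. All paths and column numbers are hypothetical placeholders, and treating column numbers as 1-based is an assumption; only the script name and the four options come from the command above.

```shell
# Hypothetical placeholder values; replace them with your own.
inputDir="/path/to/filtered_reads"   # directory with RAWREAD_QC output (placeholder)
metadata="/path/to/metadata.csv"     # metadata file (placeholder)
sample_col=1                         # assumed: sample IDs are in the first column
group_col=2                          # assumed: grouping variable is in the second column

# Assemble the command as a string so it can be checked before launching.
cmd="nextflow run nf_scripts/WMS_TAXONOMY_apptainer.nf --inputDir ${inputDir} --metadata ${metadata} --sampleIDcolumn ${sample_col} --analysiscolumn ${group_col}"

# Print the command; run the printed line from the metafun directory to start the workflow.
echo "$cmd"
```

This simple string assembly assumes that none of the paths contain spaces.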
Workflow Overview
The workflow performs the following steps:
1. Kraken2 taxonomic classification
2. Bracken abundance estimation
3. Phyloseq object creation from Bracken outputs
4. Statistical analysis and visualization based on metadata
The workflow produces taxonomic profiles of the metagenomic samples, which can be used in downstream comparative analyses.
Inputs and Outputs
Inputs: quality-controlled paired-end metagenomic reads (output of the RAWREAD_QC workflow) and a metadata file.
Outputs: Kraken2 reports, Bracken abundance estimates, a Phyloseq object, and statistical analysis results and visualizations.
The default output directory is ${launchDir}/results/metagenome/WMS_TAXONOMY.
| Process | InputDir | OutputDir | Note |
|---|---|---|---|
| kraken2_run | | | Kraken2 classification results |
| bracken_run | Output from kraken2_run | | Bracken abundance estimates |
| phyloseq_creation | Output from bracken_run | | Phyloseq object creation |
| statistical_analysis | Phyloseq object | | Statistical analysis and visualizations |
Parameters in WMS_TAXONOMY Nextflow Script
| Parameter | Description | Default Value | Note |
|---|---|---|---|
| | Base directory for databases | | Mounted path in apptainer environment |
| | Base directory for scripts | | Mounted path in apptainer environment |
| | Input directory containing filtered reads | | Output from RAWREAD_QC workflow |
| | Output directory | | |
| | Path to metadata file | Required input | |
| | Column number for sample IDs in metadata | | |
| | Column number for analysis grouping | | |
| | Relative abundance filter for Bracken results | | |
| | Number of CPUs to use | | |
| | Kraken2 method (‘default’ or ‘memory-mapping’) | | |
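To make the relative abundance filter more concrete, here is a minimal sketch of one way such a filter could work on a Bracken output table. It assumes the standard Bracken column layout, in which the seventh tab-separated column is fraction_total_reads. The function name filter_bracken, the threshold value, and the demo rows are made up for illustration; the workflow's own filtering step may be implemented differently.

```shell
# filter_bracken TABLE MIN_FRACTION
# Keep the header plus every row whose fraction_total_reads (column 7)
# is at least MIN_FRACTION. Hypothetical helper, not part of metafun.
filter_bracken() {
  awk -F '\t' -v min="$2" 'NR == 1 || $7 + 0 >= min + 0' "$1"
}

# Small made-up table in Bracken's output layout, for demonstration only.
demo=$(mktemp)
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\tadded_reads\tnew_est_reads\tfraction_total_reads\n' > "$demo"
printf 'Escherichia coli\t562\tS\t900\t90\t990\t0.99000\n' >> "$demo"
printf 'Bacillus subtilis\t1423\tS\t8\t2\t10\t0.01000\n' >> "$demo"

# Keep only taxa that make up at least 5% of the estimated reads.
filter_bracken "$demo" 0.05
```

A stricter or looser threshold changes only the second argument, so the same helper can be reused to compare how many taxa survive at different cutoffs.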
Descriptions of Processes in WMS_TAXONOMY Workflow
Kraken2 Run: Performs taxonomic classification on each sample’s paired-end reads using Kraken2.
Bracken Run: Estimates abundances at the species level using Bracken, based on Kraken2 results.
Phyloseq Creation: Creates a Phyloseq object from Bracken outputs and metadata for downstream analysis.
Statistical Analysis: Conducts statistical analyses and creates visualizations based on the Phyloseq object and metadata.
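The first two processes can be pictured as a pair of command-line calls per sample. The sketch below only prints the commands rather than running them, so it works without Kraken2 or Bracken installed. It uses standard Kraken2 and Bracken options, but the exact options, file naming (the _1/_2 suffixes and the .kreport/.kraken/.bracken extensions), and the print_taxonomy_cmds helper are assumptions for illustration and may not match what the workflow runs internally.

```shell
# print_taxonomy_cmds SAMPLE DB THREADS METHOD
# Print a plausible Kraken2 + Bracken command pair for one sample.
# Hypothetical helper; file names and option choices are assumptions.
print_taxonomy_cmds() {
  sample="$1"
  db="$2"
  threads="$3"
  method="$4"
  extra=""
  # With --memory-mapping, Kraken2 does not load the database into process memory.
  if [ "$method" = "memory-mapping" ]; then
    extra=" --memory-mapping"
  fi
  # Classify the paired-end reads and write a per-sample report.
  echo "kraken2 --db ${db} --threads ${threads}${extra} --paired --report ${sample}.kreport --output ${sample}.kraken ${sample}_1.fastq.gz ${sample}_2.fastq.gz"
  # Re-estimate abundances at the species level (-l S) from the Kraken2 report.
  echo "bracken -d ${db} -i ${sample}.kreport -o ${sample}.bracken -l S"
}

# Example with placeholder values.
print_taxonomy_cmds sampleA /path/to/kraken2_db 8 memory-mapping
```

Printing the commands first makes it easy to check paths and options before committing compute time to a full classification run.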
Tools Used in WMS_TAXONOMY
| Tool | Purpose | Version | Default parameters | Parameters that can be selected |
|---|---|---|---|---|
| Kraken2 | Taxonomic classification | 2.1.2 | | |
| Bracken | Abundance estimation | 2.7 | | |
| R (phyloseq) | Statistical analysis and visualization | 4.3.2 | N/A | N/A |
Usage Notes
- The input directory should contain paired-end read files processed by the RAWREAD_QC workflow.
- The metadata file should be in CSV format with at least two columns: one for sample IDs and one for the grouping used in the analysis.
- The script checks that the input directory exists and is not empty before proceeding.
- The Kraken2 and Bracken databases should be properly set up in the specified database directory.
- The kraken_method parameter allows for memory-mapping, which can improve performance on systems with sufficient RAM.
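The input requirements in the notes above can be checked by hand before launching the workflow. The following is a rough pre-flight sketch, not the check the script itself performs: the check_inputs helper is hypothetical, and it splits the metadata on plain commas, so it would misjudge CSV files that contain quoted commas.

```shell
# check_inputs INPUT_DIR METADATA_CSV
# Hypothetical pre-flight check: directory exists and is non-empty,
# metadata exists and every non-blank row has at least two columns.
check_inputs() {
  input_dir="$1"
  metadata="$2"
  if [ ! -d "$input_dir" ]; then
    echo "Input directory not found: $input_dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$input_dir")" ]; then
    echo "Input directory is empty: $input_dir" >&2
    return 1
  fi
  if [ ! -s "$metadata" ]; then
    echo "Metadata file missing or empty: $metadata" >&2
    return 1
  fi
  # A row with exactly one comma-separated field lacks the grouping column.
  if ! awk -F ',' 'NF == 1 { bad = 1 } END { exit (bad ? 1 : 0) }' "$metadata"; then
    echo "Metadata rows need at least two comma-separated columns" >&2
    return 1
  fi
  echo "Inputs look OK"
}

# Demo with throwaway files; the names are placeholders.
demo_dir=$(mktemp -d)
demo_meta=$(mktemp)
touch "$demo_dir/sampleA_1.fastq.gz" "$demo_dir/sampleA_2.fastq.gz"
printf 'sample_id,group\nsampleA,control\n' > "$demo_meta"
check_inputs "$demo_dir" "$demo_meta"
```

The function returns a non-zero status on the first problem it finds, so it can be chained in front of the nextflow command with `&&`.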
Output file details and examples
The workflow generates the following outputs in the specified outdir:
- Kraken2 classification reports for each sample
- Bracken abundance estimates at the species level
- Phyloseq object (RDS file) containing taxonomic profiles and metadata
- Statistical analysis results and visualizations (e.g., alpha diversity plots, beta diversity ordinations, differential abundance analyses)