WMS_FUNCTION#
This module is part of the metaFun pipeline and provides functional analysis of whole metagenome sequencing data using HUMAnN3.
Overview#
The WMS_FUNCTION module performs functional profiling of metagenomic samples to identify the metabolic pathways present in microbial communities. It utilizes HUMAnN3 to map reads to the UniRef90 protein database and MetaCyc pathway database, generating comprehensive functional profiles. The module integrates functional data with sample metadata to perform statistical analyses and visualizations, revealing the functional potential of microbiomes across different conditions.
Module Execution#
# Basic usage with quality-filtered reads
(metafun) metafun -module WMS_FUNCTION -i results/metagenome/RAWREAD_QC/read_filtered -m metadata.csv -s 1
# Include statistical analysis based on metadata column
(metafun) metafun -module WMS_FUNCTION -i results/metagenome/RAWREAD_QC/read_filtered -m metadata.csv -s 1 -a 2
# Specify custom output directory
(metafun) metafun -module WMS_FUNCTION -i results/metagenome/RAWREAD_QC/read_filtered -m metadata.csv -s 1 -a 2 -o custom_output_dir
# Allocate specific CPU resources
(metafun) metafun -module WMS_FUNCTION -i results/metagenome/RAWREAD_QC/read_filtered -m metadata.csv -s 1 -a 2 -p 24
Module Operation Sequence#
This module performs the following steps:
HUMAnN3 analysis on individual samples:
Merges paired-end reads for each sample
Performs nucleotide and translated searches against ChocoPhlAn and UniRef90 databases
Maps reads to MetaCyc pathway database
Generates gene family and pathway abundance profiles
Note on human reads: This module uses the output from RAWREAD_QC, from which human DNA sequences have already been removed. HUMAnN3 further handles any remaining host reads through its built-in taxonomic filtering using MetaPhlAn.
HUMAnN3 output parsing and normalization:
Combines pathway abundance tables from all samples using humann_join_tables
Normalizes to copies per million (CPM) using humann_renorm_table with the -u cpm parameter
CPM normalization adjusts for sequencing depth by scaling gene abundances to a sum of 1 million, allowing fair comparison between samples with different sequencing depths
The calculation is: (pathway_abundance / total_abundance) × 1,000,000
Separates stratified (organism-specific) and unstratified (community-level) abundance tables with humann_split_stratified_table
Functional analysis and visualization:
Uses the R script humann3_visualization.R from the scripts directory to perform comprehensive statistical analysis
Takes the unstratified pathway abundance table (pathabund_join_renorm_cpm_unstratified.tsv) and metadata as input
Performs statistical analysis based on metadata groupings:
Alpha diversity calculations for functional richness and evenness
Beta diversity analysis with PERMANOVA tests
Differential abundance analysis to identify pathways that differ between groups
Creates visualizations including:
Stacked barplots of pathway abundances by group (shown in ① of example figure)
Alpha and beta diversity boxplots with statistical comparisons (shown in ② of example figure)
Heatmaps of differentially abundant pathways (shown in ③ of example figure)
Differential abundance boxplots with statistical significance (shown in ④ of example figure)
Correlation analyses with numerical metadata variables (shown in ⑤ of example figure)
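As a rough illustration of the stratified/unstratified split in the parsing step, the sketch below separates rows of a HUMAnN-style table by whether the feature name contains the `|` separator that HUMAnN uses to attach a contributing organism. The table contents, the pathway IDs, and the `split_stratified` helper are hypothetical; the pipeline itself relies on humann_split_stratified_table.

```python
import csv
import io

# Hypothetical miniature pathway table in a HUMAnN-like tab-separated layout.
EXAMPLE_TSV = (
    "# Pathway\tS1\tS2\n"
    "PWY-1\t10.0\t20.0\n"
    "PWY-1|g__Bacteroides.s__Bacteroides_vulgatus\t6.0\t15.0\n"
    "PWY-1|unclassified\t4.0\t5.0\n"
    "PWY-2\t5.0\t0.0\n"
)

def split_stratified(tsv_text):
    """Return (header, unstratified_rows, stratified_rows).

    A row is treated as stratified when its feature name contains '|',
    i.e. it carries an organism-specific contribution.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    unstratified = [r for r in body if "|" not in r[0]]
    stratified = [r for r in body if "|" in r[0]]
    return header, unstratified, stratified

header, unstratified, stratified = split_stratified(EXAMPLE_TSV)
print([r[0] for r in unstratified])  # community-level features
print([r[0] for r in stratified])    # organism-specific features
```

The unstratified rows correspond to the community-level table that the downstream R analysis takes as input.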
Parameters#
${launchDir} is the directory from which you run metaFun; it is used as the base directory for output.
| Parameter | Description | Default Value | Note |
|---|---|---|---|
| -i | Input directory containing filtered reads | | Required. Output from RAWREAD_QC workflow |
| -m | Path to metadata file | | Required. CSV file with sample information |
| -s | Column number for sample IDs in metadata | | Required. Matches sample IDs in read filenames |
| -a / --analysiscolumn | Column number for analysis grouping | | Optional. If set to 0, no statistical analysis is performed |
| -p | Number of CPUs to use | | Optional. Adjust based on your system capabilities |
| -o | Output directory | | Optional. Where results will be saved |
Inputs and Outputs#
Inputs#
Quality-controlled paired-end metagenomic reads (output from RAWREAD_QC workflow)
Metadata file (CSV format) with sample information and conditions
Outputs#
HUMAnN3 gene family and pathway abundance profiles for each sample
Combined and normalized pathway abundance tables
Statistical analysis results and visualizations (if --analysiscolumn is specified)
Output directory structure#
The output is organized in the following directory structure:
${launchDir}/results/metagenome/WMS_FUNCTION/
├── humann3/ # Individual HUMAnN3 results directories
│ ├── SRR6915091_humann3/ # Sample-specific HUMAnN3 results directory
│ │ ├── SRR6915091_genefamilies.tsv # Gene family abundances
│ │ ├── SRR6915091_pathabundance.tsv # Pathway abundances
│ │ ├── SRR6915091_pathcoverage.tsv # Pathway coverages
│ │ └── SRR6915091_humann_temp/ # Temporary processing files
│ └── ... # Results for other samples
├── humann3_combined/ # Combined HUMAnN3 results
│ └── humann_split_stratified_table/ # Split stratified and unstratified tables
└── WMS_Function_result/ # Statistical analysis results
    ├── alpha_diversity.csv # Alpha diversity metrics
    ├── beta_group_perMANOVA_stat.csv # PERMANOVA statistics
    ├── bray.csv # Bray-Curtis dissimilarity matrix
    ├── humann_alpha_beta_diversity.pdf # Alpha/beta diversity plots
    ├── humann_alpha_group_diff_stat.csv # Group difference statistics
    ├── humann_beta_group_distance_diff_stat.csv # Beta diversity statistics
    ├── humann_beta_group_distance_raw.csv # Raw distance values
    ├── humann_DAfunction_meta_correlation.pdf # Correlation plots
    ├── humann_group_barplot.pdf # Pathway abundance barplots
    ├── humann_lefse_heatmap.pdf # LEfSe heatmap visualization
    ├── humann_lefse_result.pdf # LEfSe differential pathway analysis
    ├── humann_metadata_autocorrelation.pdf # Metadata correlations
    └── jaccard.csv # Jaccard distance matrix
├── humann_analysis_inspect.log # Analysis log file
└── Rplots.pdf # Additional R plots
Visualization Outputs Explained#
The WMS_FUNCTION module produces comprehensive visualizations for functional analysis as shown in this example figure comparing Control and CRC (colorectal cancer) groups:

① Pathway Abundance Stacked Barplots#
This visualization shows the relative abundance of metabolic pathways in each sample, grouped by condition:
Each vertical bar represents a sample
Different colors represent different metabolic pathways (listed in the legend)
The y-axis shows the abundance in CPM (Copies Per Million)
Samples are grouped by metadata categories (in this example, Control vs. CRC)
This visualization helps identify dominant functional capacities across sample groups and reveals potential shifts in metabolic capabilities between conditions. The humann_group_barplot.pdf file contains this visualization.
② Functional Diversity Analysis#
This panel displays diversity metrics calculated from functional profiles:
Top Section:
Alpha Diversity: Shannon diversity boxplots comparing functional diversity between groups
Beta Diversity: Bray-Curtis distance boxplots showing similarity within and between groups
Statistical significance is indicated (in this example, “ns” for not significant)
Bottom Section:
PCoA Plot: Principal Coordinates Analysis showing sample relationships based on functional profiles
Samples are colored by group with confidence ellipses
Statistical metrics are displayed (Pr(>F), Pseudo-F statistic, R²)
Axes show percent variance explained by each principal coordinate
This visualization is found in the humann_alpha_beta_diversity.pdf file, with corresponding statistics in the alpha_diversity.csv, bray.csv, and beta_group_perMANOVA_stat.csv files.
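To make the two diversity measures concrete, here is a minimal sketch of the Shannon index and the Bray-Curtis dissimilarity computed on CPM-like pathway vectors. The sample values are invented, and the use of the natural logarithm is an assumption; the R script may use a different base or a library implementation.

```python
import math

def shannon(abundances):
    """Shannon index H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

# Hypothetical CPM profiles over the same four pathways.
sample_a = [500000.0, 300000.0, 200000.0, 0.0]
sample_b = [250000.0, 250000.0, 250000.0, 250000.0]

print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```

A perfectly even profile such as `sample_b` gives the maximum Shannon value for its number of pathways, and two identical profiles give a Bray-Curtis dissimilarity of 0.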
③ Differential Abundance Heatmap#
This visualization shows the results of LEfSe (Linear discriminant analysis Effect Size) analysis:
Each row represents a specific metabolic pathway
Each column represents a sample, grouped by condition
Color intensity indicates relative abundance (red=high, blue=low)
Pathways are selected based on statistical significance between groups
The heatmap allows identification of pathways that are consistently more abundant in one group compared to the other, providing potential functional biomarkers. This visualization is saved as humann_lefse_heatmap.pdf.
④ LEfSe Differential Abundance Boxplots#
This panel shows statistical comparison of specific pathways identified by LEfSe analysis:
Each row shows a differentially abundant pathway
Bars represent abundance in different groups
The x-axis shows LDA score (Effect size)
Asterisks indicate statistical significance levels
Direction and length of bars indicate which group has higher abundance
This visualization highlights the most significantly different pathways between groups and is saved as humann_lefse_result.pdf.
⑤ Metadata Correlation Analysis#
This comprehensive panel analyzes relationships between metadata variables:
Top Row:
Distribution of categorical and numerical metadata variables
Middle Row:
Distribution of a numerical variable (e.g., host_age) by group
Correlation coefficients between variables, overall and by group
Bottom Row:
Distribution of another numerical variable (e.g., BMI) by group
Scatterplot showing relationship between numerical variables, colored by group
Density plots comparing distributions between groups
This analysis helps identify potential confounding variables or correlations between metadata that might influence functional profiles. These visualizations are saved as humann_metadata_autocorrelation.pdf and humann_DAfunction_meta_correlation.pdf.
Together, these visualizations provide a comprehensive analysis of functional metagenomics data, from high-level pathway abundance patterns to specific differential pathways and their relationship with metadata variables.
Nextflow Processes in WMS_FUNCTION Module#
| Process | InputDir | OutputDir | Note |
|---|---|---|---|
| humann3_run | | | Performs HUMAnN3 analysis on individual samples |
| humann3_parsing | Output from humann3_run | | Combines and normalizes HUMAnN3 outputs |
| function_analysis | Output from humann3_parsing | | Performs statistical analysis and visualization |
Descriptions of Processes in WMS_FUNCTION Workflow#
humann3_run: Executes HUMAnN3 on each sample’s paired-end reads.
Input: Paired-end quality-filtered metagenomic reads
Output: Gene family and pathway profiles for each sample
Merges paired reads for processing
Uses UniRef90 protein database and MetaCyc pathway database
Performs both nucleotide and translated searches
humann3_parsing: Combines and normalizes the HUMAnN3 outputs from all samples.
Input: HUMAnN3 output files from all samples
Output: Combined and normalized pathway abundance tables
Joins tables across all samples
Normalizes to copies per million (CPM)
Separates stratified (organism-specific) and unstratified (community-level) tables
function_analysis: Performs statistical analysis and creates visualizations.
Input: Normalized pathway abundance tables and metadata
Output: Statistical results and visualizations
Identifies differentially abundant pathways between conditions
Creates heatmaps, PCoA plots, and boxplots
Only runs if --analysiscolumn is specified
Tools Used in WMS_FUNCTION#
| Tool | Purpose | Version | Default parameters | Parameters that can be selected |
|---|---|---|---|---|
| HUMAnN3 | Functional profiling | 3.0.0 | | |
| MetaPhlAn | Taxonomic profiling (used by HUMAnN3) | 4.0.6 | | None specific to this workflow |
| R | Statistical analysis and visualization | 4.3.2 | N/A | N/A |
Usage Notes#
The WMS_FUNCTION module is designed to work with the output from the RAWREAD_QC module.
Metadata file should be in CSV format with at least one column containing sample IDs that match the prefixes of your read filenames.
HUMAnN3 requires significant computational resources, especially for large datasets. Consider adjusting the CPU allocation with the -p parameter.
HUMAnN3 databases (ChocoPhlAn and UniRef90) must be properly installed and configured.
The statistical analysis component only runs when the -a/--analysiscolumn parameter is specified with a valid column number.
For optimal results, ensure your reads have been properly quality filtered and host DNA has been removed.
HUMAnN3 provides both stratified (organism-specific) and unstratified (community-level) functional profiles, allowing for detailed analysis of which organisms contribute to specific pathways.
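Because the sample IDs in the metadata must match the prefixes of the read filenames, it can help to check this before launching a long run. The sketch below is one way to do that. The metadata contents, the `SRR0000000` ID, the `_1.fastq.gz`/`_2.fastq.gz` naming, and both helper functions are hypothetical, and it assumes the metadata has a header row and that the column number is 1-based, as suggested by `-s 1`.

```python
import csv
import io

# Hypothetical metadata; the sample ID is in column 1 (1-based, as with -s 1).
METADATA_CSV = "sample_id,group\nSRR6915091,Control\nSRR0000000,CRC\n"

# Hypothetical read filenames; the naming produced by RAWREAD_QC may differ.
READ_FILENAMES = ["SRR6915091_1.fastq.gz", "SRR6915091_2.fastq.gz"]

def sample_ids_from_metadata(csv_text, sample_column):
    """Read sample IDs from a 1-based column, skipping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [row[sample_column - 1] for row in rows[1:] if row]

def unmatched_sample_ids(sample_ids, read_filenames):
    """Return sample IDs that are not a prefix of any read filename."""
    return [
        sid for sid in sample_ids
        if not any(name.startswith(sid) for name in read_filenames)
    ]

sample_ids = sample_ids_from_metadata(METADATA_CSV, sample_column=1)
missing = unmatched_sample_ids(sample_ids, READ_FILENAMES)
print(missing)  # sample IDs without any matching read file
```

Plain prefix matching is deliberately loose: an ID such as `S1` would also match a file named `S10_1.fastq.gz`, so a stricter check may be needed for short IDs.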
Understanding CPM Normalization#
The Copies Per Million (CPM) normalization used in this module is crucial for accurate comparison of pathway abundances between samples:
What is CPM? CPM normalizes abundance values by dividing each value by the total abundance in a sample, then multiplying by 1 million.
Why use CPM? Different samples often have different sequencing depths. Without normalization, samples with more sequencing would appear to have higher pathway abundances simply due to more reads. CPM allows fair comparison by accounting for these differences.
Implementation: The normalization is performed by the humann_renorm_table utility with the command:
humann_renorm_table -i pathabund_join.tsv -o pathabund_join_renorm_cpm.tsv -u cpm
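The CPM arithmetic can be worked through on a toy example. The sketch below applies (pathway_abundance / total_abundance) × 1,000,000 to two hypothetical samples that differ only in sequencing depth. The `to_cpm` helper and the pathway names are illustrative, and the sketch ignores any special-row handling that humann_renorm_table may apply.

```python
def to_cpm(abundances):
    """Scale one sample's abundances so that they sum to 1,000,000."""
    total = sum(abundances.values())
    if total == 0:
        return {name: 0.0 for name in abundances}
    return {name: value * 1_000_000 / total for name, value in abundances.items()}

# Hypothetical samples: 'deep' has 10x the depth of 'shallow'
# but the same relative pathway composition.
shallow = {"PWY-A": 30.0, "PWY-B": 50.0, "PWY-C": 20.0}
deep = {name: value * 10 for name, value in shallow.items()}

cpm_shallow = to_cpm(shallow)
cpm_deep = to_cpm(deep)
print(cpm_shallow)
print(cpm_deep)
```

After normalization the two samples have the same CPM values, which is the sense in which CPM removes the effect of sequencing depth.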
Processing Scripts#
The WMS_FUNCTION module uses several scripts to process and analyze data:
humann3_visualization.R: The main R script that performs:
Statistical analysis of pathway abundances
Creation of visualization plots
Differential abundance testing
Alpha and beta diversity calculations
Correlation analysis with metadata
Input parameters:
-i: Input file (normalized pathway abundance table)
-out: Output directory name
-m: Metadata file
-mc: Metadata column for analysis
-sc: Sample ID column in metadata
HUMAnN3 utility scripts:
humann_join_tables: Combines individual sample results
humann_renorm_table: Performs CPM normalization
humann_split_stratified_table: Separates organism-specific and community-level profiles
The execution of these scripts can be seen in the Nextflow processes defined in WMS_FUNCTION_apptainer.nf.
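For orientation, the three HUMAnN3 utilities could be chained roughly as in the sketch below, which only builds and prints the command lines without running them. The `build_parsing_commands` helper, the directory arguments, and the choice of `--file_name pathabundance` with `--search-subdirectories` are assumptions made for illustration; the actual invocations are the ones in WMS_FUNCTION_apptainer.nf and may differ.

```python
import shlex

def build_parsing_commands(humann_dir, out_dir):
    """Build join -> renorm -> split command lines (not executed here)."""
    joined = f"{out_dir}/pathabund_join.tsv"
    renorm = f"{out_dir}/pathabund_join_renorm_cpm.tsv"
    split_dir = f"{out_dir}/humann_split_stratified_table"
    return [
        # Join per-sample pathway abundance tables into one table.
        ["humann_join_tables", "-i", humann_dir, "-o", joined,
         "--file_name", "pathabundance", "--search-subdirectories"],
        # Renormalize the joined table to copies per million.
        ["humann_renorm_table", "-i", joined, "-o", renorm, "-u", "cpm"],
        # Split into stratified and unstratified tables.
        ["humann_split_stratified_table", "-i", renorm, "-o", split_dir],
    ]

# Hypothetical directories, used only to render the commands.
commands = build_parsing_commands("humann3", "humann3_combined")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later if you want to execute the steps outside Nextflow.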
Next Steps#
After running WMS_FUNCTION, you can:
Explore taxonomic profiles with the WMS_TAXONOMY module to complement your functional analysis:
(metafun) metafun -module WMS_TAXONOMY -i results/metagenome/RAWREAD_QC/read_filtered -m metadata.csv -c 1 -a 2
Perform deeper analysis by:
Examining specific metabolic pathways of interest
Comparing functional profiles across different conditions
Correlating pathway abundances with metadata variables or taxonomic profiles
Identifying functional biomarkers for specific conditions
The WMS_FUNCTION module provides comprehensive insights into the metabolic potential of microbial communities, complementing taxonomic analysis to reveal not just who is present in your samples, but what they can do.