Welcome to metaFun
metaFun: An analysis pipeline for metagenomic big data with fast and unified Functional searches
metaFun is implemented in Nextflow with Apptainer. It can be installed with conda or mamba. The package is available from the Bioconda channel (https://anaconda.org/bioconda/metafun).
Introduction
metaFun aims at agile and scalable generation of metagenome-assembled genomes and taxonomic profiling with statistical analysis. Given user-selected genomes and their metadata, the pipeline also enables fast comparative genomic analysis and functional annotation.
Bird's-eye view of the metaFun pipeline. The pipeline comprises seven analytical modules and four interactive modules.
Quick Start
Install Prerequisites (conda, miniconda, or mamba)
# Assuming a Linux OS.
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
# The -p option sets the installation path, i.e. the base directory of your conda installation.
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
Install metaFun
# Create and activate the metafun environment
conda create -n metafun bioconda::metafun
conda activate metafun
Download Databases
(metafun) metafun -module DOWNLOAD_DB
# Get help
(metafun) metafun -help
FAQ & Troubleshooting
Storage Management
Reducing Disk Usage: After verifying your results, you can safely delete the Nextflow work/ directory to free up significant disk space:
# Remove work directory after successful run
rm -rf work/
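Before deleting, it can help to see how much space the work directory actually occupies. The following is a minimal sketch using only standard tools; the function name `clean_work_dir` is hypothetical and not part of metaFun. Keep in mind that Nextflow's resume feature relies on the cached files in work/, so once the directory is removed a later run has to start from scratch.

```shell
# Hypothetical helper (not part of metaFun): report the size of a Nextflow
# work directory, then remove it.
clean_work_dir() {
    run_dir="${1:-.}"              # directory the pipeline was launched from
    work_dir="$run_dir/work"
    if [ ! -d "$work_dir" ]; then
        echo "No work directory found in $run_dir"
        return 0
    fi
    du -sh "$work_dir"             # show how much space will be freed
    rm -rf "$work_dir"
}

# Example call, once the results have been checked (placeholder path):
# clean_work_dir /path/to/run_directory
```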
Temporary Files: Several modules create large temporary files during processing that can be deleted after successful runs:
HUMAnN3 (*_humann_temp directories in WMS_FUNCTION results)
Assembly files (ASSEMBLY_BINNING intermediates)
MetaPhlAn bowtie2 indices (in WMS_TAXONOMY)
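To see where the HUMAnN3 temporary directories are and how large they have grown before removing anything, something like the following can be used. This is a sketch that assumes the directories follow the *_humann_temp naming mentioned above; `list_humann_temp` is a hypothetical helper and the path in the example call is a placeholder.

```shell
# Hypothetical helper: list HUMAnN3 temporary directories under a results
# folder together with their sizes. Nothing is deleted.
list_humann_temp() {
    results_dir="${1:-.}"
    # -prune stops find from descending into a directory it already matched.
    find "$results_dir" -type d -name '*_humann_temp' -prune -exec du -sh {} \;
}

# Example call (placeholder path):
# list_humann_temp /path/to/WMS_FUNCTION_results
```

After checking the list, the reported directories can be removed with rm -rf.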
Selective Results Retention: For large metagenomic studies, consider keeping only essential outputs:
Save final tables and visualization files
Compress large text outputs with gzip
Archive raw binning results after successful bin refinement
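Compressing only the large text tables keeps small files directly readable. The sketch below assumes that the large outputs are .tsv or .txt files; the extensions, the size threshold, and the helper name `compress_large_text` are illustrative assumptions, not metaFun conventions.

```shell
# Hypothetical helper: gzip every .tsv and .txt file larger than a given
# size (in MiB) below a results directory.
compress_large_text() {
    results_dir="$1"
    min_mib="${2:-100}"
    # find's "c" suffix counts bytes, so convert MiB to bytes first.
    min_bytes=$((min_mib * 1024 * 1024))
    find "$results_dir" -type f \( -name '*.tsv' -o -name '*.txt' \) \
        -size +"${min_bytes}"c -exec gzip {} \;
}

# Example call (placeholder path, 100 MiB threshold):
# compress_large_text /path/to/results 100
```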
Common Issues
Metadata Formatting: The most common errors come from metadata file issues:
Ensure your sample IDs in the metadata CSV file exactly match the prefixes of your read filenames
Verify that the column numbers specified in parameters (-s/-c, -a) are correct
Check that your CSV file uses comma (,) separators and not tabs or semicolons
For interactive modules, ensure consistent metadata across analysis stages
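The first and third checks above can be run before starting a module. The following sketch makes several assumptions that may not hold for your data: the metadata file has a header row, sample IDs are in the first column, all reads sit in one directory, and each read filename starts with the sample ID followed by an underscore or a dot. `check_metadata_ids` is a hypothetical helper, not a metaFun command.

```shell
# Hypothetical helper: report sample IDs from a metadata CSV that have no
# read file starting with that ID. Returns non-zero if any ID is unmatched.
check_metadata_ids() {
    metadata_csv="$1"
    reads_dir="$2"
    # Warn when the header looks tab- or semicolon-separated.
    if head -n 1 "$metadata_csv" | grep -q "$(printf '[\t;]')"; then
        echo "WARNING: header contains tabs or semicolons, expected commas"
    fi
    # Skip the header, take column 1, strip Windows carriage returns.
    tail -n +2 "$metadata_csv" | cut -d, -f1 | tr -d '\r' | (
        status=0
        while IFS= read -r sample_id; do
            [ -n "$sample_id" ] || continue
            # Assumed naming: <sample_id>_... or <sample_id>.<extension>
            if ! ls "$reads_dir/$sample_id"[_.]* >/dev/null 2>&1; then
                echo "No read files found for sample ID: $sample_id"
                status=1
            fi
        done
        exit "$status"
    )
}

# Example call (placeholder paths):
# check_metadata_ids metadata.csv /path/to/reads
```

Requiring an underscore or dot after the ID avoids a sample such as S1 being satisfied by files that belong to S10.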
Database Installation: If you encounter database-related errors, you may need to reinstall the required databases:
# Reinstall specific databases
(metafun) metafun -module DOWNLOAD_DB -d humann3 # For WMS_FUNCTION
(metafun) metafun -module DOWNLOAD_DB -d kraken2 # For WMS_TAXONOMY
# For complete reinstallation of all databases
(metafun) metafun -module DOWNLOAD_DB
Memory Requirements: Several modules require significant memory:
ASSEMBLY_BINNING: Lower the -m parameter for metaSPAdes if out-of-memory (OOM) errors occur
WMS_FUNCTION: Adjust threads with the -p parameter for HUMAnN3
For low-memory environments, process samples in smaller batches
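One simple way to prepare smaller batches is to split a list of sample IDs into fixed-size chunks. The sketch below uses the standard split utility; the helper name `make_batches`, the sample list file, and the batch prefix are assumptions for illustration. How each batch is then handed to a module depends on that module's input options, which are not covered here.

```shell
# Hypothetical helper: split a list of sample IDs (one per line) into
# batch files of at most N samples each: batch_aa, batch_ab, ...
make_batches() {
    sample_list="$1"
    batch_size="${2:-10}"
    out_prefix="${3:-batch_}"
    split -l "$batch_size" "$sample_list" "$out_prefix"
}

# Example call: batches of 10 samples written next to the sample list
# make_batches samples.txt 10 batch_
```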
Module-Specific Troubleshooting
RAWREAD_QC:
If human read filtering fails, check that the human genome database is properly installed
For samples with unusual quality distributions, adjust fastp parameters directly
ASSEMBLY_BINNING:
Low-coverage samples may produce fragmented assemblies; consider co-assembly of related samples
If binning produces too few bins, try adjusting the minimum contig length parameters
WMS_TAXONOMY:
If Kraken2 results show high “unclassified” percentages, try updating to the latest database
When switching between profilers (sylph/kraken2), remember to use the correct phyloseq object
WMS_FUNCTION:
For pathway analysis issues, check that both ChocoPhlAn and UniRef90 databases are installed
Stratified outputs may be large; use unstratified tables for overview analyses
Interactive Modules:
If web interfaces fail to load, check for port conflicts and use the -p parameter
For visualization export issues, verify that required R packages are properly installed
Performance Optimization
Parallelize Efficiently:
Adjust CPU allocation based on available resources and module needs:
# Example optimized parameters for high-performance systems
(metafun) metafun -module ASSEMBLY_BINNING -p 24 -m 128
(metafun) metafun -module WMS_FUNCTION -p 16
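On a shared or smaller machine, the thread count can be derived from the number of available cores instead of being hard-coded. The following is a sketch only: `suggest_threads` is a hypothetical helper, and leaving two cores free is an arbitrary choice. It prints a command line rather than running it.

```shell
# Hypothetical helper: suggest a thread count that leaves some cores free
# for the rest of the system (never fewer than one thread).
suggest_threads() {
    reserve="${1:-2}"                      # cores to leave unused
    total=$(getconf _NPROCESSORS_ONLN)     # cores currently online
    threads=$((total - reserve))
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
}

threads=$(suggest_threads 2)
# Print the resulting command for inspection instead of executing it.
echo "metafun -module WMS_FUNCTION -p $threads"
```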
Staged Analysis:
For large datasets, run modules sequentially on subsets of samples
Process taxonomic analysis (fast) before resource-intensive assembly or functional analysis
Resume Functionality:
Utilize Nextflow’s resume feature to continue interrupted workflows:
# Example restarting a failed run
(metafun) metafun -module ASSEMBLY_BINNING -resume
Getting Support
GitHub Issues: For bug reports, feature requests, or support:
Visit the metaFun GitHub repository
Create a new issue describing your problem or request
Include details about your environment, command used, and error messages
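Collecting the environment details in one file makes them easier to paste into an issue. The sketch below is an assumption-laden example rather than an official reporting tool: `collect_issue_info` and the report filename are hypothetical, and it assumes the Nextflow log (.nextflow.log) was written to the directory from which the command is run.

```shell
# Hypothetical helper: gather basic environment details into a text file
# that can be attached to a GitHub issue.
collect_issue_info() {
    report="${1:-metafun_issue_report.txt}"
    {
        echo "== System =="
        uname -a
        echo "== conda =="
        if command -v conda >/dev/null 2>&1; then
            conda --version
        else
            echo "conda not found on PATH"
        fi
        echo "== Last lines of .nextflow.log =="
        if [ -f .nextflow.log ]; then
            tail -n 50 .nextflow.log
        else
            echo "no .nextflow.log in $(pwd)"
        fi
    } > "$report"
    echo "Wrote $report"
}

# Example call, from the directory where the failed run was started:
# collect_issue_info
```

Add the exact metafun command you ran to the report by hand, and review the file for private paths or sample names before posting it.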
Documentation: Refer to the specific module documentation for detailed parameter descriptions and usage examples
Citing metaFun: If you use metaFun in your research, please cite:
[Citation information to be added upon publication]
Getting Started
- 1. Quick start
- 1.1. metaFun install and run
- 1.2. RAWREAD_QC: Quality control of raw reads and host genome filtering
- 1.3. ASSEMBLY_BINNING: Assembly and binning
- 1.4. BIN_ASSESSMENT: Assess genome quality and taxonomy classification
- 1.5. GENOME_SELECTOR: Genome selection interface
- 1.6. COMPARATIVE_ANNOTATION: Comparative genomic analysis
- 1.7. INTERACTIVE_COMPARATIVE: Interactive comparative analysis
- 1.8. WMS_TAXONOMY: Taxonomic profiling of metagenomic reads
- 1.9. INTERACTIVE_TAXONOMY: Interactive taxonomy analysis
- 1.10. WMS_FUNCTION: Functional analysis of metagenomic reads
- 1.11. WMS_STRAIN: Strain-level microdiversity analysis
- 1.12. INTERACTIVE_STRAIN: Interactive strain diversity analysis
- 1.13. INTERACTIVE_NETWORK: Interactive microbial network analysis
- 1.14. DOWNLOAD_DB: Download required databases
- 1.15. PREPARE_CUSTOM_HOST: Prepare custom host genome index
- 1.16. Common Options
- 2. Beginners Guide to metaFun
metaFun workflows
- Workflows description
- this is RAWREAD_QC description page.
- RAWREAD_QC
- ASSEMBLY_BINNING
- BIN_ASSESSMENT
- Overview
- Module Execution
- Module Operation Sequence
- Parameters
- Inputs and Outputs
- Execution Examples and Results
- Nextflow Processes in BIN_ASSESSMENT Module
- Descriptions of Processes in BIN_ASSESSMENT Workflow
- Tools Used in BIN_ASSESSMENT
- Custom Scripts in BIN_ASSESSMENT
- Usage Notes
- Next Steps
- Combining Metadata with Quality/Taxonomy Results
- GENOME_SELECTOR
- COMPARATIVE_ANNOTATION
- INTERACTIVE_COMPARATIVE
- WMS_TAXONOMY
- INTERACTIVE_TAXONOMY
- WMS_FUNCTION
- WMS_STRAIN
- INTERACTIVE_STRAIN
- INTERACTIVE_NETWORK