Welcome to metaFun


metaFun: An analysis pipeline for metagenomic big data with fast and unified Functional searches#

metaFun is implemented in Nextflow with Apptainer. The package is distributed through the Bioconda channel (https://anaconda.org/bioconda/metafun) and can be installed with conda or mamba.

Introduction#

metaFun aims at agile, scalable generation of metagenome-assembled genomes and taxonomic profiling with statistical analysis. Given user-selected genomes and their metadata, the pipeline also enables fast comparative genomic analysis and functional annotation.

Figure (metafun_pipeline): Bird's-eye view of the metaFun pipeline, which comprises seven analytical modules and four interactive modules.

Quick Start#

Install Prerequisites (conda, miniconda, or mamba)

 # These commands assume a Linux x86_64 system.
 mkdir -p ~/miniconda3
 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
 # -p sets the installation directory (the base directory of your conda installation); change ~/miniconda3 to install elsewhere.
 bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
 rm ~/miniconda3/miniconda.sh

Install metaFun

# Create and activate the metafun environment
conda create -n metafun bioconda::metafun
conda activate metafun

Download Databases

(metafun) metafun -module DOWNLOAD_DB
# Show help
(metafun) metafun -help

RAWREAD_QC

ASSEMBLY_BINNING

BIN_ASSESSMENT

GENOME_SELECTOR

COMPARATIVE_ANNOTATION

INTERACTIVE_COMPARATIVE

WMS_TAXONOMY

INTERACTIVE_TAXONOMY

WMS_FUNCTION

WMS_STRAIN

INTERACTIVE_STRAIN

INTERACTIVE_NETWORK

FAQ & Troubleshooting#

Storage Management#

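Reducing Disk Usage: After verifying your results, you can safely delete the Nextflow work/ directory to free up significant disk space. Because -resume reuses the task cache stored in work/, delete it only once you no longer need to resume the run: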
```
# Remove work directory after successful run
rm -rf work/
```

Temporary Files: Several modules create large temporary files during processing that can be deleted after successful runs:
- HUMAnN3 (*_humann_temp directories in WMS_FUNCTION results)
- Assembly files (ASSEMBLY_BINNING intermediates)
- MetaPhlAn bowtie2 indices (in WMS_TAXONOMY)

Selective Results Retention: For large metagenomic studies, consider keeping only essential outputs:
- Save final tables and visualization files
- Compress large text outputs with gzip
- Archive raw binning results after successful bin refinement
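
The cleanup steps above can be sketched in shell. This is a minimal illustration, not part of metaFun: the function names `list_humann_temp` and `compress_large_text` are hypothetical, the `*_humann_temp` pattern is taken from the list above, and the `.tsv`/`.txt` extensions are an assumption about which outputs count as "large text outputs". The first helper only lists directories so they can be reviewed before anything is deleted.

```shell
# Hypothetical helpers (not part of metaFun).

# List HUMAnN3 temporary directories under a results directory so they can be
# reviewed before deletion. Prints one path per line; deletes nothing.
list_humann_temp() {
    find "$1" -type d -name '*_humann_temp' -prune -print | sort
}

# Gzip every .tsv or .txt file under a directory that is larger than a
# threshold given in find(1) -size syntax (for example 100M with GNU find,
# or 1000c for "more than 1000 bytes").
compress_large_text() {
    find "$1" -type f \( -name '*.tsv' -o -name '*.txt' \) -size +"$2" \
        -exec gzip {} +
}
```

For example, `list_humann_temp results/` followed by `compress_large_text results/ 100M` would report the temporary directories and then compress only the larger tables, where `results/` stands in for your own output directory.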

Common Issues#

Metadata Formatting: The most common errors come from metadata file issues:
- Ensure your sample IDs in the metadata CSV file exactly match the prefixes of your read filenames
- Verify that the column numbers specified in parameters (-s/-c, -a) are correct
- Check that your CSV file uses comma (,) separators and not tabs or semicolons
- For interactive modules, ensure consistent metadata across analysis stages
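
The sample-ID and separator checks above can be run before launching a module. The following is a minimal sketch, not part of metaFun: it assumes the sample ID is in the first column of a CSV with a header row, that read files are named `<sample_id>_*.fastq.gz`, and that sample IDs contain no spaces. The function name `check_metadata` and these conventions are assumptions for illustration, so adapt them to your own layout.

```shell
# Hypothetical helper (not part of metaFun): report metadata sample IDs that
# have no matching read file. Assumes column 1 holds the sample ID, the first
# row is a header, and reads are named <sample_id>_*.fastq.gz.
check_metadata() (
    metadata=$1
    reads_dir=$2
    status=0

    # A header without any comma suggests tab or semicolon separators.
    if ! head -n 1 "$metadata" | grep -q ','; then
        echo "no comma found in header of $metadata" >&2
        exit 2
    fi

    # Skip the header, take column 1, strip carriage returns from Windows files.
    for id in $(tail -n +2 "$metadata" | cut -d ',' -f 1 | tr -d '\r'); do
        # ls fails when the glob matches nothing.
        if ! ls "$reads_dir/${id}"_*.fastq.gz >/dev/null 2>&1; then
            echo "missing reads for sample: $id"
            status=1
        fi
    done
    exit $status
)
```

Calling `check_metadata metadata.csv reads/` prints one line per sample without reads and returns a non-zero status if any are missing; both paths are placeholders.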

Database Installation: If you encounter database-related errors, you may need to reinstall the required databases:

# Reinstall specific databases
(metafun) metafun -module DOWNLOAD_DB -d humann3     # For WMS_FUNCTION
(metafun) metafun -module DOWNLOAD_DB -d kraken2     # For WMS_TAXONOMY


# For complete reinstallation of all databases
(metafun) metafun -module DOWNLOAD_DB

Memory Requirements: Several modules require significant memory:
- ASSEMBLY_BINNING: Lower the -m value passed to metaSPAdes if out-of-memory (OOM) errors occur
- WMS_FUNCTION: Adjust the HUMAnN3 thread count with the -p parameter
- For low-memory environments, process samples in smaller batches
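
Batching can be prepared with standard tools. As a minimal sketch under stated assumptions (a plain-text file with one sample ID per line; the helper name `make_batches` is hypothetical and not part of metaFun), the list can be split into fixed-size batch files:

```shell
# Hypothetical helper (not part of metaFun): split a sample list (one ID per
# line) into batch files holding at most batch_size samples each.
make_batches() (
    sample_list=$1
    batch_size=$2
    out_dir=$3
    mkdir -p "$out_dir"
    # split(1) writes batch_aa, batch_ab, ... with at most batch_size lines each.
    split -l "$batch_size" "$sample_list" "$out_dir/batch_"
    # Print the batch files that were created.
    ls "$out_dir"/batch_*
)
```

Each batch file can then be used to stage only that batch's reads into its own input directory for a separate run. How inputs are passed to a module is described in the module documentation and is not assumed here.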

Module-Specific Troubleshooting#

RAWREAD_QC:
- If human read filtering fails, check that the human genome database is properly installed
- For samples with unusual quality distributions, adjust fastp parameters directly

ASSEMBLY_BINNING:
- Low-coverage samples may produce fragmented assemblies; consider co-assembly of related samples
- If binning produces too few bins, try adjusting the minimum contig length parameters

WMS_TAXONOMY:
- If Kraken2 results show high “unclassified” percentages, try updating to the latest database
- When switching between profilers (sylph/kraken2), remember to use the correct phyloseq object

WMS_FUNCTION:
- For pathway analysis issues, check that both ChocoPhlAn and UniRef90 databases are installed
- Stratified outputs may be large; use unstratified tables for overview analyses

Interactive Modules:
- If web interfaces fail to load, check for port conflicts and use the -p parameter
- For visualization export issues, verify that required R packages are properly installed

Performance Optimization#

Parallelize Efficiently:

Adjust CPU allocation based on available resources and module needs:

# Example optimized parameters for high-performance systems
(metafun) metafun -module ASSEMBLY_BINNING -p 24 -m 128
(metafun) metafun -module WMS_FUNCTION -p 16
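
Rather than hard-coding a thread count, it can be derived from the machine. This is a minimal sketch, not a metaFun feature: the helper name `suggest_threads` and the default of keeping two cores free are assumptions for illustration, and only the -p flag comes from the examples above.

```shell
# Hypothetical helper (not part of metaFun): given the total number of cores
# and a number to keep free (default 2), print a thread count of at least 1.
suggest_threads() (
    total=$1
    reserve=${2:-2}
    threads=$((total - reserve))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
)

# Possible use on Linux, where nproc reports the available cores:
#   metafun -module WMS_FUNCTION -p "$(suggest_threads "$(nproc)")"
```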

Staged Analysis:
- For large datasets, run modules sequentially on subsets of samples
- Process taxonomic analysis (fast) before resource-intensive assembly or functional analysis

Resume Functionality:

Use Nextflow’s resume feature to continue an interrupted workflow, so that tasks already completed are reused from the cache instead of being rerun:

# Example restarting a failed run
(metafun) metafun -module ASSEMBLY_BINNING -resume
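
For unattended runs, the resume step can be wrapped in a small retry loop. The following is a sketch under assumptions, not part of metaFun: `run_with_resume` is a hypothetical helper that appends -resume (as in the command above) on every attempt after the first. It only helps with transient failures such as a lost node; a deterministic error, for example a malformed metadata file, will fail again on every attempt.

```shell
# Hypothetical helper (not part of metaFun): run a command and, if it fails,
# rerun it with -resume appended, for at most max_tries attempts in total.
run_with_resume() {
    max_tries=$1
    shift
    try=1
    # First attempt: run the command exactly as given.
    "$@" && return 0
    while [ "$try" -lt "$max_tries" ]; do
        try=$((try + 1))
        echo "attempt $try of $max_tries: resuming" >&2
        "$@" -resume && return 0
    done
    return 1
}

# Possible use:
#   run_with_resume 3 metafun -module ASSEMBLY_BINNING
```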

Getting Support#

GitHub Issues: For bug reports, feature requests, or support:
- Visit the metaFun GitHub repository
- Create a new issue describing your problem or request
- Include details about your environment, command used, and error messages

Documentation: Refer to the specific module documentation for detailed parameter descriptions and usage examples

Citing metaFun: If you use metaFun in your research, please cite:
```
[Citation information to be added upon publication]
```
