Welcome to metaFun#


metaFun : An analysis pipeline for metagenomic big data with fast and unified Functional searches#

metaFun is implemented in Nextflow with Apptainer. It installs through conda or mamba, and the package is available from the Bioconda channel (https://anaconda.org/bioconda/metafun).

Introduction#

metaFun aims at agile, scalable generation of metagenome-assembled genomes and taxonomic profiling with statistical analysis. Given genomes of interest and their metadata, the pipeline also enables fast comparative genomic analysis and functional annotation.

Figure (metafun_pipeline): Bird's-eye view of the metaFun pipeline. The pipeline comprises seven analytical modules and four interactive modules.

Quick Start#

  1. Install Prerequisites (conda, miniconda, or mamba)

     # This example assumes a Linux x86_64 system.
     mkdir -p ~/miniconda3
     wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
     # To install elsewhere, change the path passed to -p; it becomes the base directory of your conda installation.
     bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
     rm ~/miniconda3/miniconda.sh
    
  2. Install metaFun

    # Create the metafun environment
    
    conda create -n metafun bioconda::metafun
    conda activate metafun
    
  3. Download Databases

    (metafun)  metafun  -module DOWNLOAD_DB 
    # get help 
    (metafun)  metafun  -help
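
Before downloading databases, it can help to confirm that the tools the pipeline relies on are reachable from your shell. The sketch below is not part of metaFun: `missing_tools` is a hypothetical helper, and the tool list in the usage comment (conda, apptainer, metafun) is an assumption based on the description above, not a documented requirement list.

```shell
# Hypothetical helper (not part of metaFun): print each named tool
# that cannot be found on PATH, and return non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (assumed tool list; run inside the activated metafun environment):
#   missing_tools conda apptainer metafun && echo "all tools found"
```

The function only reports what is missing; it does not install anything, so it is safe to run repeatedly.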
    

FAQ & Troubleshooting#

Storage Management#

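  • Reducing Disk Usage: After verifying your results, you can safely delete the Nextflow work/ directory to free up significant disk space. Deleting it also discards the cache that -resume relies on, so do this only once you no longer need to resume the run: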

    # Remove work directory after successful run
    rm -rf work/
    
  • Temporary Files: Several modules create large temporary files during processing that can be deleted after successful runs:

    • HUMAnN3 (*_humann_temp directories in WMS_FUNCTION results)

    • Assembly files (ASSEMBLY_BINNING intermediates)

    • MetaPhlAn bowtie2 indices (in WMS_TAXONOMY)

  • Selective Results Retention: For large metagenomic studies, consider keeping only essential outputs:

    • Save final tables and visualization files

    • Compress large text outputs with gzip

    • Archive raw binning results after successful bin refinement
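
The cleanup steps above can be sketched in shell. This is a minimal sketch, not a metaFun feature: the helper names are hypothetical, the results directory is whatever path you pass in, and the `.tsv`/`.txt` extensions and the 100 MiB default threshold are assumptions about what counts as a "large text output".

```shell
# Hypothetical helpers (not part of metaFun) for reclaiming disk space.

# List HUMAnN3 temporary directories under a results directory with their sizes.
# Nothing is deleted; review the list before removing anything yourself.
list_humann_temp() {
    find "$1" -type d -name '*_humann_temp' -prune -exec du -sh {} +
}

# gzip .tsv and .txt files larger than a threshold (in KiB) under a directory.
compress_large_text() {
    dir=$1
    min_kib=${2:-102400}   # default: 100 MiB, an arbitrary choice
    find "$dir" -type f \( -name '*.tsv' -o -name '*.txt' \) \
        -size +"${min_kib}"k -exec gzip {} +
}

# Usage (the results path is a placeholder):
#   list_humann_temp results/
#   compress_large_text results/ 102400
```

Listing is kept separate from deleting on purpose, so that temporary directories are only removed after the corresponding run has been checked.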

Common Issues#

  • Metadata Formatting: The most common errors come from metadata file issues:

    • Ensure your sample IDs in the metadata CSV file exactly match the prefixes of your read filenames

    • Verify that the column numbers specified in parameters (-s/-c, -a) are correct

    • Check that your CSV file uses comma (,) separators and not tabs or semicolons

    • For interactive modules, ensure consistent metadata across analysis stages

  • Database Installation: If you encounter database-related errors, you may need to reinstall the required databases:

    # Reinstall specific databases
    (metafun) metafun -module DOWNLOAD_DB -d humann3     # For WMS_FUNCTION
    (metafun) metafun -module DOWNLOAD_DB -d kraken2     # For WMS_TAXONOMY
    
    
    # For complete reinstallation of all databases
    (metafun) metafun -module DOWNLOAD_DB
    
  • Memory Requirements: Several modules require significant memory:

    • ASSEMBLY_BINNING: Lower the -m parameter for metaSPAdes if out-of-memory (OOM) errors occur

    • WMS_FUNCTION: Adjust the thread count for HUMAnN3 with the -p parameter

    • For low-memory environments, process samples in smaller batches
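
The metadata checks listed under Metadata Formatting can be automated before a run. The following is a minimal sketch under stated assumptions: `check_metadata` is a hypothetical helper (not a metaFun command), the sample ID is assumed to sit in the first CSV column unless another column number is given, and the file is assumed to have a header row and no quoted fields containing commas.

```shell
# Hypothetical helper (not part of metaFun).
# Usage: check_metadata METADATA_CSV READS_DIR [ID_COLUMN]
# Returns 0 if every sample ID prefixes at least one file in READS_DIR,
# 1 if some IDs have no matching file, 2 if the header has no commas.
check_metadata() {
    csv=$1
    reads_dir=$2
    id_col=${3:-1}
    # A header without commas suggests tab or semicolon separators.
    if ! head -n 1 "$csv" | grep -q ','; then
        echo "ERROR: header of $csv has no commas" >&2
        return 2
    fi
    # Skip the header, extract the ID column, strip Windows carriage returns,
    # and collect every ID that does not prefix any file in the reads directory.
    missing=$(
        tail -n +2 "$csv" | cut -d, -f"$id_col" | tr -d '\r' |
        while IFS= read -r id || [ -n "$id" ]; do
            [ -n "$id" ] || continue
            found=no
            for f in "$reads_dir/$id"*; do
                if [ -e "$f" ]; then found=yes; break; fi
            done
            [ "$found" = yes ] || echo "$id"
        done
    )
    if [ -n "$missing" ]; then
        echo "$missing" | sed 's/^/MISSING: /'
        return 1
    fi
    echo "OK: every sample ID matches at least one read file"
}
```

Because the match is a plain prefix test, an ID such as `S1` is also satisfied by a file named `S10_1.fastq.gz`; treat a clean result as a necessary check, not a guarantee.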

Module-Specific Troubleshooting#

  • RAWREAD_QC:

    • If human read filtering fails, check that the human genome database is properly installed

    • For samples with unusual quality distributions, adjust fastp parameters directly

  • ASSEMBLY_BINNING:

    • Low-coverage samples may produce fragmented assemblies; consider co-assembly of related samples

    • If binning produces too few bins, try adjusting the minimum contig length parameters

  • WMS_TAXONOMY:

    • If Kraken2 results show high “unclassified” percentages, try updating to the latest database

    • When switching between profilers (sylph/kraken2), remember to use the correct phyloseq object

  • WMS_FUNCTION:

    • For pathway analysis issues, check that both ChocoPhlAn and UniRef90 databases are installed

    • Stratified outputs may be large; use unstratified tables for overview analyses

  • Interactive Modules:

    • If web interfaces fail to load, check for port conflicts and use the -p parameter

    • For visualization export issues, verify that required R packages are properly installed
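
To judge whether Kraken2 "unclassified" percentages are high across samples, the value can be read from each Kraken2 report. This sketch assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, name), in which the unclassified line has rank code `U`. The helper names are hypothetical, and where metaFun writes these reports is not specified here, so paths are passed in explicitly.

```shell
# Hypothetical helpers (not part of metaFun).

# Print the unclassified percentage from one standard Kraken2 report.
# Prints 0.00 when the report has no line with rank code "U".
unclassified_pct() {
    awk -F'\t' '
        $4 == "U" { gsub(/ /, "", $1); print $1; found = 1; exit }
        END { if (!found) print "0.00" }
    ' "$1"
}

# Print the reports whose unclassified percentage exceeds a threshold.
# Usage: flag_high_unclassified THRESHOLD REPORT...
flag_high_unclassified() {
    threshold=$1
    shift
    for report in "$@"; do
        pct=$(unclassified_pct "$report")
        # awk performs the floating-point comparison that sh cannot.
        if awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !(p > t) }'; then
            echo "$report: $pct% unclassified"
        fi
    done
}
```

The threshold is left to the caller because a reasonable value depends on the sample type and the database in use.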

Performance Optimization#

  • Parallelize Efficiently:

    • Adjust CPU allocation based on available resources and module needs:

      # Example optimized parameters for high-performance systems
      (metafun) metafun -module ASSEMBLY_BINNING -p 24 -m 128
      (metafun) metafun -module WMS_FUNCTION -p 16
      
  • Staged Analysis:

    • For large datasets, run modules sequentially on subsets of samples

    • Run the fast taxonomic analysis before the resource-intensive assembly or functional analysis

  • Resume Functionality:

    • Utilize Nextflow’s resume feature to continue interrupted workflows:

      # Example restarting a failed run
      (metafun) metafun -module ASSEMBLY_BINNING -resume
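
One way to choose the CPU and memory values is to derive them from the machine itself. The sketch below is illustrative only: the helper names are hypothetical, the headroom (2 CPUs and about 10% of memory left free) is an arbitrary choice, and it assumes that -m is given in gigabytes, as the `-m 128` example suggests. Resource detection assumes Linux.

```shell
# Hypothetical helpers (not part of metaFun).

# Suggest -p/-m values from a CPU count and total memory in GB,
# leaving 2 CPUs and roughly 10% of memory free (arbitrary headroom).
suggest_params() {
    cpus=$1
    mem_gb=$2
    if [ "$cpus" -gt 2 ]; then p=$((cpus - 2)); else p=1; fi
    m=$((mem_gb * 9 / 10))
    if [ "$m" -lt 1 ]; then m=1; fi
    echo "-p $p -m $m"
}

# Detect online CPUs and total memory (whole GiB) on Linux.
detect_cpus() { getconf _NPROCESSORS_ONLN; }
detect_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Usage: print the suggested command rather than running it.
#   echo "metafun -module ASSEMBLY_BINNING $(suggest_params "$(detect_cpus)" "$(detect_mem_gb)")"
```

Printing the command instead of executing it keeps the decision with the user, who may share the machine with other jobs.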
      

Getting Support#

  • GitHub Issues: For bug reports, feature requests, or support:

    • Visit the metaFun GitHub repository

    • Create a new issue describing your problem or request

    • Include details about your environment, command used, and error messages

  • Documentation: Refer to the specific module documentation for detailed parameter descriptions and usage examples

  • Citing metaFun: If you use metaFun in your research, please cite:

    [Citation information to be added upon publication]
    
