Question

Potential Contamination in ARG Metagenomic Analysis – How to Filter Out Reads?

4 months ago

JH ▴ 10

Hi everyone,

I am analyzing antibiotic resistance genes (ARGs) in marine samples using metagenomic sequencing. I processed around 60 samples with ARGs-OAP and found that beta-lactam resistance genes (e.g., TEM-117) dominate my dataset, accounting for more than 95% of the total ARG abundance.

To further investigate, I annotated ARGs on my assembled Illumina and Nanopore contigs. Interestingly, the contigs carrying TEM-117 are quite long (~10 kbp). To determine the microbial hosts, I performed BLASTn searches against the NCBI database. The results indicate that the contigs can be separated into two distinct regions:

A ~3 kbp segment matching a cloning vector
A ~7 kbp segment aligning with the partial genome of AcMNPV (Autographa californica multiple nucleopolyhedrovirus), an insect-infecting virus

Since AcMNPV is not expected in a marine environment, I suspect this may be contamination rather than a naturally occurring sequence.

My Questions:

Is this likely contamination? Has anyone encountered similar issues in marine metagenomic studies?

How can I effectively filter these contaminant reads out of my dataset? I tried using Bowtie2 to screen out AcMNPV-related reads, with my assembled contig as the reference (see command below), but some still remain when I re-run ARGs-OAP:

bowtie2 -x /data/Juihung/AcMNPV/KT_AcMNPV.index -1 /data/Juihung/20240905_data/level_1_Kenting_Inlet_R1.fastq.gz \
        -2 /data/Juihung/20240905_data/level_1_Kenting_Inlet_R2.fastq.gz -S /data/Juihung/screen_cloning/KT.sam \
        --un-conc /data/Juihung/screen_cloning/screen_Kenting_Inlet.fastq
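For context, `--un-conc` only sets aside pairs that fail to align concordantly, so a pair in which just one mate aligns, or which aligns discordantly, is still written to the "screened" files. A stricter filter would drop a pair whenever either mate aligns to the contaminant reference. Below is a minimal sketch of that idea in Python, not a tested pipeline. It assumes the SAM file from the command above, FASTQ headers whose first token (minus any `/1` or `/2` suffix) matches the SAM read name, and R1/R2 files listing pairs in the same order. All function names are hypothetical.

```python
import gzip
from itertools import islice


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def aligned_read_names(sam_path):
    """Return the names of reads for which at least one mate aligned."""
    names = set()
    with open_text(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):  # skip SAM header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            flag = int(fields[1])
            if not flag & 0x4:  # bit 0x4 set means "this read is unmapped"
                names.add(fields[0])
    return names


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def read_name(header):
    """Reduce a FASTQ header such as '@name/1 comment' to 'name' (assumed naming)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_pairs(r1_in, r2_in, r1_out, r2_out, contaminated):
    """Write only pairs whose name is not in `contaminated`; return (kept, dropped)."""
    kept = dropped = 0
    with open_text(r1_in) as f1, open_text(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        for rec1, rec2 in zip(fastq_records(f1), fastq_records(f2)):
            if read_name(rec1[0]) in contaminated:
                dropped += 1  # either mate aligned, so drop both
                continue
            o1.writelines(rec1)
            o2.writelines(rec2)
            kept += 1
    return kept, dropped
```

Usage would look like `names = aligned_read_names("KT.sam")` followed by `filter_pairs(r1, r2, "clean_R1.fastq", "clean_R2.fastq", names)`, where the output file names are placeholders. The same effect can usually be had by keeping only pairs with both mates unmapped (SAM flag bits 0x4 and 0x8 both set), which is what this sketch does by name lookup.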

Are there better approaches or tools to screen out these unexpected sequences while minimizing loss of true ARG-related reads?

Any insights or suggestions would be greatly appreciated!

Thanks in advance!

metagenomics ARG • 279 views
