Hi everyone,
I am analyzing antibiotic resistance genes (ARGs) in marine samples using metagenomic sequencing. I processed around 60 samples with ARGs-OAP and found that beta-lactam resistance genes (e.g., TEM-117) dominate my dataset, accounting for more than 95% of the total ARG abundance.
To investigate further, I annotated ARGs on contigs assembled from my Illumina and Nanopore reads. Interestingly, the contigs carrying TEM-117 are quite long (~10 kbp). To identify the microbial hosts, I ran BLASTn searches of these contigs against the NCBI database. The results indicate that each contig splits into two distinct regions:
- A ~3 kbp segment matching a cloning vector
- A ~7 kbp segment aligning with the partial genome of AcMNPV (Autographa californica multiple nucleopolyhedrovirus), an insect-infecting virus
Since AcMNPV is not expected in a marine environment, I suspect this may be contamination rather than a naturally occurring sequence.
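To make the two-region observation easier to check across many contigs, the BLASTn hits can be summarized per subject. The following is only a minimal sketch, assuming the search was run with tabular output (`-outfmt 6`, default 12 columns); the function names, the 90% identity cutoff, and the example IDs are hypothetical and not part of my actual pipeline.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def regions_by_subject(blast_lines, min_identity=90.0):
    """Return {(query, subject): [merged query intervals]} from outfmt 6 lines.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    min_identity is an arbitrary illustrative threshold.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident < min_identity:
            continue
        # Store query coordinates in ascending order.
        hits[(qseqid, sseqid)].append((min(qstart, qend), max(qstart, qend)))
    return {key: merge_intervals(ivs) for key, ivs in hits.items()}


if __name__ == "__main__":
    # Hypothetical hits for one ~10 kbp contig (made-up IDs and values).
    example = [
        "contig_1\tvector_X\t99.8\t3000\t6\t0\t1\t3000\t1\t3000\t0.0\t5500",
        "contig_1\tvirus_Y\t99.5\t7000\t35\t0\t3001\t10000\t1\t7000\t0.0\t12000",
    ]
    for (query, subject), intervals in regions_by_subject(example).items():
        print(query, subject, intervals)
```

A contig whose merged intervals are fully covered by vector and viral subjects, with no flanking marine sequence, would be consistent with the contamination hypothesis.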
My Questions:
- Is this likely contamination? Has anyone encountered similar issues in marine metagenomic studies?
- How can I effectively filter these contaminant reads out of my dataset? I tried using Bowtie2 to screen out AcMNPV-related sequences based on my assembled contig (see command below), but some still remain when I re-run ARGs-OAP:
bowtie2 -x /data/Juihung/AcMNPV/KT_AcMNPV.index \
  -1 /data/Juihung/20240905_data/level_1_Kenting_Inlet_R1.fastq.gz \
  -2 /data/Juihung/20240905_data/level_1_Kenting_Inlet_R2.fastq.gz \
  -S /data/Juihung/screen_cloning/KT.sam \
  --un-conc /data/Juihung/screen_cloning/screen_Kenting_Inlet.fastq
Are there better approaches or tools to screen out these unexpected sequences while minimizing loss of true ARG-related reads?
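One possible reason for the leftovers in the Bowtie2 screen above: `--un-conc` writes every pair that did not align concordantly, so pairs in which only one mate aligned, or both aligned discordantly, are kept. A stricter alternative would be to drop a pair whenever either mate aligns. The sketch below illustrates that idea on the SAM output; it is untested on my real data, the function names are hypothetical, and it assumes uncompressed 4-line FASTQ records and that read names in the SAM lack the `/1` and `/2` suffixes.

```python
def reads_with_any_hit(sam_lines):
    """Collect names of reads where at least one mate aligned.

    A SAM record counts as aligned when flag bit 0x4 (segment unmapped)
    is not set.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if not flag & 0x4:
            names.add(fields[0])
    return names


def filter_fastq(fastq_lines, drop_names):
    """Return FASTQ records (4-tuples) whose read name is not in drop_names.

    Assumes strict 4-line FASTQ records. Apply the same drop_names set to
    both the R1 and R2 files so the pairs stay in sync.
    """
    kept = []
    lines = iter(fastq_lines)
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(lines).rstrip("\n")
        plus = next(lines).rstrip("\n")
        qual = next(lines).rstrip("\n")
        name = header[1:].split()[0]
        # Assumption: SAM read names carry no /1 or /2 mate suffix.
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        if name not in drop_names:
            kept.append((header, seq, plus, qual))
    return kept
```

In practice this would mean streaming `KT.sam` through `reads_with_any_hit`, then passing the decompressed R1 and R2 files through `filter_fastq` with the same set of names.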
Any insights or suggestions would be greatly appreciated!
Thanks in advance!