Question

Potential Bacteria Contamination - RNA-Seq



4.3 years ago

VHahaut ★ 1.2k

Hi!

I have received a series of human poly-A RNA-seq samples (single-end, 75 bp) with suspicious mapping statistics. They were mapped with STAR, and roughly 30-50% of reads are reported as "unmapped: too short". Previous samples processed with the same method had only 5-10%.

Despite the sharp drop in uniquely mapped reads, the sequencing itself seems to have worked well (many genes detected, reads mapping to exons, splicing visible, ...).
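A minimal Python sketch for pulling these numbers out of STAR's Log.final.out, so new and old samples can be compared side by side. It assumes the usual `field | value` layout of that file; the example text and its numbers are made up for illustration, and the 10% cutoff simply mirrors the 5-10% seen in the earlier samples.

```python
def parse_star_log(text: str) -> dict[str, str]:
    """Parse the text of a STAR Log.final.out into {field: value}.

    Data lines look like '  Uniquely mapped reads % | 55.00%'; section
    headers such as 'UNMAPPED READS:' contain no '|' and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def percent(stats: dict[str, str], field: str) -> float:
    """Convert a percentage field such as '38.20%' to a float."""
    return float(stats[field].rstrip("%"))


# Made-up excerpt in the Log.final.out layout (not real data).
example = """\
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t55.00%
UNMAPPED READS:
                 % of reads unmapped: too short |\t38.20%
"""

stats = parse_star_log(example)
too_short = percent(stats, "% of reads unmapped: too short")
# Hypothetical cutoff: the earlier samples sat at 5-10% "too short".
if too_short > 10.0:
    print(f"suspicious: {too_short}% of reads unmapped as too short")
```

Running this over every sample's log and tabulating the two percentages makes the jump between batches easy to see.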

After inspecting the reads more closely, I suspect bacterial contamination because:

Many of the reads I BLASTed are perfect matches to E. coli or other prokaryotes.
These are not ribosomal reads (evaluated with BBDuk).
They do not appear to contain the primers / adapter sequences used in the library preparation.
If I map these reads to a combined E. coli 16S + hg38 reference, 10-100 times more reads map to the E. coli sequences in these new samples than in the old ones.
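To put a number on the combined-reference comparison, one option is to count primary alignments per genome. This is a sketch assuming the alignments are available as SAM text (e.g. from `samtools view` on the BAM) and that the E. coli contigs were renamed with a hypothetical `ecoli_` prefix before building the combined index; adapt the prefix to your own contig names.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def reads_per_genome(sam_lines: Iterable[str], bacterial_prefix: str = "ecoli_") -> Counter:
    """Count single-end reads as human, bacterial or unmapped.

    Only primary records are counted, so each read contributes once.
    `bacterial_prefix` is a hypothetical naming convention for E. coli contigs.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif rname.startswith(bacterial_prefix):
            counts["bacterial"] += 1
        else:
            counts["human"] += 1
    return counts


def bacterial_fraction(counts: Counter) -> float:
    """Fraction of mapped reads whose primary hit is bacterial."""
    mapped = counts["bacterial"] + counts["human"]
    return counts["bacterial"] / mapped if mapped else 0.0


# Toy SAM records (made up): r4 has a secondary hit that is ignored.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:ecoli_chr\tLN:1000",
    "r1\t0\tchr1\t100\t255\t75M\t*\t0\t0\t*\t*",
    "r2\t16\tecoli_chr\t50\t255\t75M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t300\t255\t75M\t*\t0\t0\t*\t*",
    "r4\t256\tecoli_chr\t10\t0\t75M\t*\t0\t0\t*\t*",
]
counts = reads_per_genome(sam)
print(counts, bacterial_fraction(counts))
```

Computing the same fraction for the old and new batches turns the "10-100 times more" observation into a per-sample number.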

I would like to estimate the proportion of reads coming from prokaryotes (E. coli?) in these samples. As I am not familiar with metagenomics, could someone recommend a procedure for this?

I am also open to other suggestions about what else might be wrong with these samples.

Thank you in advance!

RNA-Seq metagenomics • 2.1k views

updated 4.1 years ago by Thind amarinder ▴ 340 • written 4.3 years ago by VHahaut ★ 1.2k



Try FastQ Screen: index the E. coli genome, add it to the configuration file, and FastQ Screen will print out the contamination levels. Increase the number of reads analyzed, since by default it screens only a subsample (see the --subset option).

4.3 years ago by cpad0112 21k
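Screening more reads matters because a contamination percentage estimated from a subsample carries sampling noise. A quick way to see this, assuming reads are sampled independently, is a Wilson score interval for the observed fraction; the 2% fraction and the sample sizes below are illustrative only.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion hits/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Same observed 2% contamination, screened with more and more reads.
for n in (1_000, 100_000, 10_000_000):
    lo, hi = wilson_interval(round(0.02 * n), n)
    print(f"n={n:>10,}: 95% CI {lo:.4%} to {hi:.4%}")
```

The interval narrows roughly with the square root of the number of reads screened. It only captures sampling noise, not mapping biases or reads assigned to the wrong genome.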

score 1 · Answer 1 · 2020-07-31

Use bbsplit.sh from the BBMap suite. It is designed for aligning data to multiple genomes at the same time and binning the reads by genome. See this page. You can decide what to do with reads that multi-map within and across genomes via the ambiguous= and ambiguous2= options. Include the refstats= option to get detailed statistics per reference.
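To make the ambiguous-read handling concrete, here is a toy Python illustration of binning reads by their best-scoring genome, with two policies for cross-genome ties. It is not BBSplit's implementation: the alignment scores are hypothetical inputs, and the `toss`/`all` names only loosely echo the ambiguous2= values, so check the BBSplit documentation for the actual option semantics.

```python
from collections import defaultdict


def bin_reads(scores: dict[str, dict[str, int]], ambiguous: str = "toss") -> dict[str, list[str]]:
    """Bin reads by the genome(s) with the best alignment score.

    `scores` maps read id -> {genome: score}; unaligned reads map to {}.
    Cross-genome ties go to an 'ambiguous' bin ('toss', kept so they can
    be counted) or are copied to every tied genome ('all').
    """
    if ambiguous not in ("toss", "all"):
        raise ValueError("ambiguous must be 'toss' or 'all'")
    bins = defaultdict(list)
    for read, hits in scores.items():
        if not hits:
            bins["unmapped"].append(read)
            continue
        best = max(hits.values())
        winners = sorted(g for g, s in hits.items() if s == best)
        if len(winners) == 1:
            bins[winners[0]].append(read)
        elif ambiguous == "all":
            for genome in winners:
                bins[genome].append(read)
        else:
            bins["ambiguous"].append(read)
    return dict(bins)


# Hypothetical scores: r3 aligns equally well to both genomes.
scores = {
    "r1": {"human": 150},
    "r2": {"ecoli": 150, "human": 90},
    "r3": {"ecoli": 120, "human": 120},
    "r4": {},
}
print(bin_reads(scores))
print(bin_reads(scores, ambiguous="all"))
```

Tossing ties gives a conservative bacterial count, while copying them to every genome gives an upper bound; reporting both brackets the contamination level.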

The samples may have been contaminated during collection or processing (i.e. contamination already present in the sample or introduced in later steps). If contamination levels are roughly similar across samples, you could still proceed with the analysis, since you have already found that the sequencing worked, especially if the samples cannot easily be replaced.
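One rough way to check whether contamination levels are "more or less similar" is to compare each sample's bacterial fraction with the batch median. This sketch uses a hypothetical two-fold cutoff and made-up fractions; a sensible threshold depends on the study.

```python
import statistics


def flag_outliers(fractions: dict[str, float], fold: float = 2.0) -> list[str]:
    """Return samples whose bacterial fraction is more than `fold` times
    above or below the median across all samples (hypothetical rule)."""
    median = statistics.median(fractions.values())
    if median == 0:
        # With a zero median, any contaminated sample stands out.
        return sorted(s for s, f in fractions.items() if f > 0)
    return sorted(
        s for s, f in fractions.items() if f > median * fold or f < median / fold
    )


# Made-up per-sample fractions of mapped reads hitting E. coli.
fractions = {"new_1": 0.20, "new_2": 0.25, "new_3": 0.22, "new_4": 0.60}
print(flag_outliers(fractions))
```

Samples that are not flagged have comparable contamination; flagged ones may need to be dropped or modelled separately.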

score 1 · Answer 2 · 2020-11-11



4.1 years ago

Thind amarinder ▴ 340

The DecontaMiner tool investigates the origin of unmapped reads and assigns them to taxa.

DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2684-x

https://github.com/amarinderthind/decontaminer
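Whichever tool is used, the first step is usually to collect the unmapped reads. STAR can write them directly with `--outReadsUnmapped Fastx`, and `samtools fastq -f 4` does the same from a BAM. The Python sketch below only illustrates that logic, assuming SAM text in which unmapped records were kept (e.g. STAR's `--outSAMunmapped Within`).

```python
from typing import Iterable, Iterator

UNMAPPED, REVERSE = 0x4, 0x10
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for SAM records with the unmapped flag (0x4) set."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & UNMAPPED:
            continue
        name, seq, qual = fields[0], fields[9], fields[10]
        if qual == "*":
            qual = "I" * len(seq)  # arbitrary placeholder when QUAL is absent
        if flag & REVERSE:
            # Restore the original read orientation if SEQ was reverse-complemented.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


# Toy input (made up): r1 is mapped, r2 and r3 are unmapped.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tAAGG\tABCD",
    "r3\t20\t*\t0\t0\t*\t*\t0\t0\tAAGG\tABCD",
]
records = list(unmapped_to_fastq(sam))
print("".join(records), end="")
```

The resulting FASTQ can then be passed to BLAST or a taxonomic-assignment tool to see which organisms the unmapped reads come from.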
