Question

About the reads aligned 0 times (around 10%) in Bowtie2.

0

7.4 years ago

ghostforever.shi ▴ 50

Hi, everyone. I have some questions about the reads that aligned 0 times in Bowtie2 (about 10% in my experiment). What are these reads, really? And why can some reads not be aligned at all? Do these reads affect the peak heights we get in the downstream analysis? Furthermore, I have learned that SNP calling sometimes uses Bowtie2 as well. Will these unaligned reads affect the SNP calling, perhaps resulting in some loss of information? I would really appreciate an answer.

ChIP-Seq SNP • 2.3k views

ADD COMMENT • link updated 7.4 years ago by Philipp Bayer 8.7k • written 7.4 years ago by ghostforever.shi ▴ 50

score 4 · Accepted Answer · 2017-07-12

7.4 years ago

Philipp Bayer 8.7k

These unaligned reads could be:

low-quality reads (perhaps even containing Ns)
lab contamination from some other species (bacteria/fungi living on your species, the human preparing your samples, or a sequencing machine that wasn't cleaned properly so that you get carry-over from previous runs, which is what I've seen) (paper)
contamination from the sample prep reagents (paper)
sequence that is not present in the reference genome

If you're bored you can run metagenomics software like Kraken or MEGAN to see where your unaligned reads come from.
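Before running such a tool you need the unaligned reads on their own. Bowtie2's `--un` option or `samtools view -f 4` are the usual routes. The Python below is only a minimal sketch of the same idea, assuming single-end reads and an uncompressed SAM file: it keeps records whose FLAG has the 0x4 (unmapped) bit set, writes them as FASTQ, and reports how N-rich each read is. The function names and the demo records are made up for illustration.

```python
UNMAPPED = 0x4            # SAM FLAG bit: read is unmapped
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment lines


def split_unaligned(sam_lines):
    """Return (total_reads, unaligned) from an iterable of SAM text lines.

    `unaligned` is a list of (name, sequence, quality) tuples.
    Assumes single-end data; with paired-end data each mate is counted
    as one record.
    """
    total = 0
    unaligned = []
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & NOT_PRIMARY:        # avoid counting a read twice
            continue
        total += 1
        if flag & UNMAPPED:
            unaligned.append((fields[0], fields[9], fields[10]))
    return total, unaligned


def to_fastq(records):
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


def n_fraction(seq):
    """Fraction of bases in `seq` that are N."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


# Tiny made-up SAM content, for demonstration only.
DEMO_SAM = "\n".join([
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tNNNNACGT\t!!!!IIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
])

if __name__ == "__main__":
    total, unaligned = split_unaligned(DEMO_SAM.splitlines())
    print(f"{len(unaligned)} of {total} reads unaligned")
    for name, seq, _ in unaligned:
        print(name, f"N fraction = {n_fraction(seq):.2f}")
    print(to_fastq(unaligned), end="")
```

The resulting FASTQ text can be saved to a file and given to a classifier. Reads with a high N fraction are probably just low-quality reads rather than contamination.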

What do you mean by peak's height in the following analysis? Which analysis, and which peak?

These reads shouldn't have an effect on SNP calling, as the software only analyses the alignments. The unaligned reads could harbour some SNPs if they are from your species, so you could assemble your unaligned reads and see whether you can find SNPs in them, but is the extra work worth it?

2

To figure out where unexpected reads come from, this approach might be interesting: Read Origin Protocol. The paper has an amazing title: Dumpster diving in RNA-sequencing to find the source of every last read, although to my disappointment the more recent version of that preprint has a different title.

ADD REPLY • link 7.4 years ago by WouterDeCoster 47k

0

A great answer, much appreciated. I am doing ChIP-seq analysis these days, and those unaligned reads do bother me, since I don't quite understand why they can't be aligned and whether they may contain some information, such as a peak's location or height.

ADD REPLY • link 7.4 years ago by ghostforever.shi ▴ 50

0

In WGS, you always get 5-15% unaligning reads, so I usually don't bother too much there. Not sure about the numbers for ChIP-seq...

ADD REPLY • link 7.4 years ago by Philipp Bayer 8.7k