Question

Is this contig for real? Getting support from bowtie2 mappings

2

10.2 years ago

bitjunkie ▴ 40

Hey guys,

So, I have some contigs assembled from Illumina paired-end reads (with ABySS) that did not map to our reference genomic sequence, which was supposed to be the only thing in our sample. About half the reads did not map, and we sequenced to a high depth. I want to find out which of these contigs are actually real.

My thought is to map the reads back to the contigs with bowtie2 and determine from the mapping data which contigs are best supported. I already looked at how many reads mapped to each contig, but I realized that didn't tell me enough. I would like to determine support for a contig based on how many read pairs mapped concordantly and with the correct insert size. How can I do this procedurally? What should the formula look like for generating a quantitative measure of support?
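To make the idea concrete, here is a minimal sketch in Python, not an established metric. It assumes the SAM output comes from a paired-end bowtie2 run, where (as far as I understand the output) the "proper pair" flag (0x2) marks pairs that satisfy the configured orientation and insert-size limits. The function name `contig_support` and the three reported measures are illustrative choices of mine, not part of any tool.

```python
from collections import defaultdict

# Standard SAM FLAG bits (from the SAM specification).
PAIRED, PROPER, UNMAPPED = 0x1, 0x2, 0x4
FIRST, SECONDARY, SUPPLEMENTARY = 0x40, 0x100, 0x800


def contig_support(sam_lines):
    """Summarise concordant-pair support per contig from SAM text lines.

    Each pair is counted once, via its first mate, on the contig where
    that first mate has its primary alignment.
    """
    lengths = {}                    # contig name -> length from @SQ headers
    mapped = defaultdict(int)       # first mates with a primary alignment
    concordant = defaultdict(int)   # ... that are also flagged as proper pairs

    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1)
                            for f in line.rstrip("\n").split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.rstrip("\n").split("\t")
        flag, contig = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if not (flag & PAIRED and flag & FIRST):
            continue
        mapped[contig] += 1
        if flag & PROPER:
            concordant[contig] += 1

    report = {}
    for contig, length in lengths.items():
        n, c = mapped[contig], concordant[contig]
        report[contig] = {
            "concordant_pairs": c,
            "concordant_fraction": c / n if n else 0.0,
            "concordant_pairs_per_kb": 1000.0 * c / length,
        }
    return report
```

Something like `contig_support(open("reads_vs_contigs.sam"))` (hypothetical file name) would give one record per contig. A low concordant fraction or a very low per-kb count would flag a contig as poorly supported, though any cutoff is a judgment call.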

Open to other ideas, too.

Thanks!

sequencing alignment Assembly • 2.1k views

updated 2.9 years ago by Ram 44k • written 10.2 years ago by bitjunkie ▴ 40

1

Usually you can trust assemblers. They won't assemble contigs from nowhere. As Istvan said, searching against nt is a necessary step. Many sequences in nt are not included in the reference assembly. nt also helps to identify microbial contamination. If you are working on a model organism, also run RepeatMasker. At least for humans, these extra contigs tend to be diverged copies of repeats.

updated 2.9 years ago by Ram 44k • written 10.2 years ago by lh3 33k

Ram · Answer 1 · 2014-09-26

2

10.2 years ago

Istvan Albert 101k

Blast some reads/contigs against nt.

Though we were once in a very similar situation, and even blasting against nt did not return any results whatsoever. We are still wondering where the heck those reads came from.

updated 2.9 years ago by Ram 44k • written 10.2 years ago by Istvan Albert 101k

Ram · Answer 2 · 2014-09-26

0

10.2 years ago

Philipp Bayer 8.7k

I concur with blasting against nt.

Also, have a look at GC content using Blobology. If you see several distinct clusters, then you might have contamination.
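For a quick first look before setting up a full pipeline, per-contig GC content can be computed directly from the assembly FASTA. This is only a rough sketch and not what Blobology itself runs; the function name `gc_fraction_per_contig` is made up for illustration, and ambiguous bases such as N are simply left out of the denominator.

```python
def gc_fraction_per_contig(fasta_lines):
    """Return {contig_name: GC fraction} from an iterable of FASTA lines.

    Only A, C, G and T (case-insensitive) are counted, so runs of N
    do not dilute the GC fraction.
    """
    counts = {}   # contig name -> [gc_bases, acgt_bases]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""   # ID is the first word of the header
            counts[name] = [0, 0]
        elif name is not None:
            seq = line.upper()
            counts[name][0] += seq.count("G") + seq.count("C")
            counts[name][1] += sum(seq.count(base) for base in "ACGT")
    return {n: (gc / total if total else 0.0)
            for n, (gc, total) in counts.items()}
```

Plotting these values against per-contig read coverage should show whether the contigs fall into separate clouds, which is the pattern that would suggest more than one organism in the sample.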

updated 2.9 years ago by Ram 44k • written 10.2 years ago by Philipp Bayer 8.7k