Is this contig for real? Getting support from bowtie2 mappings
10.2 years ago
bitjunkie ▴ 40

Hey guys,

So, I have some contigs assembled with ABySS from Illumina paired-end reads that did not map to our reference genomic sequence, which was supposed to be the only thing in our sample. About half the reads did not map, and we sequenced to a high depth. I want to find out which of these contigs are actually real.

My thought is to map the reads back to the contigs with bowtie2 and determine from the mapping data which contigs are best supported. I already looked at how many reads mapped to each contig, but I realized that didn't tell me enough. I would like to determine support for a contig based on how many read pairs mapped concordantly and with the correct insert size. How can I do this procedurally? What should the formula look like for generating a quantitative measure of support?
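To make the idea concrete, here is a rough sketch of one possible support measure, not an established formula. It reads SAM text lines (for example bowtie2 output), counts each pair once via its first mate, and calls a pair "supported" when the proper-pair flag (0x2, which bowtie2 sets for concordant alignments) is set and the absolute TLEN lies inside an insert-size window. The function name `contig_support`, the default window of 100 to 600 bp, and the "supported pairs per kb" normalization are all assumptions chosen for illustration; the window should match the actual library.

```python
from collections import defaultdict

# Standard SAM flag bits.
PAIRED, PROPER_PAIR, UNMAPPED, FIRST_MATE = 0x1, 0x2, 0x4, 0x40
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def contig_support(sam_lines, min_insert=100, max_insert=600):
    """Summarize paired-read support per contig from SAM text lines.

    Hypothetical helper: the insert-size window and the per-kb score
    are illustrative assumptions, not a published metric.
    """
    lengths = {}
    counts = defaultdict(lambda: {"pairs": 0, "supported": 0})

    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            # Contig lengths come from @SQ header lines (SN and LN tags).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue

        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, contig, tlen = int(fields[1]), fields[2], abs(int(fields[8]))

        # Count each pair once, through its mapped primary first mate.
        if not (flag & PAIRED and flag & FIRST_MATE):
            continue
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue

        counts[contig]["pairs"] += 1
        if flag & PROPER_PAIR and min_insert <= tlen <= max_insert:
            counts[contig]["supported"] += 1

    summary = {}
    for contig, c in counts.items():
        entry = dict(c)
        entry["fraction"] = c["supported"] / c["pairs"]
        if contig in lengths:
            # Supported pairs per kilobase of contig, to compare contigs
            # of different lengths.
            entry["supported_per_kb"] = c["supported"] / (lengths[contig] / 1000)
        summary[contig] = entry
    return summary
```

It could be fed with something like `contig_support(open("reads_vs_contigs.sam"))`, where the file name is a placeholder. Contigs with a low supported fraction or very few supported pairs per kb relative to the rest would be the first ones to treat with suspicion.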

Open to other ideas, too.

Thanks!

sequencing alignment Assembly • 2.1k views

Usually you can trust assemblers. They won't assemble contigs from nowhere. As Istvan said, searching against nt is a necessary step. Many sequences in nt are not included in the reference assembly. Searching nt also helps identify microbial contamination. If you are working on a model organism, also run RepeatMasker. At least for humans, these extra contigs tend to be diverged copies of repeats.

10.2 years ago

Blast some reads/contigs against nt.

That said, we were once in a very similar situation, and even blasting against nt did not return any results whatsoever. We are still wondering where the heck those reads came from.

10.2 years ago

I concur with blasting against nt.

Also, have a look at GC content using Blobology. If you see several distinct clusters, then you might have contamination.
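As a quick first look before running a dedicated tool, the per-contig GC fraction can be computed directly from the assembly FASTA. This is only a minimal sketch and not what Blobology itself does; the function name `gc_by_contig` is made up for illustration, and ambiguous bases such as N are simply left out of the denominator.

```python
def gc_by_contig(fasta_lines):
    """Return {contig_name: GC fraction} from FASTA text lines.

    Hypothetical helper: only A, C, G and T are counted, so N and other
    ambiguity codes do not affect the fraction.
    """
    counts = {}  # name -> [gc_bases, acgt_bases]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = [0, 0]
        elif name is not None:
            seq = line.upper()
            counts[name][0] += seq.count("G") + seq.count("C")
            counts[name][1] += sum(seq.count(base) for base in "ACGT")
    return {n: (gc / total if total else 0.0) for n, (gc, total) in counts.items()}
```

Plotting a histogram of these values, or GC against read coverage per contig, is a simple way to check whether the contigs fall into more than one cluster.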
