Question

Very Few Reads Mapping Back To Contigs - Plant Transcriptome


12.8 years ago

Cerebralrust ▴ 20

I assembled plant transcriptome 454 data (non-normalised) using Trinity after the following steps:

1) pre-processing (removal of adaptors and vector contamination)
2) removal of rRNA sequences
3) removal of chloroplast and mitochondrial genes using bwa

From 3,70,929 reads, I got 21,486 contigs. When I mapped the reads back to the contigs using bwa, only 44,678 reads mapped, i.e. only that many appear to have been used in the assembly. What am I doing wrong here? I BLASTed a random selection of the contigs and found that they share over 90% similarity with proteins from related legumes (although many were hypothetical). However, only a small percentage of the contigs align to the transcript assemblies of related legumes when mapped using bwa.

The Velvet assembly of the same data resulted in 15,323 contigs with lower N50, N90, maximum length, etc. The MIRA assembly produced more contigs and used more reads, but had lower N50, N90 and average contig length. Why are only 44,678 reads being used? Any advice is greatly appreciated.
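Since the comparison between assemblers rests on N50 and N90, here is a minimal Python sketch of how those values can be computed from the contig FASTA files, so that all assemblies are measured the same way. The function names `fasta_lengths` and `nx` are made up for illustration, and the definition used (shortest contig in the set of longest contigs that together cover the given fraction of the total assembly length) is the common one; individual assemblers may report it slightly differently.

```python
def fasta_lengths(lines):
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction=0.5):
    """Nx statistic: walk contigs from longest to shortest and return the
    length at which the running sum reaches `fraction` of the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(lengths)  # only reachable through floating-point rounding
```

Usage would be something like `lengths = fasta_lengths(open("contigs.fasta"))` (file name hypothetical), then `nx(lengths, 0.5)` for N50 and `nx(lengths, 0.9)` for N90.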

plant rna read mapping bwa • 3.8k views


Do you mean 370k reads or 3 million? That would have a big impact on interpreting your read usage. Also, I agree with (22308)3 that Newbler would be a good tool of choice for your data.

Comment by SES (8.6k) · 12.7 years ago

Ram · Answer 1 · 2012-02-23

According to Brian J. Haas, one of the key developers of Trinity:

"Ultimately, Trinity might not be the best tool for assembling 454 data, since coverage won't be anywhere near what is expected from Illumina in most cases, and Trinity exploits the high coverage data as part of reconstructing transcripts. The current version of Newbler is supposed to work especially well for 454 transcriptome data, so I encourage you to give that a try if you haven't already."

score 0 · Answer 2 · 2012-03-09


12.7 years ago

2184687-1231-83- ★ 5.1k

I would try Newbler 2.6 if you have access to it. Use bwa bwasw, which is meant for longer reads, to map the 454 reads to the contigs.
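To see how many reads actually map back, the SAM output can be tallied directly. Below is a rough Python sketch, assuming single-end 454 reads and a SAM file produced by the mapper; the function name `mapping_summary` is made up for illustration. It counts unique read names rather than lines, because a long-read aligner can report more than one alignment line for the same read, and it treats a read as mapped if any of its lines has the "unmapped" FLAG bit (0x4) unset.

```python
def mapping_summary(sam_lines):
    """Return (mapped_reads, total_reads) counted by unique read name."""
    seen = set()
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name = fields[0]
        flag = int(fields[1])
        seen.add(name)
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped.add(name)
    return len(mapped), len(seen)
```

With a file this would be called as `mapped, total = mapping_summary(open("reads_vs_contigs.sam"))` (file name hypothetical). Note that a read which never appears in the SAM output is not counted in the total, so the total is worth comparing against the number of reads in the input.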


score 0 · Answer 3 · 2012-03-11

I did try Newbler. However, Newbler generated only 9494 isotigs from 2,50,000 reads, although the N50, contig sizes and other metrics look good. I am going to BLASTx the entire set of contigs from the three assemblers against the proteomes of related species and the NR database, and let the results determine the best assembly. Any other strategy would be hugely appreciated.
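For comparing the BLASTx results across assemblies, one simple summary is how many contigs get a significant hit and how many distinct proteins are hit overall. A minimal Python sketch follows, assuming BLAST+ was run with tabular output (`-outfmt 6`, default 12 columns, e-value in column 11). The function name `summarize_blast_hits` and the `1e-5` cutoff are illustrative assumptions, not something from this thread.

```python
def summarize_blast_hits(lines, max_evalue=1e-5):
    """Return (contigs_with_hit, distinct_proteins_hit) from BLAST tabular lines."""
    queries = set()
    subjects = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):  # blank or comment line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # not a default 12-column tabular record
            continue
        if float(fields[10]) <= max_evalue:  # column 11 is the e-value
            queries.add(fields[0])   # column 1: query (contig) id
            subjects.add(fields[1])  # column 2: subject (protein) id
    return len(queries), len(subjects)
```

Dividing the first number by the total contig count of each assembly gives the fraction of contigs with protein support. The second number gives a rough idea of how many different genes each assembly recovers, which helps when one assembler splits transcripts into many fragments that all hit the same protein.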