Assembling a single transcript sequence from RNA-seq data
9.0 years ago

I am looking for suggestions on how to assemble a single, fairly complex transcript sequence from RNA-Seq data. The protein this transcript encodes has a variable number of repeated 10-amino-acid domains. Assembling with Trinity or SOAPdenovo-Trans did not generate a complete sequence for the protein: the assembled protein lacks other domains found in known orthologs.

I also tried aligning reads against orthologs (using USEARCH) and then assembling the reads that aligned with CAP3 and Velvet. That approach actually performed worse than Trinity.

Any suggestions on how to accurately assemble that single sequence?

Thanks

RNA-Seq Assembly • 2.1k views

You might try SPAdes or Tadpole (in the BBMap package); both of them handle highly variable coverage better than most isolate assemblers. However, if the transcript is differentially spliced, the result will probably still not be great.


Thanks for the suggestion, Brian. Once I pulled out reads that aligned against the homologue sequence (as well as their non-aligned pairs), I experimented with a plethora of assemblers, including SPAdes. None of them yielded a complete transcript.
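For anyone wanting to reproduce the read-selection step, here is a minimal sketch of pulling out the aligned reads together with their mates. It assumes the aligner's hits have already been reduced to a set of read names, and that the two FASTQ files list mates in the same order. The function names (`read_fastq`, `pair_key`, `select_pairs`) are made up for illustration and are not part of any tool mentioned here.

```python
def read_fastq(lines):
    """Yield (name, seq, qual) from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        seq = next(it)
        next(it)  # the '+' separator line
        qual = next(it)
        yield header.strip()[1:], seq.strip(), qual.strip()


def pair_key(name):
    """Reduce a read name to a key shared by both mates (drops comments and /1, /2)."""
    key = name.split()[0]
    if key.endswith("/1") or key.endswith("/2"):
        key = key[:-2]
    return key


def select_pairs(hit_names, r1_lines, r2_lines):
    """Yield both mates of every pair in which at least one mate is in hit_names.

    Assumes r1_lines and r2_lines hold the mates in the same order.
    """
    wanted = {pair_key(n) for n in hit_names}
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if pair_key(rec1[0]) in wanted:
            yield rec1, rec2
```

With real data, `r1_lines` and `r2_lines` would be open file handles for the two FASTQ files, and `hit_names` would come from the query column of the alignment output.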

I wonder whether there are tools that use a greedy approach to iteratively add reads to both ends of a partial transcript. That would probably be intractable for large datasets, but it might be a viable solution for a single transcript, or even a few. Would such an approach work, or would it be overwhelmed by the complexity of the assembly?
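To make the greedy idea concrete, here is a rough sketch of what I have in mind. It is not taken from any existing tool. It extends a seed one base at a time by looking up which base most often follows the current last k-mer in the reads, and stops when support is too low, when there is a tie, or when it revisits a k-mer. The last case is exactly where the repeated domains would hurt. The names `extend_right` and `extend_both` and the parameters `k`, `min_support` and `max_len` are my own assumptions, and the seed must be at least `k` bases long.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (characters other than ACGT are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_followers(reads, k):
    """Map each k-mer to a Counter of the bases that follow it, on both strands."""
    followers = {}
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k):
                kmer = strand[i:i + k]
                followers.setdefault(kmer, Counter())[strand[i + k]] += 1
    return followers


def extend_right(seed, followers, k, min_support=2, max_len=10_000):
    """Greedily append bases to the 3' end of seed until extension is unsafe."""
    contig = seed
    seen = set()  # k-mers already extended from; stops endless loops in repeats
    while len(contig) < max_len:
        tail = contig[-k:]
        if tail in seen:
            break
        seen.add(tail)
        counts = followers.get(tail)
        if not counts:
            break  # no read continues past this k-mer
        ranked = counts.most_common(2)
        base, support = ranked[0]
        if support < min_support:
            break  # too few reads agree on the next base
        if len(ranked) > 1 and ranked[1][1] == support:
            break  # ambiguous: two bases are equally supported
        contig += base
    return contig


def extend_both(seed, reads, k=8, min_support=2, max_len=10_000):
    """Extend seed to the right, then to the left via its reverse complement."""
    followers = build_followers(reads, k)
    right = extend_right(seed, followers, k, min_support, max_len)
    left = extend_right(revcomp(right), followers, k, min_support, max_len)
    return revcomp(left)
```

On real reads `k` would need to be much larger than 8, and longer than one repeat unit if the read length allows. A run of near-identical 30 nt repeat units would still make the extension stop or collapse, so this may simply reproduce the problem the assemblers already have.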
