subsetting sets of scaffolds based on sequence similarity
9.6 years ago
second_exon ▴ 210

Hi,

A couple of months ago, I sequenced a few BACs on a MiSeq and assembled the paired-end reads with SOAPdenovo, but the assembly came out fragmented into many scaffolds. I now have the reference genome of the same cultivar and am trying to pull out my region of interest (about 3.2 Mb).

Here is my first approach:

I mapped my paired-end reads to the whole genome using BWA. With this approach, reads mapped to only 19 scaffolds of the whole genome.
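For anyone wanting to reproduce a count like this, here is a minimal sketch that tallies confidently mapped reads per reference scaffold straight from SAM text (for example the output of `samtools view -h`). The function name `count_mapped_scaffolds` and the `min_mapq` cutoff of 20 are illustrative assumptions, not values used in this post.

```python
from collections import Counter


def count_mapped_scaffolds(sam_lines, min_mapq=20):
    """Count primary, confidently mapped reads per reference scaffold.

    sam_lines: iterable of SAM-format text lines (header lines allowed).
    min_mapq:  hypothetical mapping-quality cutoff; adjust to taste.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        if flag & 0x4 or rname == "*":    # read is unmapped
            continue
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        if mapq < min_mapq:               # ambiguous placement, e.g. in repeats
            continue
        counts[rname] += 1
    return counts


# Tiny made-up example: one good hit, one unmapped read, one MAPQ-0 hit.
example = [
    "@SQ\tSN:scaffold_1\tLN:5000",
    "r1\t99\tscaffold_1\t100\t60\t50M\t=\t300\t250\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tscaffold_2\t10\t0\t50M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_scaffolds(example))
```

Lowering `min_mapq` to 0 also counts reads placed ambiguously, which may partly explain why different filters report different numbers of scaffolds.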

Second approach:

I ran blastn with the SOAPdenovo assembly against the whole genome (e-value: 1000, word size: 40, percent identity: 100%). With this approach, more than 1500 scaffolds of the whole genome got BLAST hits.
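A rough sketch of how the BLAST hits could be narrowed down and then used to subset a FASTA file. It assumes the default 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, ...), with the reference scaffolds as subjects. The function names and the `min_identity` / `min_length` thresholds are hypothetical. One possible reason for the large hit count, offered as an assumption, is that an e-value of 1000 lets very short exact matches (for example in repeats) through, so a minimum alignment length is applied here.

```python
def scaffolds_with_strong_hits(blast_lines, min_identity=99.0, min_length=500):
    """Return subject IDs that have at least one long, high-identity hit.

    blast_lines: iterable of lines in BLAST tabular format (-outfmt 6).
    Thresholds are illustrative assumptions, not values from the post.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):   # blanks / comments
            continue
        cols = line.rstrip("\n").split("\t")
        sseqid = cols[1]            # subject (reference scaffold) ID
        pident = float(cols[2])     # percent identity
        length = int(cols[3])       # alignment length
        if pident >= min_identity and length >= min_length:
            hits.add(sseqid)
    return hits


def subset_fasta(fasta_lines, wanted_ids):
    """Keep only FASTA records whose ID (first word of header) is wanted."""
    keep = False
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            out.append(line.rstrip("\n"))
    return out


# Made-up example: one long hit and one 40 bp hit that gets filtered out.
blast = [
    "bac_1\tref_A\t100.00\t2500\t0\t0\t1\t2500\t100\t2599\t0.0\t4600",
    "bac_1\tref_B\t100.00\t40\t0\t0\t10\t49\t7\t46\t1e-10\t75",
]
fasta = [">ref_A some description", "ACGTACGT", ">ref_B", "TTTT"]
wanted = scaffolds_with_strong_hits(blast)
print("\n".join(subset_fasta(fasta, wanted)))
```

Filtering on alignment length rather than on e-value alone keeps the subset focused on scaffolds that share a substantial stretch with the BAC assembly.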

My question is: why is there such a difference? Is there a problem with my mapping? Which approach is better?

Or is there any other approach? Please share your experience!

EDIT: I am also thinking about a reference-guided re-assembly of my paired-end reads.

Thanks
Ramesh

sam NGS bam Assembly • 1.6k views

Not exactly. My question here is how to subset the sequences, not how to close gaps.

I hope you agree with me.


If I were you, I would try out a few different assemblers. For example, IDBA-Hybrid sounds like the perfect match for your problem. Before compiling it, you might want to edit src/sequence/short_sequence.h to allow longer read lengths.


Thank you! I will try this out.
