Question

Combining two assemblies and selecting the desired contigs

9.4 years ago

seta ★ 1.9k

Hi all,

First of all, please accept my apologies if you, as bioinformatics experts, find this question basic; as just a biology student I find it challenging, so please be patient. I would like to compare and then combine two FASTA assembly files generated by two different assemblers. To this end, I ran blastn with an e-value threshold of 1E-100 and a minimum identity of 98%. Let A be the contig ID from assembly 1, B the contig ID from assembly 2, D the alignment length, M the query sequence (assembly 1) length, and N the subject sequence (assembly 2) length; these letters correspond to the column letters in the table below. I want to apply two rules: if N < (M+200), keep A (and use it to replace its counterpart B from the FASTA file generated by assembly 2); if D = N and (M+200) < N, discard A and keep B. Could you please help me with this? Thanks so much in advance.

A                     B                       C          D        E           F            G             H           I               J             K           L          M      N
query ID              subject ID              identity   length   mismatch    gap opens    query start   query end   subject start   subject end   e-value     bitscore   qlen   slen
contig10002|m.12543   c26528_g1_i1|m.14066    100        762      0           0            28            789         1               762           0           1408       789    762
contig10003|m.12544   c39648_g1_i1|m.25685    100        945      0           0            1             945         1               945           0           1746       945    945
contig10003|m.12545   c39648_g1_i1|m.25685    100        336      0           0            1             336         780             445           2.00E-177   621        336    945
contig10004|m.12546   c54250_g1_i3|m.62628    100        462      0           0            1             462         1               462           0           854        462    468
contig10005|m.12547   c54760_g1_i3|m.64975    100        564      0           0            1             564         1               564           0           1042       564    564
contig10006|m.12548   c64049_g2_i2|m.128345   100        526      0           0            188           713         236             761           0           972        729    1089
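A minimal Python sketch of the two rules, assuming the BLAST output is tab-separated with the 14 columns shown above (for example produced with `-outfmt "6 std qlen slen"`). The names `Hit`, `parse_hits`, `classify` and `passes_thresholds` are hypothetical. Hits that match neither rule, including the boundary case N = M + 200, are labelled "undecided" rather than guessed. Note that, as written, the second rule requires the alignment length D to exceed M by more than 200, which can only happen if the query side of the alignment contains more than 200 gap columns; if the intended test was "B fully contains A", comparing D with M would be the place to change it inside `classify`.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical helper for the filtering rule described in the question.
# Assumes tab-separated BLAST output with the 14 columns shown in the table
# (e.g. blastn -outfmt "6 std qlen slen").


@dataclass
class Hit:
    query: str     # A: contig ID from assembly 1
    subject: str   # B: contig ID from assembly 2
    pident: float  # C: percent identity
    length: int    # D: alignment length
    evalue: float  # K: e-value
    qlen: int      # M: query (assembly 1) length
    slen: int      # N: subject (assembly 2) length


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular BLAST lines, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(query=f[0], subject=f[1], pident=float(f[2]),
                        length=int(f[3]), evalue=float(f[10]),
                        qlen=int(f[12]), slen=int(f[13])))
    return hits


def classify(hit: Hit, slack: int = 200) -> str:
    """Apply the two rules from the question to one hit.

    Returns "keep_query" (keep A, drop B), "keep_subject" (drop A, keep B),
    or "undecided" when neither rule applies (e.g. N == M + slack).
    """
    d, m, n = hit.length, hit.qlen, hit.slen
    if n < m + slack:
        return "keep_query"
    if d == n and m + slack < n:
        return "keep_subject"
    return "undecided"


def passes_thresholds(hit: Hit, min_pident: float = 98.0,
                      max_evalue: float = 1e-100) -> bool:
    """Re-check the identity and e-value cut-offs used for the blastn run."""
    return hit.pident >= min_pident and hit.evalue <= max_evalue
```

For a real file, an open file handle (for example `open("hits.tsv")`, a hypothetical file name) can be passed to `parse_hits`, then `classify` called on each hit that passes `passes_thresholds`. On the six rows above, rows 1, 2, 4 and 5 satisfy the first rule, while rows 3 and 6 match neither rule.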

updated 24 months ago by Ram 44k • written 9.4 years ago by seta ★ 1.9k

Instead of writing your own script, you could also try using GAM-NGS: http://www.ncbi.nlm.nih.gov/pubmed/23815503

It aligns reads to two similar genome assemblies and merges the two assemblies based on how well the reads align.

9.4 years ago by Philipp Bayer 8.8k

Thanks, but my problem involves a transcriptome assembly, not a genome assembly. Does it also work for transcriptome assemblies?

9.4 years ago by seta ★ 1.9k

Ram · Answer 1 · 2015-07-01

9.4 years ago

Sej Modha 5.3k

You can try using Transrate for transcriptome assembly comparison.

http://hibberdlab.com/transrate/metrics.html

9.4 years ago by Sej Modha 5.3k

My main goal is to combine the two assemblies rather than to compare them.

updated 24 months ago by Ram 44k • written 9.4 years ago by seta ★ 1.9k
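For the combining step itself, one possible approach (an assumption about what "combine" should mean here, not an established method or the output of any tool mentioned in this thread) is: start from all contigs of both assemblies, drop B whenever a hit says "keep A", drop A whenever a hit says "keep B", and keep both contigs for undecided hits or for contigs without any hit. The sketch below takes decisions as (query ID, subject ID, decision) tuples using the hypothetical labels `keep_query`, `keep_subject` and `undecided`, and assumes the FASTA IDs (header text up to the first whitespace) match the IDs in the BLAST table. It uses only the standard library; `read_fasta`, `merge_assemblies` and `to_fasta` are illustrative names.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader; the ID is the header text up to the first whitespace."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {cid: "".join(parts) for cid, parts in records.items()}


def merge_assemblies(asm1: Dict[str, str], asm2: Dict[str, str],
                     decisions: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Combine two assemblies using per-hit decisions (hypothetical labels)."""
    drop1: Set[str] = set()
    drop2: Set[str] = set()
    for query_id, subject_id, decision in decisions:
        if decision == "keep_query":      # keep A, drop its counterpart B
            drop2.add(subject_id)
        elif decision == "keep_subject":  # drop A, keep B
            drop1.add(query_id)
        # "undecided": keep both contigs
    merged = {cid: seq for cid, seq in asm1.items() if cid not in drop1}
    for cid, seq in asm2.items():
        if cid in drop2:
            continue
        if cid in merged:
            raise ValueError(f"contig ID {cid!r} occurs in both assemblies")
        merged[cid] = seq
    return merged


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialise records as FASTA with fixed-width sequence lines."""
    out: List[str] = []
    for cid, seq in records.items():
        out.append(">" + cid)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

One consequence of this design: if one hit drops A in favour of B and another hit drops that same B in favour of a different A, both of the dropped contigs disappear, so a more careful version would check for such chains before removing anything. The ID-collision check simply stops with an error; renaming or prefixing IDs per assembly would be an alternative.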