Hi all,
First of all, please accept my apologies if this question seems basic to you bioinformatics experts. It is a challenge for me as a biology student, so please be patient. I would like to compare and then combine two FASTA assembly files generated by two different assemblers. To this end, I ran blastn with an e-value threshold of 1E-100 and an identity threshold of 98%. In the table below, A is the contig ID from assembly 1, B is the contig ID from assembly 2, D is the alignment length, M is the query sequence (assembly 1) length, and N is the subject sequence (assembly 2) length. What I want is: if N < (M+200), keep A (and replace it with its counterparts in the FASTA file generated by assembly 2); if D = N and (M+200) < N, discard A and keep B. Could you please help me with this? Thanks so much in advance.
A                    B                      C         D       E         F            G            H          I              J            K          L         M     N
query Id             subject Id             Identity  length  mismatch  gap opening  query start  query end  subject start  subject end  e-value    bitscore  qlen  slen
contig10002|m.12543  c26528_g1_i1|m.14066   100       762     0         0            28           789        1              762          0          1408      789   762
contig10003|m.12544  c39648_g1_i1|m.25685   100       945     0         0            1            945        1              945          0          1746      945   945
contig10003|m.12545  c39648_g1_i1|m.25685   100       336     0         0            1            336        780            445          2.00E-177  621       336   945
contig10004|m.12546  c54250_g1_i3|m.62628   100       462     0         0            1            462        1              462          0          854       462   468
contig10005|m.12547  c54760_g1_i3|m.64975   100       564     0         0            1            564        1              564          0          1042      564   564
contig10006|m.12548  c64049_g2_i2|m.128345  100       526     0         0            188          713        236            761          0          972       729   1089
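If you do want to script it, the two conditions from the question could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested pipeline: it assumes the BLAST table is whitespace-delimited with exactly the 14 columns A to N shown above, and the names `Hit`, `parse_hits`, `classify` and `margin` are made up for illustration. The question does not say what to do with hits that match neither condition, so the sketch labels them "unresolved" (that label is an assumption) rather than guessing.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """The columns of one BLAST hit that the rule needs (hypothetical helper type)."""
    query_id: str    # column A: contig ID from assembly 1
    subject_id: str  # column B: contig ID from assembly 2
    aln_len: int     # column D: alignment length
    qlen: int        # column M: query length
    slen: int        # column N: subject length


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit for every data row; skip header rows and malformed lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 14:  # assumption: exactly the 14 columns A..N
            continue
        try:
            yield Hit(fields[0], fields[1],
                      int(fields[3]), int(fields[12]), int(fields[13]))
        except ValueError:
            # Non-numeric length columns, e.g. the "A B C ..." header row.
            continue


def classify(hit: Hit, margin: int = 200) -> str:
    """Apply the two conditions stated in the question to a single hit."""
    if hit.slen < hit.qlen + margin:
        return "keep_query"      # N < M + 200: keep A
    if hit.aln_len == hit.slen and hit.qlen + margin < hit.slen:
        return "keep_subject"    # D = N and M + 200 < N: discard A, keep B
    return "unresolved"          # assumption: the question gives no rule here


if __name__ == "__main__":
    # Two rows copied from the table in the question.
    sample = [
        "contig10002|m.12543 c26528_g1_i1|m.14066 100 762 0 0 28 789 1 762 0 1408 789 762",
        "contig10006|m.12548 c64049_g2_i2|m.128345 100 526 0 0 188 713 236 761 0 972 729 1089",
    ]
    for hit in parse_hits(sample):
        print(hit.query_id, hit.subject_id, classify(hit))
```

The resulting ID lists could then be used to pull the chosen sequences out of the two FASTA files. Note that a contig can appear in several hits (c39648_g1_i1|m.25685 does in the table above), so you would still need to decide how to reconcile conflicting decisions for the same ID.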
Instead of writing your own script, you could also try using GAM-NGS: http://www.ncbi.nlm.nih.gov/pubmed/23815503
It aligns reads to two similar genome assemblies and merges the two assemblies based on how well the reads align.
Thanks, but my issue concerns a transcriptome assembly, not a genome assembly. Does it also work well for transcriptome assemblies?