Hi all,
I ran a reverse BLAST in which my transcriptome is the database and my genome scaffolds are the queries. I did this to see where the transcripts fall in the genome and whether there are many intron-like gaps.
When I looked at the output, I found that some transcripts appear in more than one scaffold. What could be the reasons? The command I used and the resulting counts are below:
grep -A 2 "Sequences producing significant alignments" (my_blast_output).txt | grep comp | awk '{print $1}' | sort | uniq -c | sort -rn
5 comp65253_c1_seq4
5 comp48571_c0_seq4
4 comp65721_c3_seq3
4 comp63218_c0_seq2
3 comp65722_c0_seq1
3 comp64106_c2_seq22
3 comp54658_c0_seq1
3 comp45777_c0_seq2
3 comp23829_c0_seq1
2 comp85529_c0_seq1
2 comp63346_c0_seq6
2 comp57872_c0_seq1
2 comp25489_c0_seq1
2 comp100860_c0_seq1
1 comp66091_c0_seq10
1 comp65186_c0_seq6
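In case the grep approach above is too crude, here is a minimal sketch of how a similar tally could be made from tabular output. It assumes the BLAST search is re-run with -outfmt 6, so that column 1 is the query (scaffold) and column 2 is the subject (transcript). The function name and the file name in the usage line are made up for illustration. Unlike the command above, which only picks up the first hit listed for each scaffold, this counts every scaffold that has any hit to a transcript, so the numbers may differ.

```shell
# Count, for each transcript, how many distinct scaffolds hit it.
# Assumption: input is tab-separated BLAST output (-outfmt 6), where
# column 1 is the query (scaffold) and column 2 is the subject (transcript).
count_scaffolds_per_transcript() {
    cut -f1,2 "$1" |          # keep scaffold and transcript IDs only
        sort -u |             # collapse multiple HSPs of the same pair
        cut -f2 |             # keep only the transcript ID
        sort | uniq -c |      # number of distinct scaffolds per transcript
        sort -k1,1nr -k2,2    # most scaffolds first, then by transcript name
}

# Example usage (hypothetical file name):
# count_scaffolds_per_transcript reverse_blastout_outfmt6.tsv
```

A transcript with a count above 1 here has hits on several scaffolds, which is the pattern I am asking about.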
Also, the alignments do not cover the entire length of each transcript. What could be the reasons?
If you want to see the BLAST output, here it is: https://www.dropbox.com/s/xv5fnb54kwqi6ii/reverse_blastout_outfmt4_121317_simplified.txt?dl=0
I am fairly new to data analysis, which is why I need more guidance at this stage. What other conclusions can I draw from this BLAST output?
Note that the transcriptome assembly is probably fragmented as well, at least to some extent.