Hi! I'm new to bioinformatics. I just used blastn to compare two large FASTA files (each more than 3.5M bp). I want to see their differences and where they are, because both come from sequencing the same sample with different approaches.
So I ran this command: blastn -query assembly.fasta -subject renamed.fasta -outfmt 6 -out results.txt
It worked and produced this output (I'm only showing a few rows, but there are more than 500):
contig_1 ctg.s1.F.arrow 99.999 3389370 6 17 1 3389361 3389351 1 0.0 6.259e+06
contig_1 ctg.s1.F.arrow 100.000 599564 0 1 3389362 3988924 3988920 3389357 0.0 1.107e+06
contig_1 ctg.s1.F.arrow 99.912 5672 2 3 992424 998093 3383686 3389356 0.0 10443
contig_1 ctg.s1.F.arrow 99.506 5669 8 10 1 5666 2391274 2396925 0.0 10296
contig_1 ctg.s1.F.arrow 99.904 5182 5 0 1783880 1789061 452377 447196 0.0 9542
contig_1 ctg.s1.F.arrow 99.904 5184 1 3 2936985 2942166 1605481 1600300 0.0 9542
contig_1 ctg.s1.F.arrow 100.000 3343 0 0 1150782 1154124 1531471 1534813 0.0 6174
contig_1 ctg.s1.F.arrow 100.000 3343 0 0 1854548 1857890 2235238 2238580 0.0 6174
contig_1 ctg.s1.F.arrow 100.000 2674 0 0 2933585 2936258 87318 89991 0.0 4939
If I'm not mistaken, the columns mean:
1. qseqid query (e.g., unknown gene) sequence id
2. sseqid subject (e.g., reference genome) sequence id
3. pident percentage of identical matches
4. length alignment length (sequence overlap)
5. mismatch number of mismatches
6. gapopen number of gap openings
7. qstart start of alignment in query
8. qend end of alignment in query
9. sstart start of alignment in subject
10. send end of alignment in subject
11. evalue expect value
12. bitscore bit score
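To make the rows easier to read, they could be loaded under those column names. Here is a minimal Python sketch, assuming the default -outfmt 6 column order listed above. The function name parse_blast6 and the extra "strand" field are made up for illustration and are not part of BLAST. The strand is derived from the coordinates: when sstart is greater than send, the hit lies on the reverse strand of the subject.

```python
# Column names of BLAST tabular output (-outfmt 6), in the default order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_blast6(lines):
    """Turn outfmt 6 lines into dicts keyed by column name (hypothetical helper)."""
    hits = []
    for line in lines:
        fields = line.split()  # real files are tab-separated; split() handles both
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines or anything that is not a 12-column row
        hit = {}
        for name, value in zip(COLUMNS, fields):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value
        # Derived field (not a BLAST column): sstart > send means the
        # alignment is to the reverse strand of the subject.
        hit["strand"] = "minus" if hit["sstart"] > hit["send"] else "plus"
        hits.append(hit)
    return hits


# Two rows copied from the output above.
sample = [
    "contig_1 ctg.s1.F.arrow 99.999 3389370 6 17 1 3389361 3389351 1 0.0 6.259e+06",
    "contig_1 ctg.s1.F.arrow 99.506 5669 8 10 1 5666 2391274 2396925 0.0 10296",
]

for hit in parse_blast6(sample):
    print(hit["qstart"], hit["qend"], hit["strand"],
          "mismatches:", hit["mismatch"], "gap openings:", hit["gapopen"])
```

With the full file, the same function could be called as parse_blast6(open("results.txt")), and the hits sorted by qstart to walk along the query in order.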
But how do I interpret this? Where are the differences? Why are so many rows needed? Is BLAST comparing a narrower region each time to pin down where the differences are?
Thanks a lot for any help with this :)