I have a genome assembly of a species, which I have annotated in-house. Its N50 is around 41,000 bp (41 kb).
I have another, better assembly of the same species, which I have also annotated in-house. Its N50 is about 4,000,000 bp (4 Mb).
I suspect the poorer assembly contains split and incomplete gene models (transcripts) that are resolved in the better assembly.
Does anyone have tips on how to find these split instances that are resolved in the better genome?
I have two protein FASTA files for the same animal: one from the assembly with the small N50, the other from the PacBio assembly with the larger N50.
I want to blastp the proteins from the poor assembly against those from the better one and, from the results, extract the query length and the hit (subject) length. I will then run the reverse search, blasting the better protein set against the poorer one.
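BLAST+ can report the lengths for you: the tabular format (-outfmt 6) accepts custom column specifiers, and qlen and slen are the query and subject sequence lengths. The sketch below only builds the makeblastdb/blastp command lines for both search directions, assuming BLAST+ is on your PATH. The file names, the output names and the 1e-10 e-value cutoff are placeholders, and reciprocal_blastp_commands and run_all are hypothetical helper names.

```python
import shlex
import subprocess

# Tabular output; qlen and slen are the query and subject sequence lengths.
OUTFMT = "6 qseqid sseqid pident length qlen slen qstart qend sstart send evalue bitscore"


def reciprocal_blastp_commands(poor_faa, pacbio_faa, evalue="1e-10"):
    """Return makeblastdb and blastp commands for both search directions.

    Output file names and the e-value cutoff are illustrative placeholders.
    """
    cmds = []
    # Build a protein database from each FASTA file (database name = file name).
    for faa in (poor_faa, pacbio_faa):
        cmds.append(["makeblastdb", "-in", faa, "-dbtype", "prot", "-out", faa])
    # Forward search (poor -> PacBio), then reverse search (PacBio -> poor).
    for query, db, out in (
        (poor_faa, pacbio_faa, "poor_vs_pacbio.tsv"),
        (pacbio_faa, poor_faa, "pacbio_vs_poor.tsv"),
    ):
        cmds.append(["blastp", "-query", query, "-db", db,
                     "-outfmt", OUTFMT, "-evalue", evalue, "-out", out])
    return cmds


def run_all(cmds):
    """Print and run each command, stopping on the first failure."""
    for cmd in cmds:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run_all(reciprocal_blastp_commands("poor.faa", "pacbio.faa")) would run both searches. Building a database once with makeblastdb is usually faster than passing a whole proteome with -subject.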
So when I blast the poor set against the PacBio set, I expect the query length to be much smaller than the hit length for fragmented genes. The opposite should hold when I blast the PacBio set against the poor one.
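A short query alone could also just be a truncated gene. A split gene should additionally show up as two or more poor-assembly proteins whose best hits land on different, largely non-overlapping stretches of the same PacBio protein. The sketch below checks for that pattern. It assumes you already have one best hit per query as (query id, subject id, subject start, subject end) tuples. split_candidates and the 0.2 overlap tolerance are illustrative choices, not an established method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query id, subject id, subject start, subject end) for each query's best hit
Hit = Tuple[str, str, int, int]


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def split_candidates(best_hits: List[Hit], max_overlap: float = 0.2) -> Dict[str, List[str]]:
    """Subjects hit by >= 2 queries on largely non-overlapping subject ranges.

    max_overlap is the largest allowed shared fraction of the shorter range;
    the 0.2 default is an arbitrary illustration.
    """
    by_subject: Dict[str, List[Tuple[str, Tuple[int, int]]]] = defaultdict(list)
    for qid, sid, sstart, send in best_hits:
        # blastp reports sstart <= send; min/max is only a safeguard.
        by_subject[sid].append((qid, (min(sstart, send), max(sstart, send))))

    result = {}
    for sid, pieces in by_subject.items():
        if len({q for q, _ in pieces}) < 2:
            continue  # only one poor-assembly protein maps here
        pieces.sort(key=lambda p: p[1][0])  # order fragments along the subject
        disjoint = all(
            overlap(a[1], b[1])
            <= max_overlap * min(a[1][1] - a[1][0] + 1, b[1][1] - b[1][0] + 1)
            for i, a in enumerate(pieces)
            for b in pieces[i + 1:]
        )
        if disjoint:
            result[sid] = [q for q, _ in pieces]
    return result
```

Each reported PacBio protein comes with the poor-assembly proteins ordered by where they align along it, which is the layout you would expect if one gene had been broken across contigs.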
How can I extract the query and hit lengths from the BLAST results?
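If the search was run with qlen and slen among the -outfmt 6 columns, extracting them is just a matter of parsing a tab-separated file. A minimal sketch follows, assuming the column order given in its comment (adjust COLUMNS if your -outfmt string differs). best_hit_lengths and shorter_than_hit are hypothetical helper names, and "best hit" here simply means the highest bitscore. The 0.8 ratio threshold is an arbitrary starting point.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Assumed column order, matching:
# -outfmt "6 qseqid sseqid pident length qlen slen qstart qend sstart send evalue bitscore"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLS = {"length", "qlen", "slen", "qstart", "qend", "sstart", "send"}
FLOAT_COLS = {"pident", "evalue", "bitscore"}


def read_hits(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per HSP line, with numeric columns converted."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        hit = dict(zip(COLUMNS, row))
        for col in INT_COLS:
            hit[col] = int(hit[col])
        for col in FLOAT_COLS:
            hit[col] = float(hit[col])
        yield hit


def best_hit_lengths(lines: Iterable[str]) -> Dict[str, dict]:
    """Keep the highest-bitscore hit per query and add qlen/slen as len_ratio."""
    best: Dict[str, dict] = {}
    for hit in read_hits(lines):
        q = hit["qseqid"]
        if q not in best or hit["bitscore"] > best[q]["bitscore"]:
            best[q] = hit
    for hit in best.values():
        hit["len_ratio"] = hit["qlen"] / hit["slen"]
    return best


def shorter_than_hit(best: Dict[str, dict], max_ratio: float = 0.8) -> List[str]:
    """Queries whose protein is much shorter than their best hit."""
    return sorted(q for q, h in best.items() if h["len_ratio"] <= max_ratio)
```

An open file handle works as the input, e.g. best_hit_lengths(open("poor_vs_pacbio.tsv")) (file name hypothetical). For the reverse search you would expect the same ratio to be well above 1 for the corresponding PacBio proteins.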