Getting coordinates of unmapped regions and regions that map to them
2.3 years ago
ja569116 • 0

Hi, I assembled a genome using long and short reads and then used GATK for genotyping. I once read a paper that assessed genome quality by remapping the reads to the reference. When I checked the coverage for my assembly, it was below 99%: about 1.3% is not covered, which doesn't sound like much as a percentage but is a lot in terms of length.

I want to do two things. First, I want to get the coordinates of the unmapped regions, extract only those regions from the reference FASTA, and then check the BUSCO score again. Second, I have the GFF3 annotation file and want to find out how many genes are located in the unmapped regions.

Are there any programs that can help me do these tasks?

Thank you very much.

reference genome unmapped regions • 571 views
2.3 years ago

I would move on to more important aspects rather than trying to salvage 1% of the data.

There are many reasons why you would miss some regions: they may be hard to sequence, hard to assemble, very repetitive, or of ultra-low complexity, or they may be duplications, copy number variants, pseudogenes, etc. Every genome has a lot of junk in it.

It is only worth the effort if you expect those regions to be relevant and to materially add to the story. After all, you are never going to get to 100% anyway, so whether that number is 99.1, 99.2, or 99.4 makes little difference.
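
If you do decide those regions are worth a look, the bookkeeping asked about in the question is simple enough to sketch. What follows is a minimal Python sketch under stated assumptions, not a tested pipeline. It assumes per-base depth is available as three tab-separated columns (contig, 1-based position, depth) with zero-depth positions included, which is the layout `samtools depth -a` writes, and that the GFF3 marks genes with the feature type `gene` in column 3. The function names are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (contig, start, end) with 0-based, half-open coordinates, as in BED
Interval = Tuple[str, int, int]


def zero_coverage_intervals(depth_lines: Iterable[str]) -> List[Interval]:
    """Merge runs of consecutive zero-depth positions into intervals.

    Assumes each line is "contig<TAB>pos<TAB>depth" with a 1-based pos.
    """
    intervals: List[Interval] = []
    cur = None  # interval being extended: [contig, start, end]
    for line in depth_lines:
        if not line.strip():
            continue
        contig, pos_s, depth_s = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        if depth != 0:
            continue
        if cur is not None and cur[0] == contig and cur[2] == pos - 1:
            cur[2] = pos  # adjacent to the previous zero-depth base
        else:
            if cur is not None:
                intervals.append((cur[0], cur[1], cur[2]))
            cur = [contig, pos - 1, pos]  # convert 1-based pos to BED
    if cur is not None:
        intervals.append((cur[0], cur[1], cur[2]))
    return intervals


def genes_overlapping(gff3_lines: Iterable[str],
                      intervals: Iterable[Interval]) -> List[str]:
    """Return the attribute column of each gene touching any interval."""
    by_contig: Dict[str, List[Tuple[int, int]]] = {}
    for contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end))
    hits: List[str] = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # GFF3 is 1-based inclusive; convert to 0-based half-open
        g_start, g_end = int(fields[3]) - 1, int(fields[4])
        regions = by_contig.get(fields[0], [])
        if any(s < g_end and g_start < e for s, e in regions):
            hits.append(fields[8])
    return hits
```

The intervals come out 0-based and half-open, so they can be written straight to a BED file and handed to a tool such as `bedtools getfasta` to pull those sequences out of the reference before rerunning BUSCO. The overlap check is a plain linear scan per gene, which should be fine for a first look but would want an interval index for a large, fragmented assembly.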
