4.6 years ago
Hello
I want to know how to check the quality of the genomes available at NCBI.
I'm interested in checking the following: N50, L50, fine consistency, genome contamination, and coarse consistency.
I have a few questions about this:
- While looking at tools for this, such as QUAST, I noticed that the input is the contig data for the sequenced genome. Is it necessary to have contig data to check quality? Can't we do it with only the genome files provided by NCBI?
- Where can I find the contig data for the genomes at NCBI?
- What other tools can I use to check genome quality? I'm specifically looking at viral genomes.
Thanks.
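N50 and L50 need nothing beyond the sequence file itself: both are computed from the lengths of the sequences in a FASTA file, so a genome FASTA downloaded from NCBI is enough input. Below is a minimal Python sketch; the function names are illustrative and not taken from QUAST or any other tool, and it assumes record IDs in the FASTA are unique.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_id: sequence_length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-separated token after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence may be wrapped over several lines; accumulate them.
            lengths[current] += len(line)
    return lengths


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """N50: length of the sequence at which the running total (longest first)
    first reaches half the assembly size. L50: how many sequences that took."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no sequences supplied")


# Usage with a local file (the filename is hypothetical):
# with open("genome.fna") as fh:
#     print(n50_l50(list(read_fasta_lengths(fh).values())))
```

For a complete, single-segment viral genome there is only one sequence, so N50 is simply the genome length and L50 is 1. These two metrics are mostly informative for fragmented, multi-contig assemblies.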
Which repository? Genomes in RefSeq are as good (dare I say gold standard) as you are going to get anywhere. For GenBank genomes you can look at the assembly_summary file. Column 12 (assembly_level) should tell you the status (e.g. Complete Genome).
I'm talking about the genomes in GenBank only. I have the complete genomes, but I still want to check their quality.
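The assembly_summary check from the answer above can be scripted. This is a minimal sketch, assuming a locally downloaded, tab-separated assembly_summary.txt whose commented header line contains the column names (including `assembly_level` and `gbrs_paired_asm`). It reads columns by name rather than by position, so it does not depend on the exact column index.

```python
from typing import Dict, Iterable, List


def parse_assembly_summary(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse assembly_summary.txt lines into one dict per assembly."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # The column-name line is the comment line mentioning assembly_accession.
            if "assembly_accession" in line:
                header = line.lstrip("#").strip().split("\t")
            continue
        if not line or not header:
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def by_assembly_level(rows: List[Dict[str, str]], level: str = "Complete Genome") -> List[Dict[str, str]]:
    """Keep only assemblies whose assembly_level matches `level`."""
    return [r for r in rows if r.get("assembly_level") == level]


# Usage (filename is an assumption about where you saved the file):
# with open("assembly_summary.txt") as fh:
#     for r in by_assembly_level(parse_assembly_summary(fh)):
#         print(r["assembly_accession"], r.get("gbrs_paired_asm"))
```

The `gbrs_paired_asm` column, where present, gives the paired RefSeq accession, which is useful if you want to compare a GenBank genome with its RefSeq counterpart.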
I am not sure what your criteria are for
quality
. At best, you could look at their counterparts in RefSeq
and compare. In some cases you could find the original BioProjects (and data), but then you would be doing the assembly etc. yourself.
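One simple way to "look at the RefSeq counterpart and compare" is to put a few basic statistics side by side. The sketch below assumes both genomes are single-sequence FASTA files you have already downloaded. The chosen statistics (length, count of non-ACGT bases, GC fraction) are illustrative and are not an NCBI-defined quality score.

```python
from typing import Dict, Iterable


def read_single_fasta(lines: Iterable[str]) -> str:
    """Concatenate all sequence lines of a FASTA file (headers are skipped)."""
    return "".join(l.strip() for l in lines if l.strip() and not l.startswith(">"))


def seq_stats(seq: str) -> Dict[str, float]:
    """Length, number of non-ACGT (ambiguous) bases, and GC fraction of ACGT bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return {
        "length": len(seq),
        "ambiguous": len(seq) - acgt,
        "gc_fraction": gc / acgt if acgt else 0.0,
    }


def compare(genbank_seq: str, refseq_seq: str) -> Dict[str, float]:
    """Difference (GenBank minus RefSeq) for each statistic."""
    a, b = seq_stats(genbank_seq), seq_stats(refseq_seq)
    return {key: a[key] - b[key] for key in a}


# Usage (filenames are hypothetical):
# with open("genbank.fna") as g, open("refseq.fna") as r:
#     print(compare(read_single_fasta(g), read_single_fasta(r)))
```

Large differences in length or in ambiguous bases are a hint to inspect the GenBank record more closely; identical statistics do not prove the sequences are identical.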