I have a list of plant species for which I want to get information such as:
genome size, number of protein-coding genes, chromosome number, and numbers of contigs and scaffolds.
Is there an easy way to do this (programmatically or via tools)?
For genome sizes, you might be interested in the Plant DNA C-values Database: https://cvalues.science.kew.org/
For assembly statistics, I would recommend looking in NCBI for the representative RefSeq or GenBank genome of each species, if one exists. This can be automated via Entrez E-utilities. The assembly record often includes the number of contigs, but not always. If the genome is annotated, you can get the number of protein-coding genes from the annotation files, but sometimes no annotation is available. In that case you can look for publications; however, when you have both, prefer the count from the annotation file over gene numbers quoted in publications. In the end, you will have to do this semi-automatically and curate the results manually, because the process is error-prone.
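As a rough sketch of the annotation-file route: assuming an NCBI-style GFF3 in which `gene` features carry a `gene_biotype=protein_coding` attribute, the protein-coding genes could be counted as below. The function name is made up for illustration, and annotations from other sources may label biotypes differently, so check a few lines of your file first.

```python
import gzip
from pathlib import Path


def count_protein_coding_genes(gff_path):
    """Count 'gene' features annotated with gene_biotype=protein_coding.

    Assumption: NCBI-style GFF3 (9 tab-separated columns, key=value
    attributes separated by ';'). Other sources may use other attribute names.
    """
    path = Path(gff_path)
    # Read gzip-compressed files transparently, plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header and directive lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9 or fields[2] != "gene":
                continue  # keep only well-formed 'gene' rows
            attributes = dict(
                item.split("=", 1) for item in fields[8].split(";") if "=" in item
            )
            if attributes.get("gene_biotype") == "protein_coding":
                count += 1
    return count
```

For example, `count_protein_coding_genes("genomic.gff.gz")` would return the count for a downloaded annotation file (the file name here is only a placeholder).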
Hello, could you please write out the Entrez E-utilities command?
Here is a simple example using E-utilities that fetches basic assembly stats for a species, in this case Medicago truncatula.
The code needs to be run in a Linux shell, e.g. bash. The first line fetches the accession of the representative assembly for that species; the second line fetches the stats for that assembly.
output:
You may have to adapt that script to your needs.
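The two steps described above (find the representative assembly, then fetch its stats) can also be sketched in Python against the E-utilities HTTP endpoints instead of the shell. This is a minimal sketch under stated assumptions, not the script from this thread: the function names are made up, the search term assumes the `"representative genome"[RefSeq Category]` filter, and the parser assumes the assembly summary has lowercase JSON keys (`assemblyaccession`, `meta`) with `meta` holding an XML fragment of `<Stat>` elements.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_get(tool, **params):
    """Call one E-utilities endpoint (e.g. 'esearch') and return parsed JSON."""
    params["retmode"] = "json"
    url = f"{EUTILS}/{tool}.fcgi?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def parse_stats(docsum):
    """Extract whole-assembly statistics from one assembly document summary.

    Assumption: docsum['meta'] is an XML fragment containing elements like
    <Stat category="contig_count" sequence_tag="all">123</Stat>.
    """
    # The fragment has several top-level elements, so wrap it in a root.
    root = ET.fromstring("<root>" + docsum.get("meta", "") + "</root>")
    stats = {"accession": docsum.get("assemblyaccession")}
    for stat in root.iter("Stat"):
        if stat.get("sequence_tag") != "all":
            continue  # skip per-subset values, keep whole-assembly ones
        text = (stat.text or "").strip()
        stats[stat.get("category")] = int(text) if text.isdigit() else text
    return stats


def assembly_stats(species):
    """Step 1: find the representative assembly. Step 2: fetch its stats."""
    # Assumption: this filter selects the representative genome for the species.
    term = f'"{species}"[Organism] AND "representative genome"[RefSeq Category]'
    ids = eutils_get("esearch", db="assembly", term=term)["esearchresult"]["idlist"]
    if not ids:
        return None  # no representative assembly found; curate by hand
    uid = ids[0]
    docsum = eutils_get("esummary", db="assembly", id=uid)["result"][uid]
    return parse_stats(docsum)
```

Calling `assembly_stats("Medicago truncatula")` needs network access and returns `None` when no assembly matches. When looping over a species list, pause between calls: without an API key, NCBI asks for no more than three requests per second.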