Getting genome size, gene count, chromosome number, and contig and scaffold counts for a list of plant species
2.9 years ago
Nelo ▴ 20

I have a list of plant species for which I want to get information such as:

Genome size; number of protein-coding genes; chromosome number; number of contigs and scaffolds

Is there an easy way to do this (programmatically or via tools)?

2.9 years ago
Michael 55k

For genome sizes, you might be interested in the Plant C-value database: https://cvalues.science.kew.org/

For assembly statistics, I would recommend looking in NCBI for the representative RefSeq or GenBank genome of each species, if one exists. This can be automated via Entrez e-utils. The assembly record sometimes includes the number of contigs, but not always. If the genome is annotated, you can get the number of protein-coding genes from the annotation files, but an annotation is not always available. In that case you can look for publications. If you do have the annotation file, prefer its gene count over any number given in a publication. In the end, you will have to do this semi-automatically and curate the results manually, because the process is error-prone.
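As a rough sketch of the annotation-file step, the function below counts protein-coding genes in a GFF3 file. It assumes NCBI-style GFF3, where gene features carry a gene_biotype=protein_coding attribute; annotations from other sources may label genes differently, so check a few lines of your file first. The function name and the file name in the usage comment are placeholders.

```shell
# Count protein-coding genes in an NCBI-style GFF3 annotation read from stdin.
# Assumption: gene features (column 3 == "gene") carry the attribute
# gene_biotype=protein_coding in column 9, as in NCBI RefSeq GFF3 files.
count_protein_coding_genes() {
  awk -F'\t' '
    /^#/ { next }   # skip header and directive lines
    $3 == "gene" && $9 ~ /(^|;)gene_biotype=protein_coding(;|$)/ { n++ }
    END { print n + 0 }   # prints 0 when no gene matched
  '
}

# Hypothetical usage; annotation.gff.gz is a placeholder file name:
# gzip -dc annotation.gff.gz | count_protein_coding_genes
```

Counting gene features rather than mRNA or CDS features avoids counting a gene once per transcript isoform.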


Hello, could you please write down the Entrez e-utils command?


Here is a simple example using the Entrez e-utils command-line tools (EDirect) that fetches basic assembly stats for a species, in this case Medicago truncatula.

The code needs to be run in a Linux shell, e.g. bash. The first line fetches the accession of the representative assembly for that species; the second line fetches the stats for that assembly.

ACC=$(esearch -db genome -query '"Medicago truncatula"[Organism:exp]' | efetch -format docsum | xtract -pattern DocumentSummary -element Assembly_Accession)
esearch -db assembly -query "$ACC" | efetch -format docsum | xtract -pattern DocumentSummary -block Stats -tab "\n" -element Stat@category Stat

output:

alt_loci_count  chromosome_count    contig_count    contig_l50  contig_n50  non_chromosome_replicon_count   replicon_count  scaffold_count  scaffold_count  scaffold_count  scaffold_count  scaffold_l50    scaffold_n50    total_length    ungapped_length
0   8   64  7   23305320    2   10  42  10  0   32  4   56236587    430008013   429433753

You may have to modify that script to suit your needs.
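One way to adapt it to a list of species is sketched below. It wraps the two commands above in a function and reshapes the two-line output into one row per statistic, prefixed with the species name. The function names and the file names in the usage comment are placeholders, not part of EDirect, and the sketch assumes the genome search returns a single assembly accession per species. Species with no accession produce no rows, so compare the output against your input list.

```shell
# Fetch the two-line stats table for one species, reusing the commands above.
# Stdin is redirected from /dev/null so that esearch does not consume the
# species list when this function is called inside a read loop.
fetch_assembly_stats() {
  local acc
  acc=$(esearch -db genome -query "\"$1\"[Organism:exp]" </dev/null |
        efetch -format docsum |
        xtract -pattern DocumentSummary -element Assembly_Accession)
  [ -n "$acc" ] || return 1   # no representative assembly found
  esearch -db assembly -query "$acc" </dev/null |
    efetch -format docsum |
    xtract -pattern DocumentSummary -block Stats -tab "\n" -element Stat@category Stat
}

# Read species names from stdin (one per line) and print
# species<TAB>statistic<TAB>value rows.
stats_for_species_list() {
  while IFS= read -r species; do
    [ -n "$species" ] || continue   # skip blank lines
    fetch_assembly_stats "$species" |
      awk -F'\t' -v sp="$species" '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        NR == 2 { for (i = 1; i <= NF; i++) print sp "\t" name[i] "\t" $i }
      '
  done
}

# Hypothetical usage; species.txt holds one species name per line:
# stats_for_species_list < species.txt > assembly_stats.tsv
```

The long format keeps repeated column names such as scaffold_count as separate rows, which makes them easier to spot during manual curation.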
