Obtain N50 stats from NCBI for all Hymenoptera genome assemblies
5.0 years ago

Hello,

I would like to obtain the scaffold N50 for all Hymenoptera genome assemblies in NCBI, as listed on each assembly page under Global statistics (e.g., https://www.ncbi.nlm.nih.gov/assembly/GCF_000214255.1). Is there a way to quickly access these stats? I am looking for a command-line approach that dumps them all at once.

Thanks in advance!

N50 NCBI • 1.5k views

I always find it convenient to start with NCBI Taxonomy. One approach can be -

5.0 years ago
vkkodali_ncbi ★ 3.8k

You can use Entrez Direct for this as shown below:

esearch -db assembly -query 'txid7399[organism:exp]' \
  | esummary \
  | xtract -pattern DocumentSummary -element AssemblyAccession,AssemblyName,SpeciesTaxid,Organism,ContigN50,ScaffoldN50
GCA_000184785.2  Aflo_1.1                  7463     Apis florea (little honeybee)                        24915    2863240
GCA_009650705.1  Solenopsis_invicta_SB1.0  13686    Solenopsis invicta (red fire ant)                    945877   13114153
GCA_009602685.1  ASM960268v1               63436    Leptopilina heterotoma (wasps, ants, and bees)       9278     11848
GCA_009299975.1  ASM929997v1               13686    Solenopsis invicta (red fire ant)                    874937   16736736
GCA_009299965.1  ASM929996v1               13686    Solenopsis invicta (red fire ant)                    278331   11613644
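If you want the table ranked by scaffold N50 with a header row, the tab-delimited xtract output can be post-processed with standard tools. This is only a minimal sketch: the function names `sort_by_scaffold_n50` and `sample_rows` and the header labels are my own (hypothetical), and the sample rows are copied from the output above so the snippet runs without EDirect or network access.

```shell
#!/bin/sh
# Hypothetical helper: print a header row, then sort the tab-delimited rows
# read from stdin by ScaffoldN50 (column 6), largest value first.
sort_by_scaffold_n50() {
  printf 'Accession\tAssemblyName\tSpeciesTaxid\tOrganism\tContigN50\tScaffoldN50\n'
  sort -t "$(printf '\t')" -k6,6nr
}

# Three rows copied from the output above, written with explicit tabs.
sample_rows() {
  printf 'GCA_000184785.2\tAflo_1.1\t7463\tApis florea (little honeybee)\t24915\t2863240\n'
  printf 'GCA_009650705.1\tSolenopsis_invicta_SB1.0\t13686\tSolenopsis invicta (red fire ant)\t945877\t13114153\n'
  printf 'GCA_009602685.1\tASM960268v1\t63436\tLeptopilina heterotoma (wasps, ants, and bees)\t9278\t11848\n'
}

sample_rows | sort_by_scaffold_n50
```

In a real run you would pipe the esearch | esummary | xtract command into `sort_by_scaffold_n50` instead of `sample_rows`, and redirect the result to a file of your choice.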

Really liked this solution!


I used this and it was perfect for my needs. Now I also want the date of each assembly. I tried the following, but the output was the same as yours (i.e., no date column). Do you know how I could do this?

esearch -db assembly -query 'txid7399[organism:exp]' \
  | esummary \
  | xtract -pattern DocumentSummary -element AssemblyAccession,Date,AssemblyName,SpeciesTaxid,Organism,ContigN50,ScaffoldN50

You need to provide the complete element name, for example AsmReleaseDate_RefSeq, as shown below:

esearch -db assembly -query 'txid7399[organism:exp]' \
  | esummary \
  | xtract -pattern DocumentSummary -element AssemblyAccession,AsmReleaseDate_RefSeq,AssemblyName,SpeciesTaxid,Organism,ContigN50,ScaffoldN50
GCA_010883055.1  1/01/01 00:00  B_treatae_v1                1159321  Belonocnema treatae (wasps, ants, and bees)  19588  150973230
GCA_010645185.1  1/01/01 00:00  Tetragonula_hockingsi_1.1   270528   Tetragonula hockingsi (bees)                 10501  10501
GCA_010645165.1  1/01/01 00:00  Tetragonula_davenporti_1.1  597209   Tetragonula davenporti (bees)                18510  18510
GCA_010645135.1  1/01/01 00:00  Tetragonula_clypearis_1.1   270525   Tetragonula clypearis (bees)                 14852  14852
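Note that every GCA_ row above shows 1/01/01 00:00 in the date column. My assumption is that this is a placeholder for assemblies that have no RefSeq release, not a real date. A minimal sketch for tidying that column follows; the function name `clean_release_date` and the choice of NA as the replacement are hypothetical, and the sample row is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: in column 2 of tab-delimited input, replace the
# placeholder date with NA; for any other value, drop the trailing time part.
clean_release_date() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      if ($2 == "1/01/01 00:00") $2 = "NA"   # assumed placeholder for "no date"
      else sub(/ [0-9:]+$/, "", $2)          # keep only the date portion
      print
    }'
}

# One row copied from the output above, written with explicit tabs.
printf 'GCA_010883055.1\t1/01/01 00:00\tB_treatae_v1\t1159321\tBelonocnema treatae (wasps, ants, and bees)\t19588\t150973230\n' \
  | clean_release_date
```

If you need a date for GenBank-only assemblies, it may be worth swapping AsmReleaseDate_RefSeq for AsmReleaseDate_GenBank in the -element list; I believe the assembly document summary carries that field, but check the esummary XML for one record to confirm the exact element names.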