download all metadata from SRA
Here is an EntrezDirect query to get you started. It is likely not a perfect query, and I will think about it some more later. Adjust the date range as needed.
$ esearch -db sra -query "2021/1/1:2021/1/2[Publication Date]" | elink -target biosample | esummary | xtract -pattern DocumentSummary -element Organism | sort | uniq -c | sort -k1,1nr
170 Glycine max
121 Rhodeus ocellatus kurumeus
83 air metagenome
62 Culex bitaeniorhynchus
48 Culex tritaeniorhynchus
39 Escherichia coli
37 Kalanchoe laxiflora
36 Homo sapiens
35 soil metagenome
32 Mus musculus
22 Rhodeus ocellatus ocellatus
20 feces metagenome
13 Salmonella enterica subsp. enterica serovar Infantis
9 Cardamine flexuosa
8 Salmonella enterica subsp. enterica serovar Kentucky
7 Arabidopsis thaliana
7 Campylobacter jejuni
7 Salmonella enterica subsp. enterica serovar Enteritidis
6 Zea mays
4 Salmonella enterica subsp. enterica serovar Typhimurium
3 Salmonella enterica
3 Salmonella enterica subsp. enterica
3 Salmonella enterica subsp. enterica serovar Newport
2 Salmonella enterica subsp. enterica serovar Agona
2 Salmonella enterica subsp. enterica serovar Eko
2 Salmonella enterica subsp. enterica serovar London
2 Salmonella enterica subsp. enterica serovar Schwarzengrund
2 Vicia sativa
2 mixed culture
1 Abeliophyllum distichum f. lilacinum
1 Aspergillus aculeatinus
1 Campylobacter jejuni subsp. jejuni
1 Fagus sylvatica
1 Nicotiana
1 Physalis pubescens
1 Polygonatum kingianum
1 Rhus punjabensis var. sinica
1 Salmonella enterica subsp. enterica serovar 4,[5],12:i:-
1 Salmonella enterica subsp. enterica serovar Anatum
1 Salmonella enterica subsp. enterica serovar Brandenburg
1 Salmonella enterica subsp. enterica serovar Derby
1 Salmonella enterica subsp. enterica serovar Johannesburg
1 Salmonella enterica subsp. enterica serovar Senftenberg
1 Shigella sonnei
1 freshwater sediment metagenome
1 riverine metagenome
If you are willing to write some code, you can extract a lot more info from a query like this:
$ esearch -db sra -query "2021/1/1:2021/1/2[Publication Date]" | esummary | head -100
https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130524/esummary_sra.dtd ">
<DocumentSummarySet status="OK">
<DocumentSummary>
<Id>11835626</Id>
<ExpXml> <Summary><Title>RNA-Seq of early induced cardiac progenitors (Day-7)</Title><Platform instrument_model="Illumina HiSeq 2500">ILLUMINA</Platform><Statistics total_runs="1" total_spots="36730347" total_bases="11019104100" total_size="4562391529" load_done="true" cluster_name="public"/></Summary><Submitter acc="SRA1123873" center_name="University of Cincinnati" contact_name="Jialiang Liang" lab_name="Department of Pathology"/><Experiment acc="SRX9106574" ver="4" status="public" name="RNA-Seq of early induced cardiac progenitors (Day-7)"/><Study acc="SRP282054" name="Activation of endogenous genes by CRISPR enables conversion of mouse fibroblasts into cardiac progenitor cells"/><Organism taxid="10090" ScientificName="Mus musculus"/><Sample acc="SRS7349991" name=""/><Instrument ILLUMINA="Illumina HiSeq 2500"/><Library_descriptor><LIBRARY_NAME>T4</LIBRARY_NAME><LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY><LIBRARY_SOURCE>TRANSCRIPTOMIC</LIBRARY_SOURCE><LIBRARY_SELECTION>Oligo-dT</LIBRARY_SELECTION><LIBRARY_LAYOUT> <PAIRED/> </LIBRARY_LAYOUT></Library_descriptor><Bioproject>PRJNA662934</Bioproject><Biosample>SAMN16109872</Biosample> </ExpXml>
<Runs> <Run acc="SRR12623858" total_spots="36730347" total_bases="11019104100" load_done="true" is_public="true" cluster_name="public" static_data_available="true"/> </Runs>
<ExtLinks></ExtLinks>
<CreateDate>2021/01/01</CreateDate>
<UpdateDate>2021/02/02</UpdateDate>
</DocumentSummary>
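As a rough illustration of the "write some code" part, here is a minimal sketch that flattens the summary XML into a tab-separated table. The function name `sra_summary_to_tsv` and the choice of columns are my own assumptions, not part of EntrezDirect. It relies on plain text matching in awk rather than a real XML parser, so it assumes the layout shown above (unescaped `ExpXml` and `Runs`, each on its own line) and keeps only the first run of each record.

```shell
#!/bin/sh
# Hypothetical helper: read esummary XML for SRA records on stdin and print
# experiment, run, study, organism and library strategy, tab-separated.
# Text matching only; assumes the line layout shown in the output above.
sra_summary_to_tsv() {
  awk '
    # Value of attribute `name` on the first <tag ...> in `line`, else "".
    function attr(line, tag, name,    s) {
      if (!match(line, "<" tag " [^>]*")) return ""
      s = substr(line, RSTART, RLENGTH)
      if (!match(s, " " name "=\"[^\"]*\"")) return ""
      s = substr(s, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
      sub(/"$/, "", s)        # drop the closing quote
      return s
    }
    # Text content of the first <tag>...</tag> in `line`, else "".
    function elem(line, tag) {
      if (!match(line, "<" tag ">[^<]*")) return ""
      return substr(line, RSTART + length(tag) + 2, RLENGTH - length(tag) - 2)
    }
    BEGIN { OFS = "\t" }
    /<DocumentSummary>/ { run = expt = study = org = strat = "" }
    {
      if ((v = attr($0, "Experiment", "acc")) != "") expt = v
      if ((v = attr($0, "Run", "acc")) != "") run = v
      if ((v = attr($0, "Study", "acc")) != "") study = v
      if ((v = attr($0, "Organism", "ScientificName")) != "") org = v
      if ((v = elem($0, "LIBRARY_STRATEGY")) != "") strat = v
    }
    /<\/DocumentSummary>/ { print expt, run, study, org, strat }
  '
}

# Usage (needs EntrezDirect and network access):
#   esearch -db sra -query "2021/1/1:2021/1/2[Publication Date]" | esummary | sra_summary_to_tsv
```

For anything beyond a quick look, xtract or a proper XML parser would be the safer choice, since experiments with several runs or a different line layout will not come through this sketch cleanly.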
thanks, seems like a good starting point!
Hi GenoMax, I'd like to revive this thread. For some reason this command
retrieves only 952 entries, which is obviously wrong if the command is correct. I didn't find this way of specifying dates in the esearch documentation, so I am wondering whether anyone has an idea how to fix the command.
You may want to go back to the metadata file folder and get the SRA_accessions.tab (LINK, 10G download) file. Extract the accessions for the date range you need and then look up the organisms. That may be more foolproof.
Thanks, I am already downloading the data I need from SRA. Let's see which one is faster.
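For the SRA_accessions.tab route mentioned above, a minimal sketch of the "extract accessions for a date range" step could look like this. The function name `runs_published_between` is made up for illustration. It assumes the file is tab-separated with a header row containing columns named `Accession`, `Type`, `Published` and `BioSample`, and that `Published` holds ISO timestamps such as `2021-01-01T10:00:00Z`; check the header of your copy before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: list run accessions (and their BioSample IDs) whose
# Published date falls within [from, to], both given as YYYY-MM-DD.
#   $1 = path to SRA_accessions.tab, $2 = first day, $3 = last day
runs_published_between() {
  awk -F '\t' -v from="$2" -v to="$3" '
    BEGIN { OFS = "\t" }
    NR == 1 {
      # Locate the columns by header name instead of hard-coding positions.
      for (i = 1; i <= NF; i++) col[$i] = i
      acc = col["Accession"]; typ = col["Type"]
      pub = col["Published"]; bs = col["BioSample"]
      if (!acc || !typ || !pub || !bs) {
        print "expected header columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    $typ == "RUN" {
      day = substr($pub, 1, 10)   # keep only the YYYY-MM-DD part
      # ISO dates sort correctly as plain strings.
      if (day >= from && day <= to) print $acc, $bs
    }
  ' "$1"
}

# Usage:
#   runs_published_between SRA_accessions.tab 2021-01-01 2021-01-02 > runs.tsv
```

The BioSample IDs in the second column can then be used to look up organisms, for example in batches through EntrezDirect.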