Downloading metadata of all insect SRA sequences from NCBI SRA
12 weeks ago
nitinra ▴ 50

Hello all,

I am planning to screen SRA data for specific bacteria, using 10 sequences per species (minimum 1) for every insect species that has an SRA record on NCBI. So far, to get the SRA accessions, I have been navigating the NCBI taxonomy, manually checking each sequence's metadata to find the species ID and so on, and then downloading it. Is there a way to speed this up, so that I can download the SRA metadata for all insects, or at least for one order at a time? With the metadata in hand, I could select the sequences I want and use Batch Entrez to download them. Any help would be greatly appreciated!

Thanks!

database NCBI SRA • 248 views
12 weeks ago

I believe this should work:

esearch -db sra -query "Arthropoda[Organism]" | efetch -format runinfo

This will print rows in comma-delimited format like:

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
DRR577282,2024-08-26 13:38:19,2024-08-26 13:54:36,2049652,22280260442,0,10870,13062,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-34/DRR000/577/DRR577282/DRR577282.1,DRX560715,,WGS,other,GENOMIC,SINGLE,0,0,PACBIO_SMRT,Sequel IIe,DRP011949,,,0,DRS402361,,simple,79782,Cimex lectularius,DRS402361,,,,,,,no,,,,,HIROSHIMA UNIVERSITY,DRA018917,,public,26389700DC628652166EE77F6BBC90B2,725D0A49A4AEBD8933AC764D79FDDE0B
DRR577283,2024-08-26 13:38:19,2024-08-26 14:00:19,1990260,20130137244,0,10114,11641,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-34/DRR000/577/DRR577283/DRR577283.1,DRX560716,,WGS,other,GENOMIC,SINGLE,0,0,PACBIO_SMRT,Sequel IIe,DRP011951,,,0,DRS402363,,simple,79782,Cimex lectularius,DRS402363,,,,,,,no,,,,,HIROSHIMA UNIVERSITY,DRA018918,,public,77AB85865C455CC534EE518AA439ADB0,F4C6CCCFAF1C8B7C2FAFD5D9432D0466

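You can make this faster by making the query more stringent. For example, as written it returns both RNA-Seq and genomic runs, and perhaps you only want one of them. Note also that Arthropoda is broader than the insects you asked about; "Insecta[Organism]" should restrict the search to insects. Then use your favorite tool to pull out only the columns you need!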
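As one possible way to do the "pull out what you need" step, here is a minimal Python sketch that keeps at most 10 runs per species from the runinfo output. It relies only on the `Run`, `LibraryStrategy` and `ScientificName` columns shown in the header above; the function name `pick_runs` and its parameters are made up for illustration, and skipping repeated header rows is a defensive assumption in case large results arrive as several concatenated batches.

```python
import csv
from collections import defaultdict


def pick_runs(runinfo_lines, per_species=10, strategy=None):
    """Return {ScientificName: [Run, ...]} with at most per_species runs each.

    runinfo_lines: any iterable of CSV lines (for example an open file)
    strategy: optional LibraryStrategy value to keep, e.g. "WGS"
    """
    picked = defaultdict(list)
    for row in csv.DictReader(runinfo_lines):
        run = (row.get("Run") or "").strip()
        # Skip rows without an accession and repeated header rows
        # (assumption: these may appear when batches are concatenated).
        if not run or run == "Run":
            continue
        # Optionally keep only one library strategy.
        if strategy and row.get("LibraryStrategy") != strategy:
            continue
        species = row.get("ScientificName") or "unknown"
        # Keep the first per_species runs encountered for each species.
        if len(picked[species]) < per_species:
            picked[species].append(run)
    return dict(picked)
```

If the esearch output were redirected to a file (say a hypothetical runinfo.csv), calling `pick_runs(open("runinfo.csv", newline=""))` would give a dictionary of accessions per species, which could then be written out one per line for Batch Entrez. It simply takes the first runs it sees for each species; if you care about which runs are chosen (size, platform, and so on), you would want to sort or filter on those columns first.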
