Question

SRA archive metadata

6.7 years ago

tim.ivanov.92 ▴ 40

I'm using this command to obtain metadata about an SRA archive:

esearch -db sra -query SRR1399843 | efetch -format runinfo

It returns the following information:

Run SRR1399843

ReleaseDate 2014-06-14 13:41:56

LoadDate 2014-10-04 03:42:33

spots 40704619

bases 6187102088

spots_with_mates 40704619

avgLength 152

size_MB 3177

AssemblyName GCF_000001405.25

download_path @dbgap@:reads/SRP012682/SRS637847/SRX599630/SRR1399843/SRR1399843.sra

Experiment SRX599630

LibraryName Solexa-227108

LibraryStrategy RNA-Seq

LibrarySelection cDNA

LibrarySource TRANSCRIPTOMIC

LibraryLayout PAIRED

InsertSize 150

InsertDev 311.773

Platform ILLUMINA

Model Illumina HiSeq 2000

SRAStudy SRP012682

BioProject PRJNA75899

Study_Pubmed_id

ProjectID 75899

Sample SRS637847

BioSample SAMN02791143

SampleType simple

TaxID 9606

ScientificName Homo sapiens

SampleName GTEX-13QIC-1626-SM-5K7TZ

g1k_pop_code

source

g1k_analysis_group

Subject_ID 985098

Sex female

Disease

Tumor no

Affection_Status

Analyte_Type RNA:Total RNA

Histological_Type Blood Vessel

Body_Site Artery - Tibial

CenterName BI

Submission SRA123108

dbgap_study_accession phs000424

Consent GRU

RunHash 478268EA67D40812258F63CDD4F1FE4A

ReadHash 4B32F0F08BF1C763FD72BCF414D77F76

How can I modify my request so that I can tell whether an archive has been mapped, i.e. whether it contains mapped reads or raw reads?

SRA sra-toolkit ncbi • 2.3k views

score 1 · Answer 1 · 2018-11-07

6.7 years ago

vkkodali_ncbi ★ 3.8k

Access to the run accession in your example appears to be controlled. However, you can search for SRA data with aligned reads by adding the aligned_data[Properties] filter to your query, like this:

esearch -db sra -query 'Homo sapiens[Organism] AND aligned_data[Properties]'
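To check a single run rather than a whole organism, the same filter could presumably be combined with the accession, and the hit count read from the summary that esearch prints. This is a minimal sketch under assumptions: it relies on the `<Count>` element in the ENTREZ_DIRECT summary, the helper name `count_hits` is hypothetical, and whether the filter behaves the same way for controlled-access runs has not been verified here.

```shell
# Hypothetical helper: read the ENTREZ_DIRECT summary printed by esearch
# on stdin and print the value of its first <Count> element.
count_hits() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

# Intended use (needs EDirect and network access, so it is left commented out):
#   esearch -db sra -query 'SRR7944888 AND aligned_data[Properties]' | count_hits
# A count of 1 would suggest the run carries aligned reads; 0 would suggest it does not.
```

Reading the count avoids downloading the full record when only a yes/no answer is needed.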

Thank you for your reply!

I'm actually trying to obtain metadata for files I have already downloaded (they are controlled-access, but I have a key).

Can you explain what I'm seeing in the output of your query?

Each line looks like this:

SRR7944888,2018-09-30 18:41:15,2018-09-30 18:26:12,11129735,1531891705,0,137,879,GCA_000001405.13,https://sra-download.ncbi.nlm.nih.gov/traces/sra71/SRR/007758/SRR7944888,SRX4779187,Z-138-REPLIg-E3-IonPlus,WGA,MDA,GENOMIC,SINGLE,0,0,ION_TORRENT,Ion Torrent PGM,SRP162960,PRJNA494024,,494024,SRS3859968,SAMN10147560,simple,9606,Homo sapiens,Z-138 (Mantle Cell Lymphoma) cell line,,,,,male,,no,,,,,UNIVERSITY OF VIGO,SRA787348,,public,C0C5E534A5AD060C2F8111B2208089E7,A7D604F965FB33846C8EB5810F31298E

Does this mean that the first field of every line (SRR7944888 in this example) is the ID of an archive that does indeed contain aligned reads?
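For reading such lines, it may help to pair each value with its column name. The runinfo format is CSV whose first line is a header (Run, ReleaseDate, LoadDate, spots, and so on, the same names listed in the question above). The sketch below is only an illustration: the function name `runinfo_to_pairs` is hypothetical, and it assumes no field contains an embedded comma, which holds for the line shown but is not guaranteed in general.

```shell
# Hypothetical helper: read runinfo CSV on stdin (header line first) and
# print one "name<TAB>value" line per column for every data row.
# Assumes fields never contain commas; quoted CSV fields are not handled.
runinfo_to_pairs() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # remember column names
    NF > 1  { for (i = 1; i <= NF; i++) printf "%s\t%s\n", name[i], $i }
  '
}

# Intended use (needs EDirect and network access, so it is left commented out):
#   esearch -db sra -query 'SRR7944888' | efetch -format runinfo | runinfo_to_pairs
```

With the header applied, the first column is `Run`, so the leading value on each line is a run accession.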

Reply by tim.ivanov.92 ▴ 40, 6.7 years ago

If the reads are aligned, then the efetch output XML has the term AlignInfo and some associated data. If all you want to know is whether the SRR accession you have comes with aligned reads or not, you can probably do something like this:

## this is your example; it has alignments
esearch -db sra -query 'SRR7944888' | efetch | grep -c 'AlignInfo'
1
## this example does not have aligned reads
esearch -db sra -query 'SRR299116' | efetch | grep -c 'AlignInfo'
0
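If several accessions need checking, the same grep test could be wrapped in a small function. This is a sketch only: the names `has_alignments` and `report_alignments` are hypothetical, and it assumes, as above, that the string AlignInfo appears in the efetch XML only when the run has aligned reads.

```shell
# Hypothetical helper: succeed if the SRA XML on stdin mentions AlignInfo.
has_alignments() {
  grep -q 'AlignInfo'
}

# Hypothetical wrapper: label each accession passed as an argument.
# Needs EDirect and network access, so it is only defined here, not called.
report_alignments() {
  for acc in "$@"; do
    if esearch -db sra -query "$acc" | efetch | has_alignments; then
      printf '%s\taligned\n' "$acc"
    else
      printf '%s\tunaligned\n' "$acc"
    fi
  done
}

# Example call: report_alignments SRR7944888 SRR299116
```

Note that a failed lookup (no network, or no access to a controlled run) would also be reported as unaligned, so an empty result is worth double-checking by hand.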

Reply by vkkodali_ncbi ★ 3.8k, 6.7 years ago