Access to the SRA reads for a specific genome assembly on NCBI
5.0 years ago
Chvatil ▴ 130

Hello, I have a question about the SRA (Sequence Read Archive). I have an assembly and I need to map the reads that were used to build it back against it.

Here is the assembly : https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.26

When I search for Mus musculus on NCBI and select the SRA tab, I get 27313 results. How can I access exactly the reads that were used for the genome assembly at https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.26 ?

Thank you for your help

ncbi Assembly reads • 2.0k views
5.0 years ago
GenoMax 147k

This is not an assembly that was produced from a single set of reads. The GRC page (mouse section) states:

The GRC has produced an updated assembly (GRCm38). This is an update of the last MGSC assembly (MGSCv37) which was described in 2009 (PMID: 19468303). The primary assembly is based on assembling overlapping BAC clones derived from the C57BL/6J strain and several loci have sequence available from other strains.

Statistics report for the assembly you linked gives additional details.


OK, thank you, but what about this assembly, for example: https://www.ncbi.nlm.nih.gov/assembly/GCA_001675545.1 Where are the R1 and R2 files, for instance?


If you click on "BioProject" under "Related Information" in the right column, that will bring you to the project page for this genome. From there, click on "BioSamples" in the "Assembly Details" section to reach the BioSample listing, which has the link to the SRA records. There are multiple samples.


And how can I generate a list of SRA IDs using a command-line tool?


Using Entrez Direct (EDirect):

$ esearch -db assembly -query "GCA_001675545.1" | elink -target bioproject | elink -target sra | efetch -format docsum | xtract -pattern DocumentSummary -ACC @acc -block DocumentSummary -element "&ACC" | cut -f5
SRR6713988
SRR6713956
SRR6713955
SRR6713929
SRR6706566
SRR6513317
SRR6356301
SRR6356302
SRR6356303
SRR6356304
SRR6356305
SRR6356306
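
A side note on the `cut -f5` step: it depends on the run accession sitting in the fifth tab-separated column of the xtract output. As a sketch of an alternative, the accessions can be picked out by pattern instead. `extract_runs` below is a hypothetical helper name (not part of EDirect), and the assumption is that run accessions look like SRR/ERR/DRR followed by digits.

```shell
#!/usr/bin/env bash
# Sketch: pull run accessions out of arbitrary text by pattern rather than by column.
# extract_runs is a hypothetical helper name, not an EDirect command.
extract_runs() {
  # SRR/ERR/DRR + digits covers run accessions issued by NCBI, ENA and DDBJ.
  # awk drops duplicates while keeping the order of first appearance.
  grep -oE '[SED]RR[0-9]+' | awk '!seen[$0]++'
}

# Intended use (needs EDirect and network access, so it is left commented out):
# esearch -db assembly -query "GCA_001675545.1" | elink -target bioproject |
#   elink -target sra | efetch -format docsum | extract_runs
```

This trades the column assumption for a naming assumption, so it is worth spot-checking the result against the SRA web page for the same BioProject.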

Thank you @genomax, but for the genome ID GCA_000612105.2 I get the following error: Retrying elink, step 3: callMLink: Query failed on MegaLink server

And yet there are 4 SRA files here: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=2225314

And for GCA_000612105.2 nothing happens, yet there are 6 SRA files here: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=3982884


I get the expected result. Perhaps it was a temporary glitch.

$ esearch -db assembly -query "GCA_009394715.1" | elink -target bioproject | elink -target sra | efetch -format docsum | xtract -pattern DocumentSummary -ACC @acc -block DocumentSummary -element "&ACC" | cut -f5
SRR6513331
SRR6346140
SRR6346141
SRR6346142
SRR6346143
SRR6346144
SRR6346145
SRR6346146
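
If the failure really is transient, one option is to wrap the pipeline in a small retry loop. This is only a sketch: `retry` and `assembly_to_sra` are hypothetical helper names, and it assumes that a failed query shows up either as a non-zero exit status somewhere in the pipeline or as empty output.

```shell
#!/usr/bin/env bash
# retry (hypothetical helper): run a command up to N times, pausing between attempts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      echo "attempt $i failed, retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}

# The pipeline from this thread, wrapped so that retry can call it.
# Output is buffered so a failed attempt prints nothing partial.
assembly_to_sra() {
  local out
  # pipefail is set inside the command substitution, so it stays local to it.
  out=$(set -o pipefail
    esearch -db assembly -query "$1" | elink -target bioproject | elink -target sra |
      efetch -format docsum |
      xtract -pattern DocumentSummary -ACC @acc -block DocumentSummary -element "&ACC" |
      cut -f5) || return 1
  # Assumption: an empty result also counts as a failed attempt.
  [ -n "$out" ] || return 1
  printf '%s\n' "$out"
}

# Example (needs EDirect and network access, so it is left commented out):
# retry 3 10 assembly_to_sra "GCA_000612105.2"
```

An assembly that genuinely has no linked SRA records will also be retried and then reported as a failure, so the empty-output check may need relaxing for such cases.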

Sorry, I just updated my comment with the accession that does not work.


Works for me:

$ esearch -db assembly -query " GCA_000612105" | elink -target bioproject | elink -target sra | efetch -format docsum | xtract -pattern DocumentSummary -ACC @acc -block DocumentSummary -element "&ACC" | cut -f5
SRR947092
SRR947091
SRR947090
SRR947089

@genomax is there a way to get a table from that such as:

Accession    Instrument             Total Bases (Mb)    Date Created
SRR947092    Illumina HiSeq 2000    66274               03 Aug 2013
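
One possible route to a table like this, as a sketch rather than a tested recipe: ask efetch for `-format runinfo`, which returns CSV for SRA records, and reshape it with awk. `runinfo_table` is a hypothetical helper name. The column names `Run`, `Model`, `bases` and `ReleaseDate` are assumed from the runinfo header, `ReleaseDate` is assumed to be an acceptable stand-in for "Date Created", and the naive comma split assumes no field contains a quoted comma.

```shell
#!/usr/bin/env bash
# runinfo_table (hypothetical helper): read SRA "runinfo" CSV on stdin and print
# Accession, Instrument, Total Bases (Mb), Release Date as tab-separated text.
runinfo_table() {
  awk -F',' '
    NR == 1 {
      # Locate columns by header name rather than by fixed position.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "Accession\tInstrument\tTotal Bases (Mb)\tRelease Date"
      next
    }
    # Skip blank lines and any repeated header lines.
    $(col["Run"]) != "" && $(col["Run"]) != "Run" {
      printf "%s\t%s\t%.0f\t%s\n", $(col["Run"]), $(col["Model"]), $(col["bases"]) / 1000000, $(col["ReleaseDate"])
    }
  '
}

# Intended use (needs EDirect and network access, so it is left commented out):
# esearch -db assembly -query "GCA_000612105" | elink -target bioproject |
#   elink -target sra | efetch -format runinfo | runinfo_table
```

If a field such as the library name contains commas, a real CSV parser would be safer than the plain `-F','` split used here.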