EBI-ENA FASTQ download: sample metadata cannot be found
6.8 years ago
luyang1005 ▴ 20

Hi, community,

I have downloaded the fastq.gz files from the EBI ENA website. The project accession is PRJNA286762, from this paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143334

I downloaded 199 fastq.gz files, but I cannot tell which file belongs to which sample.

I first looked through the data deposited for this paper, and I also looked at other data, but I still cannot find the sample metadata for each fastq.gz file. Here are the first reads from one of the files:

@SRR3090984.1 1/2 
GGACTACTGGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGCGCCTCAGCGTCAGTTGCTGTCCAGCAGACCGCCTTCGCCACTGGTGTTCCTCCTGATATCTACGCATTTCACCGCTACACTAGGAATTCCGTCTGCCTCTCCAGTACTCAAGAACTACAGTTTCAAATGCAGGCCACAGGTTGAGCCCGTGGTTTTCACATCTGACTTGCAGTCCCGCCTACACGCCCTTTACACCCAGTAAATCCG
+
AABAFF5DFBBAEFGGGBGGGHHHHHHHHFHGGHDEGGGGHGDFGGGGHHHGGGGGHHHEHHHHHGH3GH3?FEFGGGHHGGGGHHHHGHGHHHHHF3BFGHHHHGGGGGGHHHHHGGGGGHH22FGFHHGHHFFHF/?GGHHFHFCGGHGHHHH1<F11>GHHHHHGHFGHHHD..<<:.:AGHGGHGHG?C.9;;FEFBF0BF00;;CF0F09;B0CBGGF.B/:.@A@=BF/9BFFFE.;BBB;9B. 
@SRR3090984.2 2/2 
GGACTACTGGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGCGCCTCAGCGTCAGTTGCTGTCCAGCAGACCGCCTTCGCCACTGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGTCTGCCTCTCCAGTACTCAAGAACTACAGTTTCAAATGCAGGCCACAGGTTGAGCCCGTGGTTTTCACATCTGACTTGCAGTCCCGCCTACACGCCCTTTACACCCAGTAAATCCG
+ 
3AAAAFFFFFBBE?EFGGFGCGHHHHHGHHHGGFGGGGGGGGGGGGGGGHHHGGGFGFGHHHHHHHHHHHHHHHGGGGGHHGGGGHHHHHHGGHHHHHHHHGHHHHHGGGGGHGHHHGGGGGHHHHHHHBGHHGHHGHGHHHHHHHGGHHHHHHHHHHHHHHHHHHHHHHHHHGHHHGCEGCGCGHHGHHFGGGGGGGGGGGGFGGGGFFFGGGGGGGBBBBBDDFFFFBABBFFFFFFFDAEFFEFFFFD     
@SRR3090984.3 3/2
GGACTACAGGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAACGTCAGTTACAGTCCAGTAAGCCGCCTTCGCCACTGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCACTTACCTCTCCTGCACTCCAGTCATACAGTTTCCAAAGCAGTTCAGGGGTTGAGCCCCTGCATTTCACTCCAGACTTGCATTACCGTCTACGCTCCCTTTACACCCAGTAAATCCG

I have had this problem for a long time and cannot solve it. Can anyone help me with this? Many thanks.

RNA-Seq next-gen sequence • 2.9k views
6.8 years ago
abbey ▴ 210

I took a quick look at this and you can look up this experiment in SRA: https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=286762

For each run, the sample name is listed under "Library". The run number is listed as well, and it corresponds to the run accession number on the EBI ENA page for this study (https://www.ebi.ac.uk/ena/data/view/PRJNA286762).

Doing this manually for all 199 fastq files will probably not be fun. I hope someone else knows a better way :)
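One way to avoid the manual lookup might be the ENA Portal API, which can return a per-run report for a whole project as a tab-separated table. The following is only a minimal sketch: the field names requested here (`run_accession`, `sample_accession`, `sample_alias`, `library_name`) are my assumption about which columns are useful, so check them against the ENA documentation, and the helper names `filereport_url` and `runs_to_samples` are made up for this example.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

# Assumption: these read_run fields are the ones that identify the sample.
DEFAULT_FIELDS = ("run_accession", "sample_accession", "sample_alias", "library_name")


def filereport_url(project, fields=DEFAULT_FIELDS):
    """Build a file report URL listing every read run under a project accession."""
    query = urlencode({
        "accession": project,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def runs_to_samples(tsv_text):
    """Map each run accession to its row (a dict) from a file report TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["run_accession"]: row for row in reader}


# Usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# with urlopen(filereport_url("PRJNA286762")) as response:
#     table = runs_to_samples(response.read().decode("utf-8"))
# print(table["SRR3090984"])
```

The run accession is the same ID that appears in the read headers of your files (for example `SRR3090984`), so the resulting dictionary can be looked up directly from a file name or a read header.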


Thanks for your reply! Helpful!


Please accept answers (can be all) that you found useful (use the green check mark) to provide closure to this thread. Sounds like it is all of them in this thread.

6.8 years ago
h.mon 35k

You can find the information at NCBI. For example, see the BioSample links for all samples from this project, or the BioSample link for the particular sample from your post.

edit: you can use Send to -> File -> Format "Full text" on the page abbey and I linked ( https://www.ncbi.nlm.nih.gov/biosample?LinkName=bioproject_biosample_all&from_uid=286762 ) to save the metadata for all records. Here are the first and last records:

1: MIMS Environmental/Metagenome sample from gut metagenome
Identifiers: BioSample: SAMN04388293; Sample name: AM9; SRA: SRS1239704
Organism: gut metagenome
Attributes:
    /collection date="2014-10"
    /environment biome="intestine environment (ENVO:2100002)"
    /environment feature="mammalia-associated habitat (ENVO:00009002)"
    /environment material="feces"
    /geographic location="USA:Midwest:Columbia, MO"
    /host="Equus ferus caballus"
    /latitude and longitude="38.9514 N 92.3283 W"
    /animal_ID="1"
    /sample material processing="Power Fecal"
    /environmental package="host-associated"
Description:
Keywords: GSC:MIxS;MIMS:4.0
Accession: SAMN04388293 ID: 4388293

[...]

200: DNA isolated from fecal samples of multiple vertebrate species
Identifiers: BioSample: SAMN03769901; Sample name: Gastrointestinal microbiota (fecal sample DNA from multiple animal species)
Organism: gut metagenome
Attributes:
    /breed="fish, mouse, cat, dog, horse"
    /isolate="Fecal samples"
    /age="unknown"
    /development stage="adult"
    /sex="pooled male and female"
    /tissue="fecal sample"
    /disease="Healthy"
    /geographic location="USA"
    /health state="healthy"
    /sample type="fecal samples"
    /storage conditions="-80"
Description: DNA isolated from fecal samples of fish, mouse, cat, dog and horse.
Accession: SAMN03769901 ID: 3769901
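If you want the saved "Full text" file as a table rather than free text, something like the sketch below could work. It assumes the layout shown in the first record above (a numbered title line, an `Identifiers:` line with `;`-separated `key: value` pairs, and indented `/key="value"` attribute lines); the function name `parse_biosample_text` is made up for this example, and records that deviate from that layout may need extra handling.

```python
import re

# A record starts with a running number, e.g. "1: MIMS Environmental/...".
RECORD_START = re.compile(r"^\d+: ")
# Attribute lines look like:     /host="Equus ferus caballus"
ATTRIBUTE = re.compile(r'^\s*/(?P<key>[^=]+)="(?P<value>.*)"\s*$')


def parse_biosample_text(text):
    """Parse BioSample 'Full text' output into a list of dicts (one per record)."""
    records = []
    current = None
    for line in text.splitlines():
        if RECORD_START.match(line):
            # New record: keep the title and start an empty attribute dict.
            current = {"title": line.split(": ", 1)[1].strip(), "attributes": {}}
            records.append(current)
        elif current is None:
            continue  # skip anything before the first record
        elif line.startswith("Identifiers:"):
            # Assumption: identifiers are "key: value" pairs separated by ";".
            for part in line[len("Identifiers:"):].split(";"):
                key, sep, value = part.partition(":")
                if sep:
                    current[key.strip()] = value.strip()
        else:
            match = ATTRIBUTE.match(line)
            if match:
                current["attributes"][match["key"]] = match["value"]
    return records


# Usage, assuming the download was saved as "biosample_result.txt" (hypothetical name):
# with open("biosample_result.txt", encoding="utf-8") as handle:
#     for record in parse_biosample_text(handle.read()):
#         print(record.get("BioSample"), record.get("Sample name"), sep="\t")
```

Note that this gives you BioSample accession to sample name, not run accession to sample name; you would still need a run table that lists the BioSample for each SRR ID to join the two.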

I never noticed that button. Super helpful, thank you!


Thanks so much! Really helpful!

6.8 years ago
ATpoint 85k

Use the NCBI Run Selector. To do so, go to the NCBI front page, select SRA in the dropdown menu to the left of the query field, and enter "PRJNA286762". On the results page, click "Send results to Run Selector" to get to the next page. Use the greenish "+" button to select all 199 files, then press "RunInfo Table". This downloads a text file with the metadata you need. It includes the prep kit the sample was made with, the sequencer, total reads, SRR ID, etc. From there, you can simply filter for what you need.
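Once you have the RunInfo table, matching each fastq.gz file to its sample can be scripted. Below is a minimal sketch under a few assumptions: that the table has a `Run` column holding the SRR ID and a `Sample Name` column (check the header of your download, the column names may differ), that it is either comma- or tab-separated, and that your files are named after the run accession (e.g. `SRR3090984_1.fastq.gz`). The function names are made up for this example.

```python
import csv
import io
import os


def load_run_table(text):
    """Map each SRR run ID to its metadata row (a dict) from a RunInfo table."""
    # Assumption: the file is tab- or comma-separated; guess from the header line.
    header = text.split("\n", 1)[0]
    delimiter = "\t" if "\t" in header else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Assumption: the run accession column is called "Run".
    return {row["Run"]: row for row in reader}


def sample_for_fastq(filename, run_table, column="Sample Name"):
    """Look up the metadata column for a file named like SRR3090984_1.fastq.gz."""
    base = os.path.basename(filename)
    # Strip extensions, then any "_1"/"_2" mate suffix, to get the run accession.
    run_id = base.split(".", 1)[0].split("_", 1)[0]
    return run_table[run_id][column]


# Usage, assuming the table was saved as "SraRunTable.txt" (hypothetical name):
# with open("SraRunTable.txt", encoding="utf-8") as handle:
#     table = load_run_table(handle.read())
# print(sample_for_fastq("SRR3090984_1.fastq.gz", table))
```

A `KeyError` from `sample_for_fastq` means either the file name does not start with a run accession from the table or the column name is different in your download.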


Thanks so much! Really helpful!
