datanerd, 9.4 years ago
Hi all,
I just downloaded some SRA datasets. I got a text file with the sample number and GEO accession (starting with GSM), but the FASTQ files are labelled SRR----. I don't know how to tell which FASTQ file belongs to which sample. Moreover, when I look up the GEO accession, it lists the SRR IDs under "Runs", and there are two run IDs for one GSM. Does that mean I have to combine them? If so, what's the best way to do this?
Thanks so much!
Mamta
SRA hierarchy: SRP = project/study; SRS = sample, which can have one or more experiments (SRX); each experiment has one or more runs (SRR). I usually use the NCBI SRA Toolkit to download SRA files and convert them to FASTQ and FASTA format. Do you just want to know which sample (SRS) a run (SRR) belongs to?
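A minimal Python sketch of that hierarchy as it applies to the question: given a run-to-sample mapping (e.g. taken from the project metadata), it lists the runs under each sample and, if you decide they should be merged, concatenates their FASTQ files into one file per sample. It assumes uncompressed single-end FASTQ files named `<run>.fastq` in one directory; the function names and the accessions in any example are hypothetical. Concatenation is only appropriate when the runs are technical replicates of the same library (for instance one library sequenced on two lanes); for paired-end data, merge the `_1` and `_2` files separately, in the same run order.

```python
from collections import defaultdict
from pathlib import Path
import shutil


def group_runs_by_sample(run_to_sample):
    """Invert a {run: sample} mapping into {sample: sorted list of runs}."""
    grouped = defaultdict(list)
    for run, sample in run_to_sample.items():
        grouped[sample].append(run)
    return {sample: sorted(runs) for sample, runs in grouped.items()}


def merge_fastqs_per_sample(run_to_sample, fastq_dir, out_dir):
    """Concatenate <run>.fastq files into one <sample>.fastq per sample.

    Hypothetical layout: uncompressed single-end FASTQ named <run>.fastq,
    each ending with a newline. Returns {sample: output path}.
    """
    fastq_dir, out_dir = Path(fastq_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for sample, runs in group_runs_by_sample(run_to_sample).items():
        out_path = out_dir / f"{sample}.fastq"
        with open(out_path, "wb") as out:
            for run in runs:
                # FASTQ records are independent, so appending one file
                # after another still yields a valid FASTQ file.
                with open(fastq_dir / f"{run}.fastq", "rb") as src:
                    shutil.copyfileobj(src, out)
        outputs[sample] = out_path
    return outputs
```

Usage: `merge_fastqs_per_sample(mapping, "fastq", "merged")`, where `mapping` is a `{run: sample}` dict built from the project metadata.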
Hi,
I did use the SRA Toolkit. The problem is that the matrix and text files do not contain the SRR IDs that the FASTQ files are named with. So do I have to annotate them manually by looking up each sample in SRA? How can I tell which FASTQ files belong to which sample ID?
Thanks,
Mamta
Can you tell me how you are converting your .sra files to FASTQ/FASTA format? When I use the SRA Toolkit, it keeps the SRR number in the output name. So, if SRR1750023.sra is converted to a FASTA file, the name will be SRR1750023.fa.
If you want to know which sample a given SRR comes from, you can retrieve the metadata from the command line. This gives you each SRR ID together with its metadata, including the sample it belongs to. To do this you need the SRP (project) number.
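Because the run accession survives conversion, the file name alone tells you the run. A small Python sketch that extracts it, assuming the name starts with the accession, as in `SRR1750023.fa` or `SRR1750023_1.fastq` (the naming `fastq-dump --split-files` uses for paired-end reads). The pattern also accepts ERR and DRR, the run prefixes used by ENA and DDBJ; `run_accession` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Run accession at the start of the file name: SRR (NCBI), ERR (ENA) or DRR (DDBJ)
_RUN_ACCESSION = re.compile(r"^([SED]RR\d+)")


def run_accession(path):
    """Return the run accession that a file name starts with, or None."""
    match = _RUN_ACCESSION.match(Path(path).name)
    return match.group(1) if match else None
```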
Let's say I want the metadata for SRP = SRP001599. To do this you can run:
This will give you a CSV of the metadata associated with the project.
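Once you have that CSV, matching runs to samples is a lookup. A minimal Python sketch, assuming the file has a header row with a `Run` column and a sample column. In SRA RunInfo tables for GEO submissions the `SampleName` column often holds the GSM accession, but check the header of your own file and pass a different `sample_column` (e.g. `Sample` for the SRS accession) if needed. If NCBI's EDirect utilities are installed, `esearch -db sra -query SRP001599 | efetch -format runinfo > SRP001599.csv` is one way to get such a table; that is an alternative, not necessarily the command meant above. The function name is hypothetical.

```python
import csv
import io


def run_to_sample(csv_text, sample_column="SampleName"):
    """Map each run accession (Run column) to the value in sample_column."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        run = (row.get("Run") or "").strip()
        # Skip empty rows and any header line repeated mid-file,
        # e.g. when several downloads were concatenated.
        if not run or run == "Run":
            continue
        mapping[run] = (row.get(sample_column) or "").strip()
    return mapping
```

Usage: `mapping = run_to_sample(open("SRP001599.csv").read())`. Runs that map to the same GSM are the ones listed together under that sample on its GEO page.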