Hi,
What are people using to search GEO or SRA for raw FASTQ files from past publications for a specific cell line? I am currently searching the SRA database with keywords, using the search term:
GM12878[All Fields] AND cluster_public[prop]
However, this misses some downloadable data.
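For anyone who wants to run the same query from a script, here is a minimal sketch that only builds the NCBI E-utilities ESearch URL for the term above. The function name `build_sra_search_url` and the `retmax` default are my own choices for illustration; fetching and parsing the response is left out.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(cell_line, retmax=100):
    """Return an ESearch URL for public SRA records mentioning a cell line.

    `build_sra_search_url` and the default `retmax` are illustrative
    choices, not part of any NCBI tool.
    """
    # Same search term as typed into the SRA search box.
    term = f"{cell_line}[All Fields] AND cluster_public[prop]"
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sra_search_url("GM12878"))
```

Opening the printed URL in a browser should return the matching record IDs as JSON, so it has the same blind spots as the web search.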
There is a nice post here (How to download raw sequence data from GEO/SRA) to retrieve these files using fastq-dump, but this assumes the GEO accession number, or the SRA project page, is already known.
Is there another search engine that people are currently using to find these files more efficiently? I see there is an R package called GEOquery that helps with GEO searches. I haven't looked at it yet, but if it can limit results to certain files and cell types, it may be the best option.
Thanks much!
Are you able to use the SRA search and send the results to the SRA Run Selector? Then you can collect a list of SRA runs (accessions starting with SRR, ERR, or DRR), which you can pass to prefetch.
I have used SRAdb and SRAdbV2 in the past, but these packages are no longer maintained.
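As a rough sketch of that last step, assuming you have saved the Run Selector accession list as a plain text file with one accession per line: the snippet below keeps only run accessions and builds a prefetch command from them. The helper names `read_run_accessions` and `prefetch_command` are made up for this example, and the accessions in the usage note are placeholders.

```python
import re

# Run accessions start with SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_run_accessions(lines):
    """Keep unique run accessions from an iterable of lines, in order.

    Header lines, blank lines, and other accession types (GSM, SRX, ...)
    are skipped.
    """
    runs = []
    for line in lines:
        acc = line.strip()
        if RUN_ACCESSION.match(acc) and acc not in runs:
            runs.append(acc)
    return runs


def prefetch_command(accessions):
    """Build an argument list for the SRA Toolkit `prefetch` command."""
    return ["prefetch", *accessions]
```

With the SRA Toolkit installed, something like `subprocess.run(prefetch_command(runs), check=True)` would then download the runs; `prefetch --option-file <file>` does the same thing straight from the list file.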
Thanks! The sra-explorer is great. What about GEO? It looks like some datasets are available only on GEO and not on SRA. Entering the search ("filetype fastq"[Properties]) AND cluster_public[prop] into the GEO DataSets or GEO Profiles search boxes at NCBI doesn't filter for publicly available FASTQ files.
Can you provide an example?
Hi, for example GSE96107... unless I am confused about the conversion of this GEO accession to SRA...
That just looks like the top-level series accession for multiple samples. If you look at the 90+ individual samples, each of those has an SRA accession.
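If you would rather not click through every sample, here is a minimal sketch of pulling the per-sample SRA links out of the series record. It assumes the GEO SOFT text export, where each sample block starts with `^SAMPLE = GSM...` and carries a `!Sample_relation = SRA: ...` line. The function names are my own, and the accessions used to try the parser should be placeholders rather than real records.

```python
import re
from urllib.parse import urlencode

GEO_ACC = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_soft_url(series):
    """URL for the brief SOFT text of every sample (GSM) in a GEO series."""
    params = {"acc": series, "targ": "gsm", "form": "text", "view": "brief"}
    return f"{GEO_ACC}?{urlencode(params)}"


def sample_to_sra(soft_text):
    """Map each GSM sample in SOFT text to its SRA experiment accessions."""
    mapping = {}
    current = None
    for line in soft_text.splitlines():
        if line.startswith("^SAMPLE"):
            # New sample block, e.g. "^SAMPLE = GSM0000001".
            current = line.split("=", 1)[1].strip()
            mapping[current] = []
        elif current and line.startswith("!Sample_relation") and "SRA:" in line:
            # The relation line links to an SRX/ERX/DRX experiment.
            match = re.search(r"[SED]RX\d+", line)
            if match:
                mapping[current].append(match.group(0))
    return mapping
```

Download the text from `geo_soft_url("GSE96107")` and pass it to `sample_to_sra`. Samples that come back with an empty list are the ones with no SRA link in the record.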