Hey everyone. I'm building an RNA-Seq analysis pipeline, and I need some TCGA FASTQ files to work on. I see that the TCGA data are now hosted at GDC, but the only FASTQ files they have are controlled access, and I won't be able to get access because I'm not affiliated with an institution right now. I also saw that there are tools for converting BAM files to FASTQ, but I assume that won't cover reads that don't map to the genome, and that it won't let me see all the quality scores in the FASTQ. I need these to test various hypotheses. Does anyone know how to get hold of some actual, original FASTQ files?
Thanks very much!
As you already noted yourself, the FASTQ files are controlled access.
...and so are the BAMs in TCGA, ICGC and dbGaP when the data come from human patients. The BAMs should include unmapped reads unless they were filtered out. Hopefully they were not, since unmapped reads can be used for structural variant detection and similar analyses. Anyway, if you do not have access, there is unfortunately nothing you can do about it. The terms of use of these databases also do not permit sharing data with another user who is not part of the access application for the respective project. For building pipelines, maybe a different dataset would serve the same purpose? There are plenty of open-access datasets available in GEO and ENA.
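To illustrate the point about unmapped reads and quality scores: in the SAM/BAM format every record carries its own sequence and quality string, and an unmapped read is simply a record with flag bit 0x4 set. Here is a minimal Python sketch of what a BAM-to-FASTQ conversion does, working on SAM text lines. The function names and the toy records are made up for illustration, it treats all reads as single-end, and it assumes the file still contains the unmapped reads and has not had its qualities stripped or recalibrated.

```python
# Sketch only: rebuild FASTQ records from SAM text lines (the text form of a BAM).
# Function names and example records are hypothetical; reads are treated as single-end.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

FLAG_UNMAPPED = 0x4         # read has no alignment, but SEQ and QUAL are still stored
FLAG_REVERSE = 0x10         # SEQ is stored reverse-complemented, QUAL reversed
FLAG_SECONDARY = 0x100      # extra alignment of a read that was already emitted
FLAG_SUPPLEMENTARY = 0x800  # part of a split alignment of a read already emitted


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ record for one SAM line, or None if the line is skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None  # avoid writing the same read more than once
    if flag & FLAG_REVERSE:
        # Undo the aligner's reverse-complement to get the read as sequenced.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_to_fastq(lines):
    """Convert SAM lines to FASTQ text; also count how many unmapped reads were kept."""
    records = []
    unmapped = 0
    for line in lines:
        if line.startswith("@"):  # header line (read names cannot start with '@')
            continue
        record = sam_line_to_fastq(line)
        if record is None:
            continue
        if int(line.split("\t")[1]) & FLAG_UNMAPPED:
            unmapped += 1
        records.append(record)
    return "".join(records), unmapped


# Toy input: forward-mapped, reverse-mapped, unmapped, and a secondary alignment.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tAACG\tABCD",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGNN\t!!##",
    "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

if __name__ == "__main__":
    fastq_text, n_unmapped = sam_to_fastq(EXAMPLE_SAM)
    print(fastq_text, end="")
    print(f"unmapped reads kept: {n_unmapped}")
```

In practice you would use an existing tool such as `samtools fastq` rather than code like this. Either way, it only helps if you have access to the BAMs in the first place.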