RNA-Seq raw fastq files from TCGA

1

7.9 years ago

oriolebaltimore ▴ 190

Dear group,

I am looking for raw FASTQ files for TCGA RNA-Seq data. The BAM files were made using only reads that map to known genes. I am looking for FASTQ files that were not filtered in any way to retain only reads mapping to known genes.

I have access to Level 1 data through an approved protocol.

Thanks, Adrian.

RNA-Seq TCGA fastq • 8.5k views

ADD COMMENT • link 7.9 years ago by oriolebaltimore ▴ 190

0

Sorry, I forgot to add: is it possible to get raw FASTQ files from TCGA? Thanks.

ADD REPLY • link 7.9 years ago by oriolebaltimore ▴ 190

1

Are you sure about that? You can see the command line used to map the reads in the BAM header. Nowhere do I see anything that suggests that only reads mapping to known genes were kept. Only known genes were used when quantifying, but that's different.

FASTQs only exist in the TCGA legacy archive, which I don't think contains everything.
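To check the mapping command line mentioned above, one option is to dump the header as text (for example with `samtools view -H`) and pull out the `@PG` lines, whose `CL` field records the command line when the aligner wrote one. The following is a minimal sketch; the function name `parse_pg_lines` and the example header are hypothetical and not taken from a real TCGA file.

```python
def parse_pg_lines(header_text):
    """Return one dict per @PG header line, keyed by the two-letter SAM tags."""
    programs = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        record = {}
        # Fields are tab-separated TAG:VALUE pairs; the value may contain colons.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            record[tag] = value
        programs.append(record)
    return programs


# Hypothetical header, for illustration only (assumed, not from a TCGA BAM).
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@PG\tID:aligner\tPN:aligner\tVN:0.0\t"
    "CL:aligner --index ref reads_1.fastq reads_2.fastq\n"
)

for program in parse_pg_lines(example_header):
    print(program.get("ID"), "->", program.get("CL", "(no command line recorded)"))
```

If the `CL` field shows no filtering step, that supports the point that reads were not restricted to known genes at the mapping stage.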

ADD REPLY • link 7.9 years ago by i.sudbery 20k

1

The legacy archive does contain all FASTQ files for RNA-Seq data. They are in TARGZ format.

Link to GDC Legacy Archive

ADD REPLY • link 7.9 years ago by nwon ▴ 60

0

That's correct. You just need to convert the BAM files supplied through GDC back to raw reads.

Here's an example pipeline: https://github.com/mforde84/TCGA-BRCA-RNAseq-realignment-pipeline

Also, from experience, converting BAM to FASTQ is a bottleneck. Picard has an option, but it's really slow. The scripts linked above use a custom tool called fasty to do this; however, I couldn't locate my source code. Instead, you could use something like the following, which should be as fast: https://github.com/arq5x/bedtools2/blob/master/src/bamToFastq/bamToFastq.cpp
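For anyone wondering what the BAM-to-FASTQ step actually involves, here is a rough Python sketch of the per-read logic, working on SAM text lines (such as the output of `samtools view -h`). It is not the fasty tool or the bedtools implementation, and it will be far slower than either; the function names are made up for illustration. It also assumes no hard clipping, since hard-clipped bases are not stored in the record and cannot be recovered.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to a FASTQ record, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    # Skip secondary (0x100) and supplementary (0x800) alignments so that
    # each read is emitted once; also skip records with no stored sequence.
    if flag & 0x100 or flag & 0x800 or seq == "*":
        return None
    # Reverse-strand alignments are stored reverse-complemented; undo that
    # to recover the read as it came off the sequencer.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    # Tag mates so paired reads stay distinguishable.
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_to_fastq(lines):
    """Yield FASTQ records for all primary alignments, ignoring header lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        record = sam_line_to_fastq(line)
        if record is not None:
            yield record


# Made-up records, for illustration only.
example_sam = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t99\tchr1\t1\t60\t4M\t=\t10\t13\tACGT\tIIII",
    "r1\t147\tchr1\t10\t60\t4M\t=\t1\t-13\tAACC\tABCD",
    "r1\t256\tchr2\t5\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print("".join(sam_to_fastq(example_sam)), end="")
```

Note that for paired-end data most aligners expect mates in the same order in both files, so a real conversion usually sorts or collates by read name first.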

ADD REPLY • link 7.9 years ago by mforde84 ★ 1.4k
