RNA-Seq raw fastq files from TCGA
7.9 years ago

Dear group,

I am looking for raw FASTQ files for TCGA RNA-Seq data. The BAM files were made using reads that map only to known genes. I would like FASTQ files that have not been filtered in any way, i.e. that are not limited to reads mapping to known genes.

I have access to Level 1 data through an approved protocol.

Thanks, Adrian.

RNA-Seq TCGA fastq • 8.5k views

Sorry, I forgot to add: is it possible to get raw FASTQ files from TCGA? Thanks.


Are you sure about that? You can see the command line used to map the reads in the BAM header. Nowhere do I see anything that suggests that only reads mapping to known genes were kept. Only known genes were used when quantifying, but that's different.

FASTQ files only exist in the TCGA legacy archive, which I don't think contains everything.
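To check this yourself, the header can be dumped as text (for example with `samtools view -H`) and the `@PG` lines inspected, since those record the programs and command lines applied to the file. Below is a minimal sketch of that inspection, assuming the header is already available as a string. The function name `program_records` and the example header are made up for illustration and do not come from any TCGA file.

```python
def program_records(header_text):
    """Return one dict per @PG header line, mapping tag -> value (ID, PN, CL, ...)."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = line.split("\t")[1:]
        # Each field has the form TAG:VALUE; split on the first colon only,
        # because command lines stored in CL may contain colons themselves.
        records.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return records


# Hypothetical header, invented for this example.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@PG\tID:aligner\tPN:aligner\tCL:aligner --ref genome.fa reads_1.fq reads_2.fq\n"
)

for rec in program_records(example_header):
    print(rec.get("ID"), "->", rec.get("CL", "(no command line recorded)"))
```

If the `CL` entries show only an aligner invocation and no filtering step, that supports the point above that the reads themselves were not restricted to known genes.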


The legacy archive does contain all FASTQ files for RNA-Seq data. They are in TARGZ (tar.gz) format.

Link to GDC Legacy Archive
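Once such an archive has been downloaded, its contents can be listed before unpacking. The following is a rough sketch using Python's standard `tarfile` module. The file suffixes and member names are assumptions, since the naming inside the archives is not described here, and the demo builds a tiny in-memory archive rather than using real data.

```python
import io
import tarfile

# Assumed FASTQ suffixes; actual names inside an archive may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def fastq_members(fileobj):
    """Return the names of regular files in a tar.gz stream that look like FASTQ."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(FASTQ_SUFFIXES)
        ]


def build_demo_archive():
    """Build a small in-memory tar.gz with made-up member names for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in ("sample_1.fastq", "sample_2.fastq", "README.txt"):
            payload = b"placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


print(fastq_members(build_demo_archive()))
```

For a file on disk, the same function can be given an open binary file handle, e.g. `open(path, "rb")`.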


That's correct. You just need to convert the BAM files available through GDC back to raw reads.

Here's an example pipeline: https://github.com/mforde84/TCGA-BRCA-RNAseq-realignment-pipeline

Also, from experience, converting BAM to FASTQ is a bottleneck. Picard has an option for this, but it's really slow. The scripts linked above use a custom tool called fasty for the conversion; however, I couldn't locate my source code for it. Instead, you could use something like the following, which should be as fast: https://github.com/arq5x/bedtools2/blob/master/src/bamToFastq/bamToFastq.cpp
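To make clear what the conversion involves, here is a minimal sketch of the per-record logic, working on SAM text lines (as printed by `samtools view`). This is not the fasty code and not the bedtools implementation; the function name `sam_record_to_fastq` is made up. It also ignores mate pairing, which real tools handle by requiring name-sorted or collated input, so treat it as an illustration rather than a replacement for those tools.

```python
# Translation table for complementing bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(sam_line):
    """Convert one SAM alignment line to a FASTQ record, or None if it is skipped."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]

    # Skip secondary (0x100) and supplementary (0x800) alignments so that
    # each read is emitted once, and skip records with no stored sequence.
    if flag & 0x100 or flag & 0x800 or seq == "*":
        return None

    # Reads aligned to the reverse strand (0x10) are stored reverse-complemented,
    # so undo that to recover the read as it came off the sequencer.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Mark first (0x40) and second (0x80) reads of a pair.
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"

    return f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up record: flag 83 = paired, proper pair, reverse strand, first in pair.
example = "r1\t83\tchr1\t100\t60\t4M\t=\t50\t-54\tAACG\tABCD"
print(sam_record_to_fastq(example), end="")
```

The reverse-complement step is the part most easily forgotten: without it, reverse-strand reads come out in reference orientation rather than as originally sequenced.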
