TCGA Broad GDAC Firehose Parse and Match
6.8 years ago
hood821 • 0

Hello, I have downloaded all the cancer types from Broad's GDAC Firehose and unzipped them. There are a ton of files: mage-tab, aux, and level archives for each data type (clinical, RNA-seq, protein). I was hoping to find some already established code (R or Python) that pulls only the "level" files, reads the txt files for the clinical data and the RNA-seq data, and combines them into an R object for that cancer type. It would also need to map the sample identifiers to the clinical identifiers; there are so many TCGA IDs that it's hard to parse.
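For the file-selection step described above, a minimal Python sketch might look like the following. It rests on an assumption about the unzipped folder names that should be checked against the actual download: that the data archives contain `.Level_<n>.` in their folder name, while the companion folders contain `.mage-tab.` or `.aux.` instead. The function names `is_level_txt` and `find_level_files`, and the choice to skip `MANIFEST.txt`, are hypothetical and only for illustration.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: data folders are named like "<something>.Level_3.<date>...",
# whereas companion folders use ".mage-tab." or ".aux." in that position.
LEVEL_DIR = re.compile(r"\.Level_\d+\.")


def is_level_txt(path):
    """Return True for a .txt data file located under a 'Level' folder."""
    p = PurePosixPath(str(path).replace("\\", "/"))
    # Skip non-text files and the manifest (assumed to be bookkeeping only).
    if p.suffix != ".txt" or p.name == "MANIFEST.txt":
        return False
    # Only the parent folders are checked, not the file name itself.
    return any(LEVEL_DIR.search(part) for part in p.parts[:-1])


def find_level_files(root):
    """Walk `root` recursively and list the .txt files kept by is_level_txt."""
    return sorted(
        str(p) for p in Path(root).rglob("*.txt")
        if p.is_file() and is_level_txt(p)
    )
```

Calling `find_level_files` on the folder holding one cancer type would give the list of text files to read; nothing is parsed or normalized at this stage, so the data stays as downloaded.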

I thought this would be something that is done all the time. I can write the code, but I am slow and don't want to reinvent the wheel. For every cancer type, I want the clinical variables and the RNA-seq RSEM data in one R object per type. Oh, and I want a way to flag whether or not a sample is "normal". I think I can pull this from the clinical file.
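On the ID mapping and the "normal" flag: in the standard TCGA barcode layout, the first three dash-separated fields identify the patient, and the fourth field starts with a two-digit sample-type code (01-09 tumor, 10-19 normal, 20-29 control). A rough sketch of using that, with hypothetical helper names (`patient_id`, `sample_class`, `is_normal`) and the assumption that the clinical table is keyed by the patient-level ID:

```python
def _fields(barcode):
    """Split a TCGA barcode into upper-cased, dash-separated fields."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return parts


def patient_id(barcode):
    """First three fields, e.g. 'TCGA-02-0001' (assumed clinical key)."""
    return "-".join(_fields(barcode)[:3])


def sample_class(barcode):
    """Classify a sample-level barcode by its two-digit sample-type code."""
    parts = _fields(barcode)
    if len(parts) < 4:
        raise ValueError(f"no sample field in barcode: {barcode!r}")
    code = int(parts[3][:2])  # e.g. '01C' -> 1, '11A' -> 11
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def is_normal(barcode):
    """Toggle for filtering: True only for normal samples."""
    return sample_class(barcode) == "normal"
```

For example, `patient_id("TCGA-02-0001-01C-01D-0182-01")` gives `"TCGA-02-0001"`, which can then be matched against the clinical table after upper-casing its identifiers the same way.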

Any help or pointers would be great!

Thanks!

rna-seq R • 2.3k views
6.8 years ago
vinvan ▴ 50

There are quite a few R packages out there that do exactly this. You can check TCGAbiolinks or TCGA2STAT.


Sure, I have seen these. My concern is what happens during processing. Is there normalization? Are samples dropped, and if so, why? I want the data with as little manipulation as possible.
