Hello, I have downloaded all the cancer types from the Broad's GDAC Firehose and unzipped them. There are a ton of files: the mage-tab, aux, and Level archives for each data type (clinical, RNA-seq, protein). I was hoping to find some existing code (R or Python) that pulls only the "Level" files, reads the clinical and RNA-seq txt files, and loads them into an RObject for that cancer type. It would also map each sample identifier to its clinical data identifier; there are so many TCGA IDs that they're hard to parse.
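A minimal Python sketch of the first two steps, under stated assumptions. It assumes the unpacked Firehose directories keep their archive suffix in the name (e.g. `...Merge_Clinical.Level_1.2016012800.0.0`), so a `.Level_<n>.` match selects data archives while mage-tab and aux directories are skipped. For ID mapping it relies on the TCGA barcode layout: the first three dash-separated fields (`TCGA-<site>-<participant>`) identify the patient, which is the key the clinical files use (Firehose clinical files may write it in lowercase, so both sides are upper-cased). The function names are hypothetical.

```python
import os
import re

# ASSUMPTION: Level data archives end in ".Level_<n>.<date>.<x>.<y>";
# mage-tab archives embed "__Level_3__" in the data name but not ".Level_".
LEVEL_ARCHIVE = re.compile(r"\.Level_\d+\.")

def is_level_archive(dirname):
    """True for unpacked 'Level' data directories, False for mage-tab/aux."""
    return bool(LEVEL_ARCHIVE.search(dirname))

def find_level_dirs(root):
    """Walk the download root and return every Level data directory."""
    hits = []
    for dirpath, dirnames, _ in os.walk(root):
        for d in dirnames:
            if is_level_archive(d):
                hits.append(os.path.join(dirpath, d))
    return sorted(hits)

def patient_barcode(sample_barcode):
    """Truncate a sample/aliquot barcode to the patient barcode used by
    clinical files, e.g. TCGA-A1-A0SB-01A-11R-A144-07 -> TCGA-A1-A0SB."""
    return "-".join(sample_barcode.split("-")[:3]).upper()
```

Joining on `patient_barcode(...)` rather than the full barcode is what lets several samples from one patient (say, a tumor and its matched normal) share the same clinical row.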
I thought this would be something that is done all the time. I can write the code, but I'm slow and don't want to reinvent the wheel. I want all cancer types, each with its clinical variables and RNA-seq RSEM data, in one RObject per type. Oh, and I want a way to toggle whether or not a sample is "normal". I think I can pull this from the clinical file.
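One possible shape for the per-cohort object, as a Python sketch. The normal/tumor flag here is read from the barcode itself rather than the clinical file: the first two digits of the fourth field are the TCGA sample type code (01-09 tumor, 10-19 normal, 20-29 control). The RSEM file layout in `load_rsem_matrix` is an assumption (one header row of barcodes, one row of column labels, then one row per gene with one value per sample, as in a `*RSEM_genes_normalized__data.data.txt` file), and `clinical` is a hypothetical dict you would build yourself from the clinical txt file. Pickling stands in for an RObject.

```python
import csv
import pickle

def is_normal(barcode):
    # Sample type = first two digits of the 4th field ("11A" -> 11).
    # TCGA convention: 01-09 tumor, 10-19 normal, 20-29 control.
    return 10 <= int(barcode.split("-")[3][:2]) <= 19

def load_rsem_matrix(path):
    # ASSUMED layout: row 1 = "Hybridization REF" + one barcode per column,
    # row 2 = column labels, remaining rows = gene id + one value per sample.
    with open(path, newline="") as fh:
        rows = csv.reader(fh, delimiter="\t")
        samples = next(rows)[1:]
        next(rows)  # skip the column-label row
        expr = {row[0]: [float(v) if v not in ("", "NA") else float("nan")
                         for v in row[1:]] for row in rows}
    return samples, expr

def build_cohort(samples, expr, clinical, include_normals=True):
    # clinical: hypothetical dict of UPPER-CASE patient barcode -> variables.
    # Returns one record per sample; "clinical" is None when no row matches.
    cohort = []
    for i, s in enumerate(samples):
        normal = is_normal(s)
        if normal and not include_normals:
            continue  # the "toggle": drop normal samples when requested
        patient = "-".join(s.split("-")[:3]).upper()
        cohort.append({
            "sample": s,
            "patient": patient,
            "is_normal": normal,
            "clinical": clinical.get(patient),
            "expression": {g: vals[i] for g, vals in expr.items()},
        })
    return cohort

def save_cohort(cohort, path):
    # Python stand-in for saving an RObject; pickle is an arbitrary choice.
    with open(path, "wb") as fh:
        pickle.dump(cohort, fh)
```

Storing a per-sample expression dict is memory-hungry for ~20k genes across hundreds of samples; it keeps the sketch readable, but a matrix library would be the practical choice for real cohorts.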
Any help or pointers would be great!
Thanks!
Sure, I have seen these. My concern is what happens during processing. Is there any normalization? Are any samples dropped, and if so, why? I want the data with as little manipulation as possible.
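A small Python check that at least shows which samples are missing, assuming you already have the barcodes from an expression matrix header and the patient barcodes from a clinical file. It reports patients present on only one side and patients with more than one sample; it cannot tell you why Firehose excluded anything. `audit_samples` is a hypothetical helper name.

```python
from collections import Counter

def audit_samples(expr_samples, clinical_patients):
    # Count samples per patient (first three barcode fields, upper-cased).
    per_patient = Counter("-".join(s.split("-")[:3]).upper() for s in expr_samples)
    clinical = {p.upper() for p in clinical_patients}
    return {
        "clinical_without_expression": sorted(clinical - set(per_patient)),
        "expression_without_clinical": sorted(set(per_patient) - clinical),
        "patients_with_multiple_samples": sorted(
            p for p, n in per_patient.items() if n > 1),
    }
```

Running this on each cohort before and after any filtering step makes it easy to see exactly which barcodes a given processing stage removed.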