extract TCGA data
6.6 years ago
Learner ▴ 280

I am looking for an alternative way to extract data from TCGA. I know this one, http://www.linkedomics.org/admin.php, but there was another website that was very easy to use and I don't remember its name. Any thoughts?

genomics • 3.0k views
LinkedOmics to extract TCGA data? LinkedOmics is an analysis tool, and is quite removed from TCGA's raw data. (I'm part of the lab that developed LinkedOmics)

TCGA Assembler is another option, apart from TCGAbiolinks and recount. Just remember that all of these will only give you matrices of count data or FPKM/TPM values, which means they are quantified with RSEM. You cannot obtain raw data unless you specifically apply for GDC and ICGC approval and are granted access. Hope this helps!

Dear vchris,

Just a small comment: actually, not all of these resources/tools/projects are quantified with RSEM. That is mostly true for the "legacy" TCGA data in the old repository (level 3 or 4). By contrast, the harmonized versions in GDC:

https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#mrna-expression-workflow

which can be accessed from TCGAbiolinks, can produce raw HTSeq gene counts, etc.
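
As a rough illustration of the harmonized route, a count query with TCGAbiolinks might look like the sketch below. The `workflow.type` label is an assumption: it was "HTSeq - Counts" in GDC releases of that period and has changed in later releases, so check the labels your installed version accepts. `counts_query_args` and `run_download` are hypothetical names used only for this example.

```r
# Sketch: gene-level counts from the harmonized GDC data via TCGAbiolinks.
# counts_query_args() is a hypothetical helper that only assembles arguments.
counts_query_args <- function(project, workflow = "HTSeq - Counts") {
  list(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = workflow  # assumption: label depends on the GDC release
  )
}

args <- counts_query_args("TCGA-BRCA")

run_download <- FALSE  # set to TRUE to actually contact the GDC
if (run_download && requireNamespace("TCGAbiolinks", quietly = TRUE)) {
  query <- do.call(TCGAbiolinks::GDCquery, args)
  TCGAbiolinks::GDCdownload(query)        # writes files under ./GDCdata by default
  se <- TCGAbiolinks::GDCprepare(query)   # SummarizedExperiment with the counts
}
```

Keeping the arguments in a plain list makes it easy to inspect or change them before anything is sent to the GDC.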

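You can also query raw sequencing data, for example:

library(TCGAbiolinks)

# Query (not download) raw sequencing files for primary tumors in TCGA-BRCA;
# accepted labels can differ between TCGAbiolinks/GDC releases
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Raw Sequencing Data",
                  sample.type = "Primary solid Tumor")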

Right, my apologies, I should have been clearer. It is mostly the legacy data that are quantified with RSEM; one can also access the STAR-aligned data and retrieve HTSeq counts, as well as normalized data. But to access BAMs (which qualify as raw data, since you can regenerate the FASTQ files from them) or the raw FASTQ files, you need the higher access level. Just to clarify: if I am not wrong, the raw sequencing query in GDCquery will not let you download the raw legacy data unless you have the token file and access to the controlled data. Check this link

P.S.: I spent days and nights trying to find a way to download raw data without access control, and later got access with tokens; even with access, it is still not very straightforward. ;)
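
For what it is worth, the token step described above could be wrapped roughly as follows. This is only a sketch: `download_controlled` is a hypothetical helper, and it assumes (as I understand the TCGAbiolinks documentation) that `GDCdownload()` honours `token.file` only together with `method = "client"`.

```r
# Hypothetical helper: download controlled-access files for an existing
# GDCquery() result, failing early if the token file is missing.
download_controlled <- function(query, token_path, directory = "GDCdata") {
  if (!file.exists(token_path)) {
    stop("GDC token file not found: ", token_path)
  }
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is not installed")
  }
  # Assumption: token.file is used with the gdc-client download method
  TCGAbiolinks::GDCdownload(query,
                            token.file = token_path,
                            method     = "client",
                            directory  = directory)
}
```

It would be called as `download_controlled(query, "gdc-user-token.txt")`, where the file name is a placeholder for the token downloaded from your GDC account.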

6.6 years ago
svlachavas ▴ 790

Hi,

There are various repositories and R packages for accessing and downloading TCGA data. For example, take a look at the TCGAbiolinks R package, which offers various options covering raw as well as processed and harmonized data: http://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/query.html

There are also other projects, such as recount2:

https://jhubiostatistics.shinyapps.io/recount/

But the most important question is: what is your biological question of interest, and what would you like to look for?

Best,

Efstathios-Iason
