Does the GDC portal have colorectal (COADREAD) cancer data?
4.8 years ago
Chaimaa ▴ 260

Hello guys, I'm very familiar with the Broad Institute portal, but I now need to work with molecular profiles together with clinical drug response data. I have learned how to obtain clinical drug response data from the GDC portal (https://portal.gdc.cancer.gov/). My cancer of interest is colorectal cancer (COADREAD), but this cancer type does not appear to be available in the GDC portal (under Primary Site); I only find colon (COAD) instead. Is this right?

Another issue: if I take the clinical drug response data from the GDC portal and the molecular profiles for the same patient IDs from Broad Institute Firehose, is this biologically reasonable, or could it affect the results or the quality control?

Appreciate any help!

cancer gene expression • 2.6k views
Please, any help, hints, or a tutorial to read about this issue, if possible?

4.8 years ago

Hello Chaimaa - good to see you again. I think that COAD + READ are the TCGA cohorts that together comprise 'COADREAD', or am I incorrect?

Here is a configured search. You can then further filter by Primary Site.

Edit: Here is a configured search for RNA-seq HT-Seq count files for just rectal cancer (READ).

Kevin

Hi @Kevin, so happy to see you too, and I'm very thankful for the valuable information you provide here; it always gives me insightful biological knowledge. Here is the configured search I used to get, for example, COADREAD drug response data. However, I got 2 separate txt files: one for COAD drug response data and one for READ drug response data. So I have to combine both files for further analysis, right?

Hey Chaimaa, yes, you can just combine the 2 files, like, row-bind them (rbind()). You should be able to link these to your expression data (or whatever other data that you have) via the UUID and / or TCGA barcode.
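
For anyone not working in R, here is a minimal Python sketch of the same idea: row-bind the two tables, then match samples on the patient-level part of the TCGA barcode. The column names, barcodes, and values below are made-up placeholders (the real GDC files may carry extra header rows and different columns), so treat this as an illustration under those assumptions rather than a ready-made parser.

```python
import csv
import io

# Hypothetical stand-ins for the two downloaded tab-delimited files.
# Column names, barcodes, and values are placeholders for illustration only.
coad_txt = (
    "bcr_patient_barcode\tdrug_name\tresponse\n"
    "TCGA-AA-0001\tDrugA\tComplete Response\n"
)
read_txt = (
    "bcr_patient_barcode\tdrug_name\tresponse\n"
    "TCGA-AG-0002\tDrugB\tStable Disease\n"
)


def read_table(text, cohort):
    """Parse a tab-delimited table and tag every row with its cohort."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        row["cohort"] = cohort
    return rows


def patient_id(barcode):
    """Return the patient-level part of a TCGA barcode (first three fields)."""
    return "-".join(barcode.split("-")[:3])


# Row-bind: the analogue of rbind() for two tables with the same columns.
coadread = read_table(coad_txt, "COAD") + read_table(read_txt, "READ")

# Hypothetical sample-level barcodes, e.g. the column names of an expression matrix.
expression_samples = [
    "TCGA-AA-0001-01A-01R-0000-07",
    "TCGA-ZZ-9999-01A-01R-0000-07",
]

# Keep only the expression samples whose patient also has clinical drug data.
clinical_patients = {row["bcr_patient_barcode"] for row in coadread}
matched = [s for s in expression_samples if patient_id(s) in clinical_patients]
```

Keeping a `cohort` column after the bind makes it easy to check later whether COAD and READ behave differently before pooling them.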

So, just to be pedantic for others: the COAD (COlon ADenocarcinoma) and READ (REctum ADenocarcinoma) TCGA datasets (together, COADREAD) are ultimately labeled under the common term 'Colo-Rectal' Cancer, as the two are often studied together in real life.

Great, thanks @Kevin! Very grateful for your usual help!

Yes, I want to do the analysis from an integrative view (gene expression, DNA methylation and somatic copy number alteration (CNV)), if the sample size is reasonable.

However, I don't yet know how to deal with GDC portal TCGA data or how to use the JSON manifest; that is what I really want to learn. So maybe I will take this clinical information for the samples I find (bcr_patient_barcode) and link it with the same samples in the multi-omics data (gene expression, DNA methylation and somatic copy number alteration (CNV)) from Broad Firehose here. Is that OK?

Yes, it is not easy - I apologise on behalf of the National Cancer Institute.

With your JSON manifest file, you can use the data transfer tool to download data: https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
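
Before or after running the transfer tool, it can help to inspect the manifest itself. The sketch below assumes the manifest is the tab-delimited text file with columns `id`, `filename`, `md5`, `size` and `state`; the entry shown is a made-up placeholder, and the checksum helper is only an illustration of how a downloaded file could be compared against the manifest.

```python
import csv
import hashlib
import io

# Placeholder file content standing in for one downloaded file.
payload = b"ENSG_A\t10\n"

# Hypothetical manifest text; the column layout is an assumption, and the
# id and filename are placeholders rather than real GDC entries.
manifest_txt = (
    "id\tfilename\tmd5\tsize\tstate\n"
    f"example-uuid-1\texample.htseq.counts\t"
    f"{hashlib.md5(payload).hexdigest()}\t{len(payload)}\treleased\n"
)

# One dict per file listed in the manifest.
entries = list(csv.DictReader(io.StringIO(manifest_txt), delimiter="\t"))

# Total download size in bytes, useful for checking disk space first.
total_bytes = sum(int(entry["size"]) for entry in entries)


def md5_matches(data, entry):
    """Check downloaded bytes against the checksum recorded in the manifest."""
    return hashlib.md5(data).hexdigest() == entry["md5"]


ok = md5_matches(payload, entries[0])
```

With a real manifest you would read the file from disk instead of the in-memory string, and read each downloaded file in binary mode before hashing it.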

The CNV data is even more difficult... can you confirm the specific file that you want to obtain?

If you have any other specific questions, let me know. I am aware that it is difficult.

Hi @Kevin, thanks a lot! I would like to share my progress here. For CNV, the specific files could be the level-4 non-discretized gene-level focal alterations computed by GISTIC2.0, or the "all_data_by_genes.txt" table from GISTIC2.0. These 2 files can be easily obtained from here.

But for gene expression, DNA methylation and copy number variation from the GDC data portal, this is what I tried, and I still can't obtain the data with the tool you recommended. Sorry for my low efficiency.

Gene expression: GE_query

DNA methylation: DNA_query

CNV: CNA_query

This is what I get for gene expression when I download the manifest file and install the GDC Data Transfer Tool. I don't have the token file and also can't log in to the portal. What do I have to do next? Is there any other way to find a description of these files? When I was working with the Broad Institute portal, I could directly find the gene expression file with the samples and the genes with their expression values.

https://i.postimg.cc/9fyhGr29/png1.png

I appreciate any help or suggestions!

Hey dude, hmm... You should not require a token for this data, as it is supposed to be open access. It has not downloaded even a single file?

The gene expression files in your list are raw counts produced by HTSeq. You will have to bind these files together into a single count matrix and then use that as input to DESeq2 or edgeR.
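
As a rough illustration of the binding step, here is a minimal Python sketch that merges per-sample count files into one gene-by-sample matrix. The sample names, gene IDs and counts are made-up placeholders, and real files would be read from disk (decompressing first if they are gzipped). Rows starting with `__` are the summary counters that htseq-count appends (for example `__no_feature`), so they are dropped before building the matrix.

```python
import csv
import io

# Hypothetical per-sample count files (gene ID <tab> count); values are made up.
sample_files = {
    "sample_1": "ENSG_A\t10\nENSG_B\t0\n__no_feature\t5\n",
    "sample_2": "ENSG_A\t7\nENSG_B\t3\n__no_feature\t2\n",
}


def read_counts(text):
    """Parse one two-column count file, skipping the '__' summary rows."""
    counts = {}
    for gene, value in csv.reader(io.StringIO(text), delimiter="\t"):
        if gene.startswith("__"):
            continue
        counts[gene] = int(value)
    return counts


per_sample = {name: read_counts(text) for name, text in sample_files.items()}

# Keep only genes present in every sample so that all rows are complete.
genes = sorted(set.intersection(*(set(c) for c in per_sample.values())))
samples = sorted(per_sample)

# Rows are genes, columns are samples: the layout a count-based
# differential expression tool expects as input.
matrix = [[per_sample[s][g] for s in samples] for g in genes]
```

The counts are kept as raw integers on purpose, since count-based methods do their own normalisation.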

The methylation data that you have selected is already normalised to Beta values.

An explanation of all TCGA methodologies can be found here (link configured to land straight on methylation section): Methylation Liftover Pipeline

-------------------

Another source of data is Xena: https://xena.ucsc.edu/public/

There, the processing for each dataset is also explained. It may be easier for you to use the Xena data.

@Kevin, Thanks a lot for your great help and valuable comments.
