Question

TCGA dataset normalization

0

3.2 years ago

ruby4bioinfo • 0

Hi. I am new to machine learning. I want to normalize data that I downloaded from the UCSC Xena browser for pancreatic cancer (dataset ID: TCGA PAAD). When I run the code below, I get the error shown after it.

library( "DESeq2" )
library(ggplot2)
countData <- read.csv('PAAD.csv', header = TRUE, sep = ",")
head(countData)
options(max.print = 100000)
metaData <- read.csv('PAADPheno.csv', header = TRUE, sep = ",")
metaData
dds <- DESeqDataSetFromMatrix(countData=countData, 
                              colData=colData, 
                              design=~adenocarcinoma_invasion)

> dds <- DESeqDataSetFromMatrix(countData=countData, 
+                               colData=colData, 
+                               design=~adenocarcinoma_invasion)
Error in `rownames<-`(`*tmp*`, value = colnames(countData)) : 
  attempt to set 'rownames' on an object with no dimensions

Please help me resolve this issue, or share working code for it.

machine TCGA DESeq2 learning normalization • 1.7k views

ADD COMMENT • link 3.2 years ago by ruby4bioinfo • 0

1

You haven't defined colData, so R resolves the name to the generic function colData() (from SummarizedExperiment, which is loaded along with DESeq2). You probably meant to use metaData, like this:

dds <- DESeqDataSetFromMatrix(countData=countData, 
                              colData=metaData, 
                              design=~adenocarcinoma_invasion)
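
DESeqDataSetFromMatrix also expects the rows of colData to correspond to the columns of countData, in the same order. Below is a minimal base-R sketch of that alignment step on toy data. The phenotype column name sample_id, the hyphenated sample IDs, the YES/NO values and the extra unmatched sample are assumptions for illustration, not details taken from the actual PAADPheno.csv. The dotted names come from read.csv(), which converts hyphens in column headers to dots unless check.names = FALSE is set.

```r
# Toy stand-ins for the real files (hypothetical values and column names).
countData <- data.frame(
  Ensembl_ID = c("ENSG00000000003.13", "ENSG00000000005.5"),
  TCGA.3A.A9IO.01A = c(8.64, 3.91),
  TCGA.US.A774.01A = c(11.04, 2.00)
)
metaData <- data.frame(
  sample_id = c("TCGA-US-A774-01A", "TCGA-3A-A9IO-01A", "TCGA-XX-0000-01A"),
  adenocarcinoma_invasion = c("YES", "NO", "YES")
)

# Make the phenotype IDs match the dotted column names produced by read.csv().
metaData$sample_id <- gsub("-", ".", metaData$sample_id, fixed = TRUE)
rownames(metaData) <- metaData$sample_id

# Keep only samples present in both tables, in the order of the count columns.
sampleCols <- setdiff(colnames(countData), "Ensembl_ID")
shared <- sampleCols[sampleCols %in% rownames(metaData)]
metaData <- metaData[shared, , drop = FALSE]
countData <- countData[, c("Ensembl_ID", shared)]

# The design variable should be a factor.
metaData$adenocarcinoma_invasion <- factor(metaData$adenocarcinoma_invasion)

# Sanity check: metadata rows and count columns now line up one-to-one.
stopifnot(identical(rownames(metaData), colnames(countData)[-1]))
```

If the final stopifnot() fails on the real data, the two files use different sample identifiers, and that needs fixing before building the DESeqDataSet.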

ADD REPLY • link 3.2 years ago by bigomics.team ▴ 90

0

Hello. Please show the output of:

str(countData)
str(colData)

Thank you and kind regards

ADD REPLY • link 3.2 years ago by Kevin Blighe 88k

0

Thanks Kevin, I got the following output:

> str(countData)
'data.frame':   60488 obs. of  183 variables:
 $ Ensembl_ID      : chr  "ENSG00000000003.13" "ENSG00000000005.5" "ENSG00000000419.11" "ENSG00000000457.12" ...
 $ TCGA.3A.A9IO.01A: num  8.64 3.91 9.86 9.69 6.85 ...
 $ TCGA.US.A774.01A: num  11.04 2 10.51 10.13 7.86 ...
 $ TCGA.HZ.A49H.01A: num  10.84 3.46 10.49 9.43 7.27 ...
 $ TCGA.FB.A4P5.01A: num  9.85 3.58 9.93 8.78 7.17 ...
 $ TCGA.FB.AAPS.01A: num  10.21 5.78 9.78 8.92 7.01 ...
 $ TCGA.IB.AAUQ.01A: num  10.1 2.32 9.94 9.02 7.67 ...
 $ TCGA.HV.A5A5.01A: num  10.89 0 10.08 10.07 7.55 ...
 $ TCGA.H6.A45N.11A: num  8.54 2.58 10 9.67 7.95 ...
 $ TCGA.H6.8124.01A: num  12.17 1 11.18 10.4 8.46 ...
 $ TCGA.IB.7654.01A: num  10.75 5.09 10.73 10.06 7.8 ...
 $ TCGA.3A.A9IJ.01A: num  8.63 0 10.02 9.27 6.19 ...
 $ TCGA.HZ.A8P0.01A: num  10.13 0 8.44 9.69 7.31 ...
 $ TCGA.FB.AAQ6.01A: num  9.74 0 8.78 9.44 7.22 ...
 $ TCGA.IB.A7LX.01A: num  11.69 0 10.75 10.41 9.04 ...
 $ TCGA.2J.AABK.01A: num  11.14 1.58 10.41 10.48 8.52 ...
 $ TCGA.HZ.A9TJ.06A: num  10.69 0 10.82 10.38 8.65 ...
 $ TCGA.S4.A8RP.01A: num  10.8 1 10.06 10.28 7.92 ...
 $ TCGA.2L.AAQM.01A: num  8.23 1.58 9.98 8.48 6.52 ...
 $ TCGA.HZ.A77O.01A: num  10.63 1 10.48 9.32 8.06 ...
 $ TCGA.XD.AAUG.01A: num  10.14 5.09 9.71 8.85 6.74 ...
 $ TCGA.M8.A5N4.01A: num  10.34 1 10.14 9.32 7.68 ...
 $ TCGA.2L.AAQI.01A: num  11.22 0 10.42 9.58 8.34 ...
 $ TCGA.HZ.7919.01A: num  10.8 0 10.91 10.19 8.47 ...
 $ TCGA.HZ.8005.01A: num  10.6 0 11.13 8.84 7.94 ...
 $ TCGA.IB.AAUM.01A: num  10.57 2.81 9.78 9.65 7.34 ...
 $ TCGA.IB.7888.01A: num  10.08 0 10.13 9.72 7.37 ...
 $ TCGA.2J.AAB9.01A: num  9.98 0 9.43 8.45 6.58 ...
 $ TCGA.2L.AAQJ.01A: num  11.11 1 10.41 9.63 7.83 ...
 $ TCGA.IB.AAUU.01A: num  10.37 0 10.58 9.74 8.06 ...
 $ TCGA.Q3.A5QY.01A: num  9.83 7.7 10.24 9.32 7.76 ...
 $ TCGA.FB.AAQ1.01A: num  10.94 0 10.4 9.69 8.21 ...
 $ TCGA.HV.A5A3.01A: num  10.42 0 10.17 9.81 7.95 ...
 $ TCGA.IB.AAUO.01A: num  10.15 0 10.55 9.43 8.19 ...
 $ TCGA.IB.8127.01A: num  12.3 1.58 11.7 10.84 9 ...
 $ TCGA.FB.A545.01A: num  9.79 0 9.89 8.84 7.77 ...
 $ TCGA.YH.A8SY.01A: num  10.32 0 10.31 8.85 7.83 ...
 $ TCGA.IB.A5SQ.01A: num  11.14 3.17 10.1 9.13 8.29 ...
 $ TCGA.HV.A7OP.01A: num  9.87 0 8.65 9.02 6.44 ...
 $ TCGA.IB.7644.01A: num  11 2 10.82 10.7 8.41 ...
 $ TCGA.FB.AAQ2.01A: num  11.2 2.32 10.9 9.17 8.82 ...
 $ TCGA.US.A77J.01A: num  10.04 1 10.02 9.22 7.46 ...
 $ TCGA.F2.7273.01A: num  11.6 2 10.9 10.5 8.2 ...
 $ TCGA.3A.A9I9.01A: num  10.75 3.58 10.42 9.19 7.45 ...
 $ TCGA.US.A77G.01A: num  10.23 1.58 10.35 10.08 8.09 ...
 $ TCGA.IB.7893.01A: num  11.29 0 10.87 9.85 9.14 ...
 $ TCGA.S4.A8RM.01A: num  10.57 2 10.59 11.05 8.05 ...
 $ TCGA.IB.AAUW.01A: num  11.27 3.81 10.64 10.39 7.55 ...
 $ TCGA.FB.AAPZ.01A: num  10.49 5.73 10.24 8.91 7 ...
 $ TCGA.IB.AAUS.01A: num  10.19 1.58 10.3 9.14 7.54 ...
 $ TCGA.IB.AAUT.01A: num  10.78 1 9.8 9.34 7.29 ...
 $ TCGA.3A.A9IS.01A: num  5.32 2.32 10.43 9.81 6.88 ...
 $ TCGA.L1.A7W4.01A: num  12.35 1 11.61 9.53 9.46 ...
 $ TCGA.FB.A7DR.01A: num  10.29 1.58 10.57 9.49 7.86 ...
 $ TCGA.2J.AABO.01A: num  9.69 0 9.7 8.84 6.79 ...
 $ TCGA.3A.A9IH.01A: num  11.17 1.58 10.89 9.62 8.19 ...
 $ TCGA.PZ.A5RE.01A: num  10.4 2 10.7 9.3 7.4 ...
 $ TCGA.HZ.8003.01A: num  10.57 2 10.4 9.69 8.08 ...
 $ TCGA.3A.A9IB.01A: num  11.48 1.58 10.56 9.3 8.17 ...
 $ TCGA.H6.A45N.01A: num  10.34 2 9.5 9.19 7.43 ...
 $ TCGA.HZ.8001.01A: num  11.33 4.95 10.94 9.58 8.06 ...
 $ TCGA.IB.A7M4.01A: num  10.98 0 11.22 9.16 7.87 ...
 $ TCGA.2J.AAB8.01A: num  9.42 2.58 9.53 9.4 7.67 ...
 $ TCGA.2J.AABP.01A: num  10.86 0 11.36 8.51 8.72 ...
 $ TCGA.HZ.A77Q.01A: num  10.3 6.97 10.35 8.77 7.53 ...
 $ TCGA.HZ.7926.01A: num  11.97 2.81 10.94 11.02 8.58 ...
 $ TCGA.XN.A8T5.01A: num  10.3 1.58 10.23 9 7.16 ...
 $ TCGA.IB.A5ST.01A: num  10.77 6.67 10.56 9.89 8.42 ...
 $ TCGA.Z5.AAPL.01A: num  9.39 6.02 10.6 9.58 8.13 ...
 $ TCGA.IB.7652.01A: num  11.28 1.58 11.22 10.33 8.58 ...
 $ TCGA.IB.7886.01A: num  11.94 1 11.31 10.41 8.48 ...
 $ TCGA.FB.AAQ0.01A: num  10.04 0 10.41 9.94 7.73 ...
 $ TCGA.2L.AAQE.01A: num  10.97 1.58 10.63 9.44 8.05 ...
 $ TCGA.F2.6879.01A: num  10.8 1.58 10.92 10.43 8.5 ...
 $ TCGA.HV.A5A4.01A: num  10.43 2.32 9.98 10.32 8.01 ...
 $ TCGA.US.A779.01A: num  10.66 0 10.2 9.46 7.55 ...
 $ TCGA.FB.AAPP.01A: num  10.22 1.58 10.46 9.52 7.51 ...
 $ TCGA.HV.A5A6.01A: num  11.6 2.32 10.58 9.51 7.79 ...
 $ TCGA.HZ.8636.01A: num  11.61 3.17 10.99 11.1 8.94 ...
 $ TCGA.FB.A78T.01A: num  10.6 1 10.4 10.3 8.1 ...
 $ TCGA.IB.7646.01A: num  11.78 0 11.27 9.32 8.33 ...
 $ TCGA.RL.AAAS.01A: num  11.29 2.58 10.21 9.68 7.33 ...
 $ TCGA.US.A77E.01A: num  11.17 0 11.05 9.74 7.45 ...
 $ TCGA.IB.7887.01A: num  11.31 0 11.12 10.64 8.41 ...
 $ TCGA.IB.AAUP.01A: num  10.37 2.32 10.44 9.75 8.48 ...
 $ TCGA.IB.7885.01A: num  11.19 4.46 10.82 9.78 8.6 ...
 $ TCGA.Q3.AA2A.01A: num  10.51 4.09 9.73 9.34 7.17 ...
 $ TCGA.IB.7889.01A: num  11.54 2 10.87 10.14 8.15 ...
 $ TCGA.F2.A8YN.01A: num  11.49 0 10.33 9.3 7.48 ...
 $ TCGA.H6.8124.11A: num  11.87 2.32 10.58 10.19 8.07 ...
 $ TCGA.3E.AAAZ.01A: num  10.25 2.32 10.91 10.28 8.17 ...
 $ TCGA.HZ.8637.01A: num  12.43 5.64 11.63 11.16 9.72 ...
 $ TCGA.HZ.A4BH.01A: num  11.11 5.32 10.62 10.03 8.79 ...
 $ TCGA.IB.7890.01A: num  10.4 0 10.69 9.27 7.77 ...
 $ TCGA.XD.AAUL.01A: num  11.26 1 10.48 9.46 7.9 ...
 $ TCGA.HZ.7923.01A: num  11.12 7.83 10.59 10.58 8.38 ...
 $ TCGA.2J.AABE.01A: num  10.58 1 9.68 10.43 7.25 ...
 $ TCGA.FB.AAPU.01A: num  11.11 1.58 10.75 10.69 9.7 ...
 $ TCGA.3A.A9IC.01A: num  10.55 0 10.11 9.19 8.09 ...
  [list output truncated]
> str(colData)
Formal class 'standardGeneric' [package "methods"] with 8 slots
  ..@ .Data     :function (x, ...)  
  ..@ generic   : chr "colData"
  .. ..- attr(*, "package")= chr "SummarizedExperiment"
  ..@ package   : chr "SummarizedExperiment"
  ..@ group     : list()
  ..@ valueClass: chr(0) 
  ..@ signature : chr "x"
  ..@ default   : NULL
  ..@ skeleton  : language (function (x, ...)  stop("invalid call in method dispatch to 'colData' (no default method)", domain = NA))(x, ...)
>
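
The str(countData) output above shows two things that DESeqDataSetFromMatrix will not accept as-is: the gene IDs sit in an ordinary column (Ensembl_ID) rather than in the row names, and the values are not integers. Values such as 1, 1.58, 2 and 2.32 look like log2(count + 1), which is the unit the Xena GDC hub lists for its HTSeq counts datasets, but that is an assumption to verify on the dataset page for the file you downloaded. A sketch of one way to rebuild an integer count matrix under that assumption, with made-up toy counts:

```r
# Toy data frame shaped like the Xena download (values are hypothetical).
countData <- data.frame(
  Ensembl_ID = c("ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11"),
  TCGA.3A.A9IO.01A = log2(c(398, 14, 928) + 1),
  TCGA.US.A774.01A = log2(c(2104, 3, 1457) + 1)
)

# Move gene IDs into row names and keep only the numeric sample columns.
logMat <- as.matrix(countData[, -1])
rownames(logMat) <- countData$Ensembl_ID

# Undo log2(count + 1), assuming that transform, and round away float error.
countMat <- round(2^logMat - 1)
storage.mode(countMat) <- "integer"

# Counts must be non-negative integers.
stopifnot(all(countMat >= 0))
print(countMat)
```

A matrix like countMat (rather than the original data frame) is what would then be passed as the countData argument.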

ADD REPLY • link updated 3.2 years ago by Kevin Blighe 88k • written 3.2 years ago by ruby4bioinfo • 0

0

Why don't you simply download the FPKMs from the TCGA website?

ADD REPLY • link 3.2 years ago by pinheirofabiano ▴ 100

2

Because FPKM is a poor choice for between-sample normalization; furthermore, the GDC website's pipeline is a bit dated (Xena uses kallisto and STAR+RSEM, which are better options).

Xena also provides normalized counts for direct download: https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
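
For context, DESeq2's default normalization divides raw counts by a per-sample size factor estimated with the median-of-ratios method. Here is a base-R sketch of that idea on a made-up matrix. It is an illustration only: in a real analysis you would call estimateSizeFactors() on the DESeqDataSet rather than reimplement it, and nothing here says how the linked Xena dataset itself was produced.

```r
# Toy raw counts: 4 genes x 3 samples (hypothetical values).
counts <- matrix(
  c(100, 200, 400,
     10,  20,  40,
     50, 100, 200,
      0,   5,   3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Log geometric mean per gene; genes with any zero give -Inf and are skipped.
logGeoMeans <- rowMeans(log(counts))
usable <- is.finite(logGeoMeans)

# Size factor = median ratio of a sample's counts to the gene-wise geometric mean.
sizeFactors <- apply(counts, 2, function(cnts) {
  exp(median((log(cnts) - logGeoMeans)[usable & cnts > 0]))
})

# Divide each sample (column) by its size factor.
normalized <- sweep(counts, 2, sizeFactors, "/")
print(round(sizeFactors, 3))
```

In this toy example the first three genes differ between samples only by sequencing depth, so the size factors come out as 0.5, 1 and 2, and those genes end up with equal normalized values across samples.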

ADD REPLY • link 3.2 years ago by dsull ★ 6.9k

score 0 · Answer 1 · 2021-09-22

0

3.2 years ago

ruby4bioinfo • 0

Thank you for your valuable suggestions :)

ADD COMMENT • link 3.2 years ago by ruby4bioinfo • 0