Question

DESeq2


3.7 years ago by rheab1230

Hello everyone,

I am trying to run a DESeq2 analysis on my gene count file to normalize it.

This is where I got the gene count file: https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-1/GD660.GeneQuantCount.txt.gz

The header line of my gene count file looks like this (first samples only):

TargetID        Gene_Symbol     Chr     Coord   HG00096.1.M_111124_6 HG00097.7.M_120219_2    HG00099.1.M_120209_6    HG00099.5.M_120131_3    HG00100.2.M_111215_8    HG00101.1.M_111124_4    HG00102.3.M_120202_8    HG00103.4.M_120208_3    HG00104.1.M_111124_5    HG00105.1.M_120209_7    HG00105.3.M_120223_6    HG00106.4.M_120208_5    HG00108.7.M_120219_2    HG00109.1.M_120209_4    HG00109.3.M_120202_5    HG00110.2.M_120131_2    HG00111.1.M_120209_8    HG00111.2.M_111215_4    HG00112.6.M_120119_2    HG00114.1.M_120209_3    HG00114.6.M_120217_1    HG00115.6.M_120119_1    HG00116.2.M_120131_1    HG00117.1.M_111124_2    HG00117.1.M_120209_1    HG00117.2.M_111216_4    HG00117.3.M_120202_6    HG00117.4.M_120208_4    HG00117.5.M_120131_3    HG00117.6.M_120217_1    HG00117.7.M_120219_4    HG00118.4.M_120208_5    HG00119.1.M_120209_3    HG00119.2.M_111216_6    HG00120.3.M_120202_2    HG00121.1.M_111124_7    HG00122.6.M_120119_1    HG00123.4.M_120208_7    HG00124.3.M_120223_7

The code is:

library("DESeq2")  # load first: DESeq2 attaches S4Vectors, which provides DataFrame()
GD_dat = read.delim("GD660.GeneQuantCount.txt",header=TRUE,row.names = NULL)
GD_dat = GD_dat[,-c(1:3)]  # drops columns 1-3: TargetID, Gene_Symbol, Chr
head(GD_dat)
dim(GD_dat)
colnames(GD_dat)  = substr(colnames(GD_dat),1,7)  # keep the first 7 characters of each sample name
rownames(GD_dat) = substr(rownames(GD_dat),1,15)
geneNames<-GD_dat[,1]  # first remaining column used as gene names
rownames(GD_dat)<-geneNames
GD_dat<-GD_dat[,2:ncol(GD_dat)]
sample_info<-DataFrame(condition=names(GD_dat), row.names=names(GD_dat))
# run DESeq2
ds<-DESeqDataSetFromMatrix(countData=GD_dat, colData=sample_info, design= ~condition)
keep_genes<-rowSums(counts(ds))>0

Running it prints the first rows and the dimensions of the table, then fails with this error:

  NA20814.2.M_111215_6 NA20815.5.M_120131_5 NA20816.3.M_120202_7
1                    0                    0                    0
2                    0                    0                    0
3                    0                    0                    0
4                   10                    8                   16
5                    0                    0                    0
6                 4860                 6782                 4952
  NA20819.3.M_120202_2 NA20826.1.M_111124_1 NA20828.2.M_111216_8
1                    0                    2                0.000
2                    0                    0                0.000
3                    0                    0                0.000
4                    6                   16                8.000
5                    0                    0                0.000
6                 1864                 3446             4814.479
[1] 53934   661
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': '333174', '568198', '668559', '1976363', '2182439', '2637270', '2637585', '2795614', '3417146', '5115909', '7291199', '7307416', '7440175', '9212383', '9215731', '10490159', '10697357', '12203078', '12267546', '12794843', '15130775', '15489611', '16739015', '17046652', '18118499', '18507325', '18967449', '19015949', '19303400', '19612838', '19627036', '20408712', '20829598', '21180973', '22788423', '24682679', '25042238', '27401462', '27932953', '29952206', '30501206', '30893010', '31799523', '31895475', '32635667', '32806599', '34117481', '34252878', '34880704', '36871979', '37126773', '37823505', '37962056', '37979892', '38023636', '38080696', '38858438', '39240459', '39347289', '39817308', '40509629', '41754280', '42120283', '42640301', '43009842', '44245583', '45911744', '46854048', '47012325', '50101948', '50155854', '50747584', '50837249', '52009066', '53063128', '53704282', '53835525', '54379303', '54385522', '54427734', '56109820', '5 [... truncated]
Execution halted
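The error can be reproduced on a small hypothetical toy table with the same leading columns (all names and values below are made up). After dropping columns 1 to 3, the first remaining column is Coord, and a single repeated value in it is enough to make the row-name assignment fail:

```r
# Hypothetical toy table with the same leading columns as GD660.GeneQuantCount.txt
toy <- data.frame(
  TargetID    = c("geneA.1", "geneB.1", "geneC.1"),
  Gene_Symbol = c("A", "B", "C"),
  Chr         = c("1", "2", "3"),
  Coord       = c(1000, 1000, 2500),  # the same position can occur on different chromosomes
  sample1     = c(0, 10, 4860)
)
toy <- toy[, -c(1:3)]      # same step as GD_dat[,-c(1:3)]
names(toy)[1]              # "Coord": the first remaining column
anyDuplicated(toy[, 1])    # 2: the second value repeats an earlier one

# R first warns about the non-unique values, then stops with the error
res <- tryCatch({ rownames(toy) <- toy[, 1]; "ok" },
                error = function(e) conditionMessage(e))
res                        # "duplicate 'row.names' are not allowed"
```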

I don't know how to arrange the data so that genes are in one dimension and samples in the other, with their count values. For me, it comes out as one sample and its corresponding genes.
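For reference, DESeqDataSetFromMatrix() takes countData containing only count values, one row per gene and one column per sample, and colData with one row per sample whose row names match the count columns. A small base-R sketch of that shape, using made-up gene and sample names, so no DESeq2 call is needed to see the layout:

```r
# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
# matrix() fills column by column, so the first three values belong to sampleA.
count_mat <- matrix(
  c(0L, 10L, 4860L,   # sampleA
    2L,  6L, 1864L),  # sampleB
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),  # gene IDs as row names
                  c("sampleA", "sampleB"))       # sample IDs as column names
)

# colData: one row per sample, in the same order as the count matrix columns
sample_info <- data.frame(sample = colnames(count_mat),
                          row.names = colnames(count_mat))

stopifnot(identical(rownames(sample_info), colnames(count_mat)))
count_mat["gene3", "sampleB"]   # 1864: count of gene3 in sampleB
```

Gene identifiers live in the row names rather than in a data column, so the matrix itself holds nothing but counts.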

RNA-seq • DESeq2


Hi,

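I think you have made your row names (geneNames) from the Coord values in the data rather than from the Gene_Symbol column. GD_dat[,-c(1:3)] drops TargetID, Gene_Symbol and Chr, so GD_dat[,1] is the Coord column, and, as the warning shows, those values repeat, hence the duplicate 'row.names' error.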
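A minimal sketch of a corrected preparation step, in R like the question's code. It rests on several assumptions, each labelled in the comments: the helper prepare_counts() is hypothetical (not part of DESeq2); it defaults to TargetID as the identifier, assuming those are versioned Ensembl IDs that the original substr(..., 1, 15) line was meant to trim, although Gene_Symbol can be passed instead if it is unique; it rounds the counts because the printed table contains non-integer values such as 4814.479, while DESeq2 expects integer counts; it keeps the full sample names, since cutting them to 7 characters turns HG00099.1... and HG00099.5... into the same name; and it uses design = ~ 1 because the goal is only normalization.

```r
# Hypothetical helper (not part of DESeq2): build an integer count matrix from a
# table laid out like GD660.GeneQuantCount.txt, i.e. four annotation columns
# (TargetID, Gene_Symbol, Chr, Coord) followed by one column per sample.
prepare_counts <- function(tab, id_col = "TargetID", n_annot = 4) {
  ids <- as.character(tab[[id_col]])
  # Assumption: TargetID holds versioned Ensembl IDs, so keep the first
  # 15 characters to drop the version suffix
  if (id_col == "TargetID") ids <- substr(ids, 1, 15)
  # Row names must be unique; stop early instead of failing in rownames<-
  if (anyDuplicated(ids) > 0) stop("identifiers in '", id_col, "' are not unique")
  # Keep only the sample columns (drop = FALSE keeps a matrix for one sample)
  count_mat <- as.matrix(tab[, -seq_len(n_annot), drop = FALSE])
  # DESeq2 expects integer counts; round values such as 4814.479
  count_mat <- round(count_mat)
  storage.mode(count_mat) <- "integer"
  rownames(count_mat) <- ids
  count_mat
}

# Usage sketch: only runs if the file and DESeq2 are available
if (file.exists("GD660.GeneQuantCount.txt") &&
    requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  GD_dat <- read.delim("GD660.GeneQuantCount.txt", header = TRUE,
                       check.names = FALSE)
  count_mat <- prepare_counts(GD_dat)
  count_mat <- count_mat[rowSums(count_mat) > 0, , drop = FALSE]
  # One row per sample, keeping the full (unique) sample names
  sample_info <- data.frame(sample = colnames(count_mat),
                            row.names = colnames(count_mat))
  # design = ~ 1: no condition is modelled, only size factors are estimated
  ds <- DESeqDataSetFromMatrix(countData = count_mat, colData = sample_info,
                               design = ~ 1)
  ds <- estimateSizeFactors(ds)
  norm_counts <- counts(ds, normalized = TRUE)
}
```

Checking for duplicates inside the helper makes the failure point to the identifier column directly, instead of surfacing later as the generic row-name error.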

3.7 years ago by sgallaher03