Entering edit mode
3.1 years ago
rheab1230
▴
140
Hello everyone,
I am trying to perform deseq2 analysis on my genecount file to normalize it.
This is from where I got the gene count file: https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-1/GD660.GeneQuantCount.txt.gz
My gene count file looks like this:
TargetID Gene_Symbol Chr Coord HG00096.1.M_111124_6 HG00097.7.M_120219_2 HG00099.1.M_120209_6 HG00099.5.M_120131_3 HG00100.2.M_111215_8 HG00101.1.M_111124_4 HG00102.3.M_120202_8 HG00103.4.M_120208_3 HG00104.1.M_111124_5 HG00105.1.M_120209_7 HG00105.3.M_120223_6 HG00106.4.M_120208_5 HG00108.7.M_120219_2 HG00109.1.M_120209_4 HG00109.3.M_120202_5 HG00110.2.M_120131_2 HG00111.1.M_120209_8 HG00111.2.M_111215_4 HG00112.6.M_120119_2 HG00114.1.M_120209_3 HG00114.6.M_120217_1 HG00115.6.M_120119_1 HG00116.2.M_120131_1 HG00117.1.M_111124_2 HG00117.1.M_120209_1 HG00117.2.M_111216_4 HG00117.3.M_120202_6 HG00117.4.M_120208_4 HG00117.5.M_120131_3 HG00117.6.M_120217_1 HG00117.7.M_120219_4 HG00118.4.M_120208_5 HG00119.1.M_120209_3 HG00119.2.M_111216_6 HG00120.3.M_120202_2 HG00121.1.M_111124_7 HG00122.6.M_120119_1 HG00123.4.M_120208_7 HG00124.3.M_120223_7
The code is:
GD_dat = read.delim("GD660.GeneQuantCount.txt",header=TRUE,row.names = NULL)
GD_dat = GD_dat[,-c(1:3)]
head(GD_dat)
dim(GD_dat)
colnames(GD_dat) = substr(colnames(GD_dat),1,7)
rownames(GD_dat) = substr(rownames(GD_dat),1,15)
geneNames<-GD_dat[,1]
rownames(GD_dat)<-geneNames
GD_dat<-GD_dat[,2:ncol(GD_dat)]
sample_info<-DataFrame(condition=names(GD_dat), row.names=names(GD_dat))
library("DESeq2")
# runs the DESeq2
ds<-DESeqDataSetFromMatrix(countData=GD_dat, colData=sample_info, design= ~condition)
keep_genes<-rowSums(counts(ds))>0
I am getting this error:
NA20814.2.M_111215_6 NA20815.5.M_120131_5 NA20816.3.M_120202_7
1 0 0 0
2 0 0 0
3 0 0 0
4 10 8 16
5 0 0 0
6 4860 6782 4952
NA20819.3.M_120202_2 NA20826.1.M_111124_1 NA20828.2.M_111216_8
1 0 2 0.000
2 0 0 0.000
3 0 0 0.000
4 6 16 8.000
5 0 0 0.000
6 1864 3446 4814.479
[1] 53934 661
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': '333174', '568198', '668559', '1976363', '2182439', '2637270', '2637585', '2795614', '3417146', '5115909', '7291199', '7307416', '7440175', '9212383', '9215731', '10490159', '10697357', '12203078', '12267546', '12794843', '15130775', '15489611', '16739015', '17046652', '18118499', '18507325', '18967449', '19015949', '19303400', '19612838', '19627036', '20408712', '20829598', '21180973', '22788423', '24682679', '25042238', '27401462', '27932953', '29952206', '30501206', '30893010', '31799523', '31895475', '32635667', '32806599', '34117481', '34252878', '34880704', '36871979', '37126773', '37823505', '37962056', '37979892', '38023636', '38080696', '38858438', '39240459', '39347289', '39817308', '40509629', '41754280', '42120283', '42640301', '43009842', '44245583', '45911744', '46854048', '47012325', '50101948', '50155854', '50747584', '50837249', '52009066', '53063128', '53704282', '53835525', '54379303', '54385522', '54427734', '56109820', '5 [... truncated]
Execution halted
In my case I don't know how to arrange the gene in one column and sample in another with their count values. For me its coming as one sample and its corresponding genes.
Hi,
I think you have made your rows and geneNames dataframe from the 'coord' values in the data rather than the Gene_Symbol column.