8.1 years ago
zizigolu
Hi,
I have a list of IDs and want to extract their expression profiles from my normalized file, but I get an error:
mycounts <- read.table("NormData.txt", header = T, sep = "\t")
rownames(mycounts) <- mycounts[ , 1]
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0610007L01Rik’, ‘0610007P08Rik’, ‘0610008F07Rik’, ‘0610010F05Rik’, ‘0610010K06Rik’, ‘0610010K14Rik’, ‘0610011L14Rik’, ‘0610030E20Rik’, ‘1-Mar’, ‘1-Sep’, ‘10-Mar’, ‘11-Mar’, ‘11-Sep’, ‘1100001G20Rik’, ‘1110002E22Rik’, ‘1110003E01Rik’, ‘1110006E14Rik’, ‘1110007A13Rik’, ‘1110008L16Rik’, ‘1110017D15Rik’, ‘1110021J02Rik’, ‘1110028C15Rik’, ‘1110034B05Rik’, ‘1110034G24Rik’, ‘1110037F02Rik’, ‘1110049F12Rik’, ‘1110051M20Rik’, ‘1110057K04Rik’, ‘1110059G10Rik’, ‘1190002A17Rik’, ‘1190002N15Rik’, ‘1190003J15Rik’, ‘1190007F08Rik’, ‘12-Sep’, ‘1200014J11Rik’, ‘1300001I01Rik’, ‘1300010F03Rik’, ‘14-Sep’, ‘1500002O20Rik’, ‘1500003O03Rik’, ‘1500011B03Rik’, ‘1500011K16Rik’, ‘1500012K07Rik’, ‘1600012F09Rik’, ‘1600014C10Rik’, ‘1600029D21Rik’, ‘1700001L05Rik’, ‘1700003M02Rik’, ‘1700007K09Rik’, ‘1700008A04Rik’, ‘1700008J07Rik’, ‘1700008O03Rik’, ‘1700008P20Rik’, ‘1700009P17Rik’, ‘1700010I14Rik’, ‘1700011F14Rik’, ‘1700011H22Rik’, ‘1700012B07Rik’, ‘1700013N18Rik’, ‘170 [... truncated]
I tried:
names <- read.table("names.txt", header = T, sep = "\t")  # my row name file
names <- c(names)
df = data.frame(as.matrix(mycounts))
rownames(df) = make.names(names, unique=TRUE)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
What should I do, please?
Thank you
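A possible explanation for the second error, offered as a sketch rather than a confirmed diagnosis: read.table returns a data frame, and c() on a one-column data frame gives a list of length one, so make.names() produces a single name instead of one per row. Working from a plain character vector avoids that. The toy table below is made up for illustration (GeneA, GeneB and GeneC are hypothetical names), and make.unique() is only one way to get past the duplicate check; it does not explain why the duplicates exist.

```r
# Toy stand-in for NormData.txt: first column holds gene names, some repeated.
mycounts <- data.frame(
  GeneName = c("GeneA", "GeneB", "GeneA", "GeneC", "GeneB"),
  F2_2 = c(4.9, 6.7, 3.9, 5.1, 6.2),
  F2_3 = c(5.1, 6.6, 3.7, 5.0, 6.4),
  stringsAsFactors = FALSE
)

# A plain character vector with one entry per row (not a data frame or list)
ids <- as.character(mycounts[[1]])

# Names that occur more than once
dup_names <- unique(ids[duplicated(ids)])

# make.unique() appends .1, .2, ... to repeated names and leaves the rest alone
rownames(mycounts) <- make.unique(ids)

# Drop the name column and keep the numeric part as a matrix
expr <- as.matrix(mycounts[, -1])
```

With real data, ids would come from the first column of the table read from NormData.txt, so its length always matches the number of rows.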
Thanks, dear Noushin :) :) :)
@Angel: How did you get duplicates in the first place and did they have identical values?
Thank you, @genomax2.
I normalized the Agilent data set GSE50833 following this tutorial:
http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma
After removing columns such as Start, Sequence, ProbeUID, ControlType, ProbeName, GeneName, SystematicName and Description, the output file looked like this:
substanceBXH F2_2 F2_3 F2_14 F2_15 F2_19 F2_20 F2_23 F2_24 F2_26 F2_37 F2_42 F2_43
A_30_P01033363 4.920370044 5.128868456 5.088803534 4.327204286 5.420323311 4.832380887 4.172456375 4.599314468 4.804687463 4.758797644 5.421726358 5.159465474
A_55_P1965358 6.673461411 6.559541943 6.691603173 6.84222391 7.057431615 6.728350624 6.625561924 6.503003246 6.342636712 6.480291151 6.830651816 7.000356243
A_66_P122433 3.925835915 3.671287045 4.756575578 3.827644007 4.706803712 3.207884127 3.447130951 3.852825598 2.499938067 3.543076474 4.525543409 4.068809295
The values were not duplicated; only the row names were. When I went to extract the 2000 DEGs I got an error about duplication. I could not solve the error in R, so I removed the duplicated rows in Excel, ignoring whether they were differentially expressed or not :(
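Rather than deleting duplicated rows by hand in a spreadsheet, one common option is to collapse rows that share a gene name, for example by averaging them. The sketch below uses base R aggregate() on a made-up table (the gene names and values are hypothetical); whether averaging is appropriate depends on why the duplicates exist. limma also provides avereps() for averaging rows that share an ID, which may fit better inside a limma workflow.

```r
# Toy expression table with a repeated gene name
expr <- data.frame(
  GeneName = c("GeneA", "GeneB", "GeneA", "GeneC"),
  F2_2 = c(4, 6, 2, 5),
  F2_3 = c(5, 7, 3, 5),
  stringsAsFactors = FALSE
)

# Average every sample column within each gene name
collapsed <- aggregate(. ~ GeneName, data = expr, FUN = mean)

# Gene names are now unique, so they can safely become row names
rownames(collapsed) <- collapsed$GeneName
collapsed$GeneName <- NULL
```

After this step each gene appears once, so subsetting by a list of IDs such as collapsed[my_ids, ] (my_ids being a hypothetical character vector) no longer hits the duplicate row name error.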
That sounds odd. So you got duplicate rows (identical gene/probe names) with different values after normalization?
Yes. I only removed the unnecessary columns and kept the GeneName column, but when I checked in Excel I noticed many duplicated row names.
https://i.imgsafe.org/12487f4206.png
It is hard to see but which name are you referring to (GeneName or SystematicName)? It does not look like you have duplicate rows from that image. Should you not be using one of these names rather than the probeID?
Thank you,
I used GeneName. I also checked SystematicName, which had the most duplicates. I magnified the Excel file. The probe ID looks something like a sequence.
https://i.imgsafe.org/127caa47f3.png
That is not correct. The A_* are the Agilent probe IDs. GeneNames were in the next column. It looks like you did not parse/import the file correctly (or the header in your file is messed up). Please go back and correct.
I agree that it is good practice to look into why you have duplicate values in the column in the first place. Are they technical duplicates, such as multiple probes with the same sequence repeated across several positions on the array for QC, or something else? Or were they generated by some processing step, possibly a merge call?
Thank you, Noushin, my compatriot :)
I only normalized GSE50833 with this tutorial: http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma
and I found many duplicates in each column :)
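The question raised in the comments, namely where the duplicates come from, can be explored before deciding how to handle them. The sketch below counts how often each identifier occurs and flags rows that exactly repeat an earlier row. The column names follow the Agilent columns mentioned in this thread, but the table itself is invented for illustration.

```r
# Toy annotation + expression table (values are made up)
dat <- data.frame(
  ProbeName = c("P1", "P2", "P2", "P3", "P4"),
  GeneName  = c("GeneA", "GeneB", "GeneB", "GeneA", "GeneC"),
  F2_2 = c(4.9, 6.7, 6.7, 3.9, 5.1),
  stringsAsFactors = FALSE
)

# How many rows share each identifier?
probe_counts <- table(dat$ProbeName)
gene_counts  <- table(dat$GeneName)

# Identifiers that appear more than once
repeated_probes <- names(probe_counts[probe_counts > 1])
repeated_genes  <- names(gene_counts[gene_counts > 1])

# Rows that are exact copies of an earlier row (all columns equal)
exact_repeats <- duplicated(dat)
```

In this toy case GeneA is repeated because two different probes map to it, while P2 is repeated as an exact copy of an earlier row. On real data these two situations usually call for different handling.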