My file does not have duplicates, but it still shows "duplicate 'row.names' are not allowed"
6.4 years ago
mikysyc2016 ▴ 120

Hi all, I checked my file with which(duplicated(file)) and already removed the duplicates from it. But when I read the file into R, it still shows the following:

 x <- read.delim("merged_6_rd.txt", row.names = 1, stringsAsFactors = FALSE)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed

I do not know how to deal with it. Thanks,

rna-seq R

Assuming that you are on *nix/macOS, run the following command and let us know the output:

~ $ cut -f1 merged_6_rd.txt | uniq -d

Please add sort, as mentioned in Pierre's post, if the entries in column 1 are not sorted. If they are already sorted, you don't have to sort.
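
If you prefer to do the same check from within R, here is a minimal sketch (assuming the IDs sit in the first column of the tab-delimited file):

    # import without promoting column 1 to row names, so duplicates don't abort the read
    x <- read.delim("merged_6_rd.txt", stringsAsFactors = FALSE)
    # list the IDs that occur more than once
    unique(x[[1]][duplicated(x[[1]])])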


uniq needs sorted input:

 cut -f1 merged_6_rd.txt | sort | uniq -d

Why not show counts for the first column?
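
For instance, a sketch in R (table() tallies each ID; this assumes the same file layout as above):

    ids <- read.delim("merged_6_rd.txt", stringsAsFactors = FALSE)[[1]]
    # IDs with a count above 1 are the duplicates
    head(sort(table(ids), decreasing = TRUE))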


When I use

cut -f1 merged_6_rd.txt | sort | uniq -d

I get:

ID
NM_001001130
NM_001001144
NM_001001152
NM_001001160
NM_001001176
NM_001001177
NM_001001178
NM_001001180
NM_001001181
NM_001001182
NM_001001183
NM_00100118
..........

Those are the duplicated entries in your data. Now run grep -i -w 'NM_001001130' merged_6_rd.txt. You should get more than one row, and in the first column of the resulting rows you should see duplicate entries of NM_001001130.
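
If you want the same check without leaving R, a small sketch (assuming the file was read in without row.names, as in the earlier snippet):

    x <- read.delim("merged_6_rd.txt", stringsAsFactors = FALSE)
    # rows whose first column matches the duplicated ID
    x[x[[1]] == "NM_001001130", ]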


You are right, I get two:

NM_001001130    22  16  14  12  25  18  2218
NM_001001130

How can I remove the second one? Thanks!


Well, you need to look at the other duplicate entries and see if they follow the same pattern; then one can write a script to remove the empty entries. Otherwise, you need to come up with a way to handle such entries. Make a list of the duplicate entries in a separate file.

If it is the same pattern, see if the following works:

    awk '!a[$1]++' merged_6_rd.txt

Please validate the output for the previously identified duplicates. This is on the assumption that the empty line comes second whenever there are duplicates. If that is not so, try the following, which assumes the duplicate lines to be removed have an empty second column:

    awk '$2!=""' merged_6_rd.txt
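
If you would rather handle this at the import step instead of pre-filtering with awk, here is a minimal R sketch. It assumes the blank duplicates come in with an empty or NA second column (read.delim fills short lines by default):

    # read without row.names so the import succeeds despite the duplicates
    x <- read.delim("merged_6_rd.txt", stringsAsFactors = FALSE)
    # drop the blank duplicates: rows whose second column is empty or NA
    # (the analogue of awk '$2!=""')
    x <- x[!is.na(x[[2]]) & x[[2]] != "", ]
    # keep only the first occurrence of each ID (the analogue of awk '!a[$1]++')
    x <- x[!duplicated(x[[1]]), ]
    # the IDs are now unique, so they can safely become row names
    rownames(x) <- x[[1]]
    x <- x[, -1]

As with the awk versions, validate the result against the duplicates you identified earlier.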


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.


You've been given good pointers on how to remove rows with duplicate names, but I feel you should investigate why you have rows with duplicate names in the first place: generally, analysis pipelines output results with unique identifiers. How was this file created?
