mikysyc2016 · 6.4 years ago
Hi all, I checked my file with which(duplicated(file)) and removed the duplicates. But when I read the file into R, it still fails as shown below:
x <- read.delim("merged_6_rd.txt", row.names = 1, stringsAsFactors = FALSE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
I do not know how to deal with it. Thanks,
Assuming that you are on *nix/macOS, run the following command and let us know the output:
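Something along these lines, assuming the row names are in the first tab-separated column (cut pulls out that column and uniq -d prints values that appear more than once):

$ cut -f1 merged_6_rd.txt | uniq -d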
Please add sort, as mentioned in Pierre's post, if the entries in column 1 are not sorted; uniq needs sorted input. If they are already sorted, you don't have to sort.
Why not show counts for the first column?
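Something like this would do it, assuming tab-separated columns (uniq -c prefixes each ID with the number of times it occurs):

$ cut -f1 merged_6_rd.txt | sort | uniq -c | sort -rn | head

Any ID with a count greater than 1 is a duplicated row name.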
When I use it, I get:
Those are duplicated entries in your data. Now do:

grep -i -w 'NM_001001130' merged_6_rd.txt

You should get more than one row, and in the first column of the resulting rows you should see duplicate entries of NM_001001130.

You are right, I get two:
How can I remove the second one? Thanks!
Well, you need to look at the other duplicate entries and see if they follow the same pattern. If they do, one can write a script to remove the empty entries; otherwise, you need to come up with another way to handle them. Make a list of the duplicate entries in a separate file.
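For example, something along these lines (assuming tab-separated columns) would collect the duplicated IDs into a file and then show the full rows for each of them:

$ cut -f1 merged_6_rd.txt | sort | uniq -d > duplicate_ids.txt
$ grep -w -f duplicate_ids.txt merged_6_rd.txt

That lets you check whether the second copy of each ID is always the one with the empty columns.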
If it is the same pattern, see if the following code works:
$ awk '!a[$1]++' merged_6_rd.txt
Please validate the output for the previously identified duplicates. This is on the assumption that the empty lines come second when there are duplicates. If not, try:

$ awk '$2!=""' merged_6_rd.txt

This is on the assumption that the duplicate lines to be removed have an empty second column.
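To keep the cleaned copy, redirect the output to a new file and confirm that no duplicated IDs remain (the second command should print nothing):

$ awk '!a[$1]++' merged_6_rd.txt > merged_6_rd_dedup.txt
$ cut -f1 merged_6_rd_dedup.txt | sort | uniq -d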
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You had good pointers on how to remove rows with duplicate names, but I feel you should investigate why you have rows with duplicate names in the first place: analysis pipelines generally output results with unique identifiers. How was this file created?