2.4 years ago
Alex S
I have a file that looks like this:
C.Chr1:75500000-95000000:1029180-1029225
C.Chr1:75500000-95000000:1033800-1033847
C.Chr1:75500000-95000000:1035240-1035285
C.Chr1:75500000-95000000:1035460-1035505
C.Chr2:584000000-610000000:17911000-17911047
C.Chr2:584000000-610000000:17911000-17911047
C.Chr2:584000000-610000000:17911000-17911047
C.Chr3:30000000-130000000:21437320-21437367
C.Chr3:30000000-130000000:21437380-21437425
C.Chr3:30000000-130000000:21437700-21437747
C.Chr3:30000000-130000000:21438080-21438127
I need to count how many lines are unique, not considering the repeated lines.
I've tried uniq -c | sort -bgr, but the number of lines is much smaller than expected, and I think the problem may be in uniq.
Does anyone know another command or function that would help?
I like sort -u followed by uniq. (I had a situation recently where uniq did not work on its own; it is probably redundant here.)
sort -u does not need to be followed by uniq, as it already reduces the file to its unique subset.
sort -u will result in a non-redundant subset, not a unique one. For anything unique you will need to use uniq (with the -u option).
Yes, it's a bit of semantics, but it is crucial in certain circumstances.
I am trying to understand whether this is a distinction without a difference, or something that can be important in practice. What would be an example of multiple lines in a file where sort -u <file> | wc -l and sort <file> | uniq -u | wc -l give different output?
In man uniq it says:
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'.
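As a minimal sketch of a case where the two pipelines disagree, the sample from the question already works: the Chr2 line appears three times, so it counts once for sort -u and not at all for uniq -u. The temporary file and the variable names below are illustrative assumptions, not part of the original post.

```shell
# Write the eleven sample lines from the question to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
C.Chr1:75500000-95000000:1029180-1029225
C.Chr1:75500000-95000000:1033800-1033847
C.Chr1:75500000-95000000:1035240-1035285
C.Chr1:75500000-95000000:1035460-1035505
C.Chr2:584000000-610000000:17911000-17911047
C.Chr2:584000000-610000000:17911000-17911047
C.Chr2:584000000-610000000:17911000-17911047
C.Chr3:30000000-130000000:21437320-21437367
C.Chr3:30000000-130000000:21437380-21437425
C.Chr3:30000000-130000000:21437700-21437747
C.Chr3:30000000-130000000:21438080-21438127
EOF

# Distinct lines: one representative is kept for every line value.
# tr strips the padding that some wc implementations add.
distinct=$(sort -u "$tmp" | wc -l | tr -d ' ')

# Lines that occur exactly once: repeated lines are dropped entirely.
singletons=$(sort "$tmp" | uniq -u | wc -l | tr -d ' ')

echo "distinct: $distinct"          # distinct: 9
echo "occurring once: $singletons"  # occurring once: 8

rm -f "$tmp"
```

Which number is wanted depends on whether a repeated line should still count once or be excluded altogether.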
Well, uniq -u only prints the unique lines in the file (== those that are present exactly once); it is the opposite behaviour of uniq -d (== print only lines that are repeated in the input file).
sort -u makes the file non-redundant (== one representative of each repeated line is kept).
Of course, this all only applies when files are correctly sorted (though running uniq on unsorted files is sometimes pretty useful to get a desired result).
sort <file> | uniq will give the exact same output as sort -u <file> (and the same as sort -u <file> | uniq -u for that matter, but that's just a waste of option usage :) ).
It works!! Thanks a lot.
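For anyone landing here later, the points made in the thread can be checked with a rough sketch like the following. The three-line input is a made-up example in which the duplicate lines are deliberately not adjacent, and the variable names are assumptions for illustration only.

```shell
# Hypothetical input: the two "b" lines are separated by an "a" line.
input=$(printf 'b\na\nb\n')

# uniq alone only collapses adjacent duplicates, so all three lines survive.
uniq_only=$(printf '%s\n' "$input" | uniq | wc -l | tr -d ' ')

# Sorting first makes duplicates adjacent; both pipelines keep two lines.
sorted_uniq=$(printf '%s\n' "$input" | sort | uniq | wc -l | tr -d ' ')
sort_u=$(printf '%s\n' "$input" | sort -u | wc -l | tr -d ' ')

# uniq -d lists only repeated lines; uniq -u lists only lines seen once.
repeated=$(printf '%s\n' "$input" | sort | uniq -d)
once=$(printf '%s\n' "$input" | sort | uniq -u)

echo "uniq only: $uniq_only, sort | uniq: $sorted_uniq, sort -u: $sort_u"
# uniq only: 3, sort | uniq: 2, sort -u: 2
echo "repeated: $repeated, once: $once"
# repeated: b, once: a
```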