Hi
How can I obtain unique reads based on two different columns? Thanks
input:
A, 1
A, 2
A, 2
B, 1
B, 2
B, 1
C, 1
C, 3
C, 3
output:
A, 1
A, 2
B, 2
B, 1
C, 1
C, 3
Use sort and uniq commands:
sort *myfile* | uniq > output
Here is a way that gets around some issues with other approaches:
$ awk '!a[$0]++' input.txt > output.txt
Here's what output would look like, from your example:
$ cat output.txt
A, 1
A, 2
B, 1
B, 2
C, 1
C, 3
If your input looks like something else, then this approach would need modifications.
In that case, use the following modification:
$ awk -v FS=',' '!a[$2,$4]++' input.txt > output.txt
This will report the first line seen for the combination of the 2nd and 4th columns. Second and subsequent "hits" are not reported.
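As a rough illustration of the first-hit behaviour, here is a small sketch on made-up data. The file names demo_input.txt and demo_output.txt and the four-column layout are assumptions for the example, not taken from the real input. It writes the key as $2,$4, which makes awk join the two fields with its SUBSEP character, so that pairs such as ("ab", "c") and ("a", "bc") stay distinct:

```shell
# Hypothetical four-column, comma-separated input.
# The key is the combination of column 2 and column 4.
printf '%s\n' \
  'r1,A,x,1' \
  'r2,A,y,2' \
  'r3,A,z,2' \
  'r4,B,x,1' > demo_input.txt

# Print a line only the first time its (column 2, column 4) pair is seen.
awk -F',' '!seen[$2,$4]++' demo_input.txt > demo_output.txt

cat demo_output.txt
```

Lines r2 and r3 share the key (A, 2), so only r2, the first one seen, is kept.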
If you want to instead use sort, you will need to use some additional options:
$ sort -u -k2,2 -k4,4 -t, input.txt > output.txt
Without reading the man pages, I'm unsure if sort is stable, so you might get a different answer on repeated runs.
In addition to flexibility on the keys used for filtering, the awk approach runs much faster on very large input (at the expense of memory), so if you're working with whole-genome scale input, then you may want to use awk instead of sort | uniq or sort -u based approaches.
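One more practical difference between the two approaches is output order: awk keeps lines in the order they first appear, while sort -u returns them sorted. A small sketch on made-up data (the file names order_demo.txt, order_awk.txt and order_sort.txt are hypothetical):

```shell
# Hypothetical input in which the "B" line comes before the "A" line.
printf '%s\n' 'B, 2' 'A, 1' 'B, 2' 'A, 1' > order_demo.txt

# awk: unique lines, in order of first appearance.
awk '!a[$0]++' order_demo.txt > order_awk.txt

# sort -u: unique lines, in sorted order (LC_ALL=C pins the collation).
LC_ALL=C sort -u order_demo.txt > order_sort.txt

echo "awk:";  cat order_awk.txt
echo "sort:"; cat order_sort.txt
```

If the original line order matters downstream, the awk form avoids a separate re-ordering step.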
Hi Alex, could you please help me with this post? compare two text file
Based on the example in one of your comments, you can do it with this:
awk -F',' '!seen[$2,$4]++' your_file.txt
You might have problems later on when there are multiple lines with the same 2nd and 4th columns but different values in some other columns. However, as you didn't mention it, the awk command above will work fine.
Thanks for your code, but I want to obtain unique reads according to two different columns in my input file. Please check my example.
The code above generates results you ask for. If that example data does not represent real data then you need to provide an appropriate example.
If the columns are not at the beginning of the table, you can extract the columns using cut:
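A minimal sketch of that idea, assuming comma-separated input with the key in columns 2 and 4. The file names cut_demo.txt and cut_demo_unique.txt and the data are hypothetical:

```shell
# Hypothetical four-column, comma-separated input.
printf '%s\n' \
  'r1,A,x,1' \
  'r2,A,y,2' \
  'r3,A,z,2' \
  'r4,B,x,1' > cut_demo.txt

# Extract columns 2 and 4, then keep each distinct pair once.
cut -d',' -f2,4 cut_demo.txt | LC_ALL=C sort -u > cut_demo_unique.txt

cat cut_demo_unique.txt
```

Note that this keeps only the two key columns. If the rest of each line is needed, the awk approach with a two-field key retains the full line.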