I have a file like this:
chr snpid position alleleA alleleB score
chr1 id1 1720593 A T 2
chr4 id2 1795915 T A 2
chr9 id3 2405331 C G 0
chr10 id4 2791133 G C 1
chr11 id5 2849732 A C 2
chr13 id6 3220462 C A 4
chr19 id8 3278835 T A 3
chr19 id8 3278835 T A 4
I would like to first find the duplicates rows based on column2. I managed (I think) to do this using:
awk '{if (x[$2]) { y[$2]++; print $0; if (y[$2] == 1) { print x[$2] } } x[$2] = $0}' filename.txt > duplicates.txt
This gives me:
chr19 id8 3278835 T A 3
chr19 id8 3278835 T A 4
However, when I get the duplicates (and there will be multiple pairs in my actual file), for each pair, I want to find the row with the lowest value in column 6. This means my final output for this example would be a file with the row:
chr14 id8 3278835 T A 3
Is there any way I can easily combine both steps? Should I be using a loop? Any help would be deeply appreciated.