I have a file:
gene_name chr start end
FAM138A chr1 34553 36081
FAM138A chr1 35244 36073
OR4F5 chr1 69090 70008
RP11-34P13.7 chr1 89294 120932
RP11-34P13.8 chr1 89550 91105
RP11-34P13.7 chr1 92229 129217
I want to pick out the first occurrence of each gene, as it would give me the longest transcript. Any help with this would be appreciated.
Thank you.
What have you tried? It's good practice to show the effort you took to solve this issue, rather than just asking us to solve it completely.
E.g., if you show a bit of Python code I could fix it for you, or show your awk code and you'll automatically summon Pierre Lindenbaum.
I have been trying grep. I don't understand awk very well, so I am keeping that as an option. I have no understanding of Python. I tried
grep --max-count=1 "FAM138A" filename
and got the desired result, but I want to know how to automate this for each gene. Thanks again.
Is this thread helpful? https://unix.stackexchange.com/questions/160009/remove-entire-row-in-a-file-if-first-column-is-repeated I found it by Googling for
only keep unique rows based on column unix
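For what it's worth, one common way to do what that search describes is a short awk filter on the first column. This is only a minimal sketch: it assumes the file is whitespace-delimited with the gene name in column 1, and the file names genes.txt and first_per_gene.txt are placeholders I made up for illustration.

```shell
# Sample input copied from the question; "genes.txt" is a placeholder name.
cat > genes.txt <<'EOF'
gene_name chr start end
FAM138A chr1 34553 36081
FAM138A chr1 35244 36073
OR4F5 chr1 69090 70008
RP11-34P13.7 chr1 89294 120932
RP11-34P13.8 chr1 89550 91105
RP11-34P13.7 chr1 92229 129217
EOF

# seen[$1]++ evaluates to 0 the first time a gene name appears in column 1,
# so !seen[$1]++ is true only for the first row of each gene.
# awk prints every row for which the expression is true.
# The header row is kept because "gene_name" also appears only once.
awk '!seen[$1]++' genes.txt > first_per_gene.txt

cat first_per_gene.txt
```

This keeps the first row per gene in file order, nothing more. In the sample above the second RP11-34P13.7 row (92229 to 129217) actually spans more bases than the first (89294 to 120932), so if the longest entry is what you need, the file would have to be ordered by length before filtering.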
That worked perfectly. Thank you very much