12 months ago
pablo
Hi,
I have this test.txt file:
gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106
I need to get this output:
gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106
In other words, I need to remove the "gene" lines and replace the "exon" label on each remaining line with "gene-X", where X is the number of the gene the exon belongs to (starting at 1).
I am struggling with this.
awk '$1~/exon/ {print $0 (/^exon/ ? "-" (++c) : "")}' test.txt
exon 1:362275166-362275246-1
exon 1:362274811-362275058-2
exon 1:362274230-362274685-3
exon 1:362279796-362280179-4
exon 1:362280576-362280662-5
exon 1:362280858-362280958-6
exon 1:362281056-362281106-7
awk '$1~/exon/ {$1=$1 "-" (++count[$1])}1' test.txt
gene 1:362273700-362275735
exon-1 1:362275166-362275246
exon-2 1:362274811-362275058
exon-3 1:362274230-362274685
gene 1:362279796-362287281
exon-4 1:362279796-362280179
exon-5 1:362280576-362280662
exon-6 1:362280858-362280958
exon-7 1:362281056-362281106
Can you try it?
Please review all your previous questions, and validate or comment on the answers.
Any chance you could use a language that is easier to write and comprehend, like Python?
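For what it's worth, here is a minimal Python sketch of the relabelling described in the question. The function name `relabel_exons` and the variable `sample` are made up for illustration. It assumes every line has a label ("gene" or "exon") followed by a whitespace-separated coordinate, and that each exon belongs to the most recent "gene" line above it. The key difference from the awk attempts above is that the counter goes up on "gene" lines, not on "exon" lines.

```python
def relabel_exons(lines):
    """Drop 'gene' lines and label each exon with the index of its gene."""
    gene_index = 0
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "gene":
            gene_index += 1  # a new gene block starts; the line itself is dropped
        elif fields[0] == "exon":
            # Assumption: the coordinate is the second whitespace-separated field.
            out.append(f"gene-{gene_index} {fields[1]}")
    return out


# Sample data copied from the question.
sample = """\
gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106
"""

for row in relabel_exons(sample.splitlines()):
    print(row)
```

To run it on the real file, pass an open file handle instead of the sample, e.g. `with open("test.txt") as handle: rows = relabel_exons(handle)`. Note that an exon appearing before any "gene" line would be labelled "gene-0" in this sketch.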