Question

awk gff column change

0

5.5 years ago

rob234king ▴ 610

I had to split my FASTA genome into chunks to speed up annotation with MAKER, because we did not have MPI set up and the chromosomes were very large. To correct the start and end positions, I now need to add the last value in the string in the first column (the chunk offset, e.g. 8000000) to columns 4 and 5. I need to do this for multiple files, so ideally the value would be parsed from the first column rather than entered manually, although even a manual version would be helpful. I think this should be quite easy in awk, but I am still not familiar with it. Any ideas?

Example line:
Chr13_pilon_pilon_8000000       maker   gene    1257    1527    .       +       .       ID=maker-Chr13_pilon_pilon_8000000-exonerate_est2genome-gene-0.0;Name=evgtrinLocGG_7725c278g1t1-gene

Example output:
Chr13_pilon_pilon_8000000       maker   gene    8001257    8001527    .       +       .       ID=maker-Chr13_pilon_pilon_8000000-exonerate_est2genome-gene-0.0;Name=evgtrinLocGG_7725c278g1t1-gene

gff awk • 1.9k views

ADD COMMENT • link updated 5.5 years ago by Jianyu ▴ 580 • written 5.5 years ago by rob234king ▴ 610

score 1 · Answer 1 · 2019-11-13

1

5.5 years ago

Jorge Amigo 14k

If the value you need to add to columns 4 and 5 always follows the last "_" character in the first column, then this Perl one-liner should work:

perl -lane 'if (/^\S+_(\d+)/) { $F[3] += $1; $F[4] += $1; print join "\t", @F}' input.txt


score 1 · Answer 2 · 2019-11-13

1

5.5 years ago

Jianyu ▴ 580

awk 'BEGIN{FS=OFS="\t"} {n=split($1,a,"_"); print $1,$2,$3,a[n]+$4,a[n]+$5,$6,$7,$8,$9}' input.txt


2

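Without hard-coding the number of columns (note that assigning to a field makes awk rebuild the line with the default output separator, so the output is space-separated):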

awk '{n=split($1,a,"_"); $4+=a[n]; $5+=a[n]; print}' input.txt

The same, but treating both input and output as tab-delimited. Setting the separators in a BEGIN block ensures they already apply to the first line:

awk 'BEGIN{FS=OFS="\t"} {n=split($1,a,"_"); $4+=a[n]; $5+=a[n]; print}' input.txt

ADD REPLY • link 5.5 years ago by Jorge Amigo 14k
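
Since the question mentions multiple files, the tab-delimited awk approach above could be wrapped in a small shell function and looped over the chunk files. This is only a sketch under assumptions that are not in the thread: the function name `shift_gff`, the `chunks/*.gff` naming, and the `.shifted.gff` output suffix are all hypothetical, and lines starting with "#" are assumed to be GFF comments or headers that should pass through unchanged.

```shell
# Hypothetical helper: add the numeric suffix of column 1 (the chunk offset)
# to columns 4 and 5 of a tab-delimited GFF. Reads the given files, or stdin.
shift_gff() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }        # comment and header lines are left as they are
         {
             n = split($1, a, "_")   # a[n] is the text after the last "_"
             # only shift feature lines whose suffix is purely numeric
             if (NF >= 5 && a[n] ~ /^[0-9]+$/) {
                 $4 += a[n]
                 $5 += a[n]
             }
             print
         }' "$@"
}

# Possible use over several chunk files (paths are assumptions):
# for f in chunks/*.gff; do
#     shift_gff "$f" > "${f%.gff}.shifted.gff"
# done
```

Lines whose first column does not end in a numeric suffix are printed unmodified, so it is worth checking the output for any sequence names that do not follow the `name_offset` pattern.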