awk gff column change
5.0 years ago
rob234king ▴ 610

I had to split my FASTA genome into chunks to speed up annotation with MAKER, because we did not have MPI set up and the chromosomes were very large. To correct the start and end positions, I need to add the last value in the string in the first column to columns 4 and 5. I need to do this for multiple files, so ideally the value would be parsed from the first column rather than entered manually, although even a manual version would be helpful. I think this should be fairly easy in awk, but I am still not familiar with it. Any ideas?

Example line:
Chr13_pilon_pilon_8000000       maker   gene    1257    1527    .       +       .       ID=maker-Chr13_pilon_pilon_8000000-exonerate_est2genome-gene-0.0;Name=evgtrinLocGG_7725c278g1t1-gene

Example output:
Chr13_pilon_pilon_8000000       maker   gene    8001257    8001527    .       +       .       ID=maker-Chr13_pilon_pilon_8000000-exonerate_est2genome-gene-0.0;Name=evgtrinLocGG_7725c278g1t1-gene
gff awk • 1.8k views
5.0 years ago

If the value you need to add to columns 4 and 5 always follows a "_" character at the end of the first column, then this Perl code should work:

perl -lane 'if (/^\S+_(\d+)/) { $F[3] += $1; $F[4] += $1; print join "\t", @F}' input.txt
5.0 years ago
Jianyu ▴ 580
awk '{n=split($1,a,"_"); print $1,$2,$3,a[n]+$4, a[n]+$5,$6,$7,$8,$9}' input.txt

Without hard-coding the total number of columns:

awk '{n=split($1,a,"_"); $4+=a[n]; $5+=a[n]; print}' input.txt

Without hard-coding the total number of columns, and treating both input and output as tab-delimited (the separators are set in a BEGIN block so that they also apply to the first line):

awk 'BEGIN{FS=OFS="\t"} {n=split($1,a,"_"); $4+=a[n]; $5+=a[n]; print}' input.txt
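Since the question mentions several chunk files that may also contain comment lines, here is a minimal sketch that wraps the same split() idea in a shell function. The function name shift_gff is hypothetical, and the sketch assumes tab-delimited GFF in which the offset is the text after the last "_" in column 1. Lines starting with "#" or with fewer than five fields are passed through unchanged.

```shell
# Hypothetical helper (not from the thread): add the numeric suffix of
# column 1 to the start (column 4) and end (column 5) coordinates.
shift_gff() {
  awk 'BEGIN { FS = OFS = "\t" }           # tab-delimited input and output
       /^#/ || NF < 5 { print; next }      # pass comments and short lines through
       {
         n = split($1, a, "_")             # a[n] is the text after the last "_"
         if (a[n] ~ /^[0-9]+$/) {          # shift only when that suffix is an integer
           $4 += a[n]
           $5 += a[n]
         }
         print
       }' "$@"
}
```

It reads from the files given as arguments, or from standard input if there are none. Assuming the chunk files end in .gff, a loop such as `for f in *.gff; do shift_gff "$f" > "${f%.gff}.shifted.gff"; done` would write one corrected file per chunk.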