Using awk or sed to edit a column in a gtf file
5.5 years ago
imda ▴ 10

Dear all,

I have a GTF file and I want to remove a certain word from one column.

In this case, I want to remove "mRNA." from the second column and keep just CA01g00010. Could you help me with this?

Pepper1.55ch01  mRNA.CA01g00010 63209   63880

I would like this output:

Pepper1.55ch01   CA01g00010 63209   63880

Best

In the case provided, it can also be done with GNU sed: sed "s/\tmRNA\./\t/g" example.gtf
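A quick way to try the sed one-liner on a throwaway file. This is a sketch that assumes GNU sed, which understands `\t` in both the pattern and the replacement; `example.gtf` is just a hypothetical two-line test file built with printf. Because POSIX does not require `\t` support in sed, the second command passes a real tab character through a shell variable instead, which should also work with other sed implementations.

```shell
#!/bin/sh
# Hypothetical sample data: one line with the "mRNA." prefix, one without.
printf 'Pepper1.55ch01\tmRNA.CA01g00010\t63209\t63880\n' > example.gtf
printf 'Pepper1.55ch01\tCA01g00020\t70000\t71000\n' >> example.gtf

# GNU sed: replace "<tab>mRNA." with just "<tab>".
# The dot is escaped so it matches a literal ".", not any character.
sed 's/\tmRNA\./\t/g' example.gtf

# Portable variant: put a literal tab in a variable instead of relying on \t.
tab=$(printf '\t')
sed "s/${tab}mRNA\./${tab}/g" example.gtf
```

Both commands print the edited lines to stdout; redirect to a new file (or use GNU sed's `-i`) to keep the result.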

5.5 years ago
cschu181 ★ 2.8k
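awk -v OFS="\t" -v FS="\t" '{ $2 = gensub(/^mRNA\./, "", 1, $2); print $0 }' your_file > your_modified_file

Note that gensub() is a GNU awk (gawk) extension; with other awks, sub(/^mRNA\./, "", $2) does the same job.

Edit: removed the caveat about assuming tab-separated input, since this is a GTF file, which is tab-separated by definition.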
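A minimal sketch of the portable sub() form, run on a hypothetical two-line file. It shows that only column 2 is edited: the "mRNA." placed in column 4 of the second line (made-up data, for illustration only) is left alone. Writing the pattern as a regex constant `/^mRNA\./` rather than a string keeps the backslash intact, so the dot matches only a literal ".".

```shell
#!/bin/sh
# Hypothetical sample input; "mRNA.note" in column 4 is there only to show
# that the edit is restricted to column 2.
printf 'Pepper1.55ch01\tmRNA.CA01g00010\t63209\t63880\n' > your_file
printf 'Pepper1.55ch01\tCA01g00020\t70000\tmRNA.note\n' >> your_file

# sub() modifies $2 in place; "^" anchors the match to the start of the field.
# OFS="\t" keeps the output tab-separated when awk rebuilds the record.
awk -v OFS="\t" -v FS="\t" '{ sub(/^mRNA\./, "", $2); print }' your_file > your_modified_file

cat your_modified_file
```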

Another awk solution:

$ awk -v OFS="\t" -F "\t" '{sub(/^[a-zA-Z]+\./,"",$2)}1' test.txt
$ awk -v OFS="\t" -F "\t" '{$2=substr($2,6)}1' test.txt
Pepper1.55ch01  CA01g00010  63209   63880

With gawk installed (for the {4} interval expression):

$ awk -v OFS="\t" -F "\t" '{sub(/^[a-zA-Z]{4}\./,"",$2)}1' test.txt
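One caveat with the substr() variant: `substr($2,6)` drops the first five characters of every second column, whether or not it actually starts with "mRNA.". A sketch of a guarded version, assuming a hypothetical test.txt where the second line has no prefix:

```shell
#!/bin/sh
# Hypothetical test input: line 2 has no "mRNA." prefix.
printf 'Pepper1.55ch01\tmRNA.CA01g00010\t63209\t63880\n' > test.txt
printf 'Pepper1.55ch01\tCA01g00020\t70000\t71000\n' >> test.txt

# Unguarded: line 2's "CA01g00020" is cut down to "00020".
awk -v OFS="\t" -F "\t" '{$2=substr($2,6)}1' test.txt

# Guarded: cut only when $2 really starts with "mRNA." (5 characters).
awk -v OFS="\t" -F "\t" '$2 ~ /^mRNA\./ {$2=substr($2,6)} 1' test.txt
```

The guarded form keeps substr()'s simplicity while behaving like the sub() versions on lines without the prefix.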

This worked great for me, thank you!

Thank you, I was able to remove the mRNA part.
