Extract header from fasta file
2.4 years ago
Princy ▴ 60

Hello, how can I extract the ID from the headers of an ORFfinder FASTA result file?

>lcl|ORF2_TRINITY_DN74698_c0_g1_i1:302:0 unnamed protein product, partial
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS

I need to extract the ID so that the header looks like this. Kindly let me know how.

>TRINITY_DN74698_c0_g1_i1
header fasta • 739 views
2.4 years ago

A seqkit answer. You may need to tweak the regex depending on the variability in the naming scheme.

seqkit replace -p ".*ORF\d+_([^:]+).*" -r "\$1" test.fasta

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVR
KKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS
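
Before tweaking the regex, it may help to see which headers would not be covered by it. A minimal sketch, assuming every ORFfinder header follows the `>lcl|ORF<number>_<id>:<start>:<end>` layout of the single example in the question; the function name `unexpected_headers` is made up for illustration:

```shell
# Print every header line that does NOT follow the assumed
# ">lcl|ORF<number>_<id>:<start>:<end>" layout.
# The layout is inferred from one example header and may not hold for all files.
unexpected_headers() {
    # First grep keeps header lines only; second grep drops the ones that match.
    # "|| true" keeps the exit status at 0 when nothing unexpected is found.
    grep '^>' "$1" | grep -Ev '^>lcl[|]ORF[0-9]+_[^:]+:[0-9]+:[0-9]+' || true
}

# Usage (no output means every header fits the assumed layout):
# unexpected_headers test.fasta
```

Any header printed by this check is one the seqkit pattern would probably leave unchanged or rewrite incorrectly.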
2.4 years ago
$ awk -F "_|:" -v OFS="_" '/^>/{print ">"$2,$3,$4,$5,$6};!/>/' test.fa

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAP


$ sed -E '/^>/ s/.*ORF[0-9]+_/>/;s/:.*//' test.fa

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAP
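
If only the list of IDs is needed rather than a renamed FASTA file, the same kind of substitution can print just the ID part of each header. A minimal sketch, assuming the headers follow the `>lcl|ORF<number>_<id>:<start>:<end>` layout from the question; the function name `list_ids` is made up for illustration:

```shell
# Print one ID per header line, without the sequences.
# -n suppresses the default output; the trailing "p" prints only the lines
# where the substitution succeeded, i.e. headers with the assumed layout.
list_ids() {
    sed -n -E 's/^>.*ORF[0-9]+_([^:]+).*/\1/p' "$1"
}

# Usage:
# list_ids test.fa                    # all IDs, in file order
# list_ids test.fa | sort | uniq -d   # IDs shared by more than one ORF
```

The second usage line is worth running before renaming: if several ORFs come from the same transcript, stripping the ORF number gives them identical headers.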