3.2 years ago
mrj
I am wondering how to extract the two numbers from the location tag of the following FASTA header.
>lcl|CP033719.1_cds_AYW77996.1_1542 [locus_tag=EGX94_07890] [protein=copper oxidase] [protein_id=AYW77996.1] [location=1885267..1887939] [gbkey=CDS]
Thank you so much for this solution. It works for me. I am learning a lot from your solution.
In Python:
Suppose your header is saved in the variable header:
header.partition("location=")[2].partition("]")[0].split('..')
This will return the list
['1885267', '1887939']
which you can easily manipulate. It only works if the header contains the location keyword; otherwise it returns [''], a list holding a single empty string, rather than the two coordinates.
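If you want integers back and an explicit signal when the location is missing, here is a minimal sketch using a regular expression instead of partition. The helper name parse_location is hypothetical (it is not part of the answer above), and it assumes a plain start..end location; headers with forms such as complement(...) or join(...) would need a different pattern.

```python
import re

# Hypothetical helper, shown only as an illustration.
# Assumes a plain "[location=start..end]" field with digits on both sides.
_LOCATION_RE = re.compile(r"\[location=(\d+)\.\.(\d+)\]")


def parse_location(header):
    """Return (start, end) as ints, or None if no plain location is found."""
    match = _LOCATION_RE.search(header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


header = (
    ">lcl|CP033719.1_cds_AYW77996.1_1542 [locus_tag=EGX94_07890] "
    "[protein=copper oxidase] [protein_id=AYW77996.1] "
    "[location=1885267..1887939] [gbkey=CDS]"
)

print(parse_location(header))               # (1885267, 1887939)
print(parse_location(">seq1 [gbkey=CDS]"))  # None
```

Returning None for a missing location makes the failure case easy to check with a simple "if result is None", which is harder to do reliably with the list returned by the one-liner.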
Hello Renesh, thanks. This is much simpler and does the task perfectly.
Thank you so much.