8.2 years ago
Am.A
Hi all
How do I parse a FASTA file to get information about gene location (i.e. get the start and end positions of each gene)?
>lcl|NC_000913.3_cds_NP_414542.1_1 [gene=thrL] [protein=thr operon leader peptide] [protein_id=NP_414542.1] [location=190..255]
ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
>lcl|NC_000913.3_cds_NP_414547.1_6 [gene=yaaA] [protein=peroxide resistance protein, lowers intracellular iron] [protein_id=NP_414547.1] [location=complement(5683..6459)]
ATGCTGATTCTTATTTCACCTGCGAAAACGCTTGATTACCAAAGCCCGTTGACCACCACGCGCTATACGC
TGCCGGAGCTGTTAGACAATTCCCAGCAGTTGATCCATGAGGCGCGGAAACTGACGCCTCCGCAGATTAG
You can find exactly what you need in a previous question:
Correct Way To Parse A Fasta File In Python
Bonus: read this
https://github.com/mdshw5/pyfaidx
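For the specific case in the question, here is a minimal sketch that pulls the coordinates out of the header lines with the standard library only. It assumes every header carries a `[location=...]` field in the NCBI CDS style shown above; the names `parse_header` and `gene_locations` are made up for this illustration and are not part of pyfaidx or Biopython. For multi-part locations such as `join(...)` it reports only the overall span (smallest and largest coordinate), which may or may not be what you want.

```python
import re

# Patterns for the bracketed fields in an NCBI CDS FASTA header (assumed format).
LOCATION_RE = re.compile(r"\[location=([^\]]+)\]")
GENE_RE = re.compile(r"\[gene=([^\]]+)\]")
COORD_RE = re.compile(r"\d+")


def parse_header(header):
    """Return (gene, start, end, strand) for one header line, or None if no location."""
    loc = LOCATION_RE.search(header)
    if loc is None:
        return None
    location = loc.group(1)
    # Collect every number in the location; min/max give the overall span.
    coords = [int(n) for n in COORD_RE.findall(location)]
    if not coords:
        return None
    gene = GENE_RE.search(header)
    # complement(...) marks a gene on the reverse strand.
    strand = "-" if location.startswith("complement(") else "+"
    return (gene.group(1) if gene else None, min(coords), max(coords), strand)


def gene_locations(lines):
    """Yield one (gene, start, end, strand) tuple per header line in an iterable of lines."""
    for line in lines:
        if line.startswith(">"):
            parsed = parse_header(line.strip())
            if parsed is not None:
                yield parsed


if __name__ == "__main__":
    example = [
        ">lcl|NC_000913.3_cds_NP_414542.1_1 [gene=thrL] [protein=thr operon leader peptide] [protein_id=NP_414542.1] [location=190..255]",
        "ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA",
        ">lcl|NC_000913.3_cds_NP_414547.1_6 [gene=yaaA] [protein=peroxide resistance protein, lowers intracellular iron] [protein_id=NP_414547.1] [location=complement(5683..6459)]",
        "ATGCTGATTCTTATTTCACCTGCGAAAACGCTTGATTACCAAAGCCCGTTGACCACCACGCGCTATACGC",
    ]
    for record in gene_locations(example):
        print(record)
    # ('thrL', 190, 255, '+')
    # ('yaaA', 5683, 6459, '-')
```

To run it on a real file, pass an open file handle instead of the list, e.g. `with open(path) as handle: list(gene_locations(handle))`, where `path` is wherever your FASTA file lives.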
Okay, you have my permission to do so.
But what is the question? Have you tried googling?
But you don't have my permission to give OP permission :-)
Unless OP edited the question after you wrote your comment, it does appear to have a reasonably clear description. On a serious note, can we have more of what @Medhat did and less of these comments?
Indeed, the post was edited; it didn't contain a question at all when I posted my comment asking what the question was. I realize that my answer (next to the edited original post) makes me look like a douche.
@Am.a: It generally helps to be explicit about the output you want when you write the original post. For example, in this case do you only need