I have a simple task, but I cannot find the specific method in the Biopython documentation. I have a GenBank file for a bacterium and a list of numbers, which are the positions of specific SNPs on the genome:
234
1135
51515
63475
...etc
I want to know which gene each of these positions corresponds to. What series of methods should I use?
As a clear example, the final output should look like this:
234 : is within Dehydroxylate III
1135 : is within Dehydroxylate III
51515 : Is within Collagen Repeat factor alpha
63475 : is within DNA Polymerase III
94818 : is within DNA Helicase subunit 1
etc...
You know the name of your bacterium, right?
See my answer to a similar bacterial question in the following post:
C: where can I get environmental bacteria genome in fasta format (as many as possib
NCBI currently provides a file with URLs for all bacteria in the database.
Find your bacterium's name, click it, copy the URL, and open it in a browser.
There you will find all the files NCBI has for your bacterium.
The .faa files list all of the bacterium's genes and give the coordinates of those genes.
The .ffn files have the same gene order but, unfortunately, no coordinates.
What is nice is that these files are collinear, and most bacteria have a single chromosome.
First, find your gene in the .faa file (the entries have their coordinates in their headers),
so this is a simple search through the protein FASTA headers.
If a .fna file is provided, you can then map the corresponding pre-translated gene (.ffn)
that contains the coordinate from your list onto the chromosome .fna file.
I would try something like that. It doesn't look easy,
so I hope a Biopython option exists.
No, this is for custom GenBank files.
I need an output file that will look like this
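Output in the format from the question (234 : is within ...) could probably be produced directly from a custom GenBank file with Biopython. The following is only a minimal sketch, not a tested pipeline. It assumes the file holds a single record (one chromosome), that the SNP positions are 1-based as in GenBank coordinates, and that the gene name can be taken from the /product, /gene, or /locus_tag qualifier of each CDS feature. The function names are made up for illustration.

```python
def load_features(genbank_path, feature_type="CDS"):
    """Return all features of one type from a single-record GenBank file."""
    # Biopython is imported here so the lookup helpers below work without it.
    from Bio import SeqIO

    # SeqIO.read raises an error if the file has zero or several records.
    record = SeqIO.read(genbank_path, "genbank")
    return [f for f in record.features if f.type == feature_type]


def feature_name(feature):
    """Pick a readable name from the qualifiers (assumed order of preference)."""
    for key in ("product", "gene", "locus_tag"):
        if key in feature.qualifiers:
            # Biopython stores qualifier values as lists of strings.
            return feature.qualifiers[key][0]
    return "unnamed feature"


def describe_position(position, features):
    """Format one line of the report for a 1-based genome position."""
    # Biopython locations are 0-based, so shift the 1-based SNP position.
    index = position - 1
    hits = [feature_name(f) for f in features if index in f.location]
    if hits:
        return f"{position} : is within {', '.join(hits)}"
    return f"{position} : is intergenic"
```

To use it, call load_features once on the GenBank file, then call describe_position for each number in the SNP list and write the returned lines to the output file. Every position is checked against every feature, which should be fast enough for a bacterial genome with a few thousand genes; a position inside overlapping genes is reported with all of their names.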
The following Biostars posts may help you:
A: Identify Snps For Bacteria - Annotation
A: Variant Discovery In Bacteria
SOLVED***EDIT*** BioPython: how to edit a fasta file to include a large deletion
Get gene names from rs SNP ids