4.7 years ago
sajjalwaqas
I have a FASTA file of nitrogen-fixing bacteria (gene nifH). The file has IDs as well as sequences. I want to make a table that gives the taxonomy for each ID, for example (id: kingdom, phylum, class, order, family, genus, species). This is my first time doing this, as I am still a bachelor's student, so any help will be appreciated.
Do you have the taxIDs of these bacteria, or just the gene sequences? Do you have the names of the bacteria you are dealing with? If you have the taxIDs, you can get the information you need by using my answer here: A: converting taxID to taxonomy
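If you do end up with taxIDs, the lineage can be pulled out of NCBI's taxonomy XML. The sketch below assumes you have already saved that XML (for example with Entrez Direct `efetch -db taxonomy -id <taxid> -format xml`) and that it follows NCBI's usual `TaxaSet`/`Taxon`/`LineageEx` layout. The function name `lineage_by_rank` and the `NA` placeholder are my own choices, not part of any NCBI tool. NCBI has historically labeled the top bacterial rank `superkingdom`; as an assumption, newer releases may call it `domain`, so the sketch accepts either.

```python
import xml.etree.ElementTree as ET

# Ranks to report; missing ranks are filled with "NA".
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def lineage_by_rank(taxaset_xml):
    """Map each TaxId in an NCBI taxonomy XML string to {rank: scientific name}."""
    root = ET.fromstring(taxaset_xml)
    table = {}
    for taxon in root.findall("Taxon"):
        ranks = {}
        # LineageEx lists the ancestors (cellular organisms, Bacteria, phylum, ...).
        for ancestor in taxon.findall("LineageEx/Taxon"):
            ranks[ancestor.findtext("Rank")] = ancestor.findtext("ScientificName")
        # The record itself is the most specific level (often the species).
        ranks[taxon.findtext("Rank")] = taxon.findtext("ScientificName")
        # Assumption: newer NCBI releases may use "domain" instead of "superkingdom".
        if "superkingdom" not in ranks and "domain" in ranks:
            ranks["superkingdom"] = ranks["domain"]
        table[taxon.findtext("TaxId")] = {rank: ranks.get(rank, "NA") for rank in RANKS}
    return table
```

Ranks that NCBI does not assign for a given organism simply come back as `NA`, so every row has the same columns.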
I have sequences of genes.
Do the sequence headers contain accession numbers for the genes?
How many sequences do you have?
There are no accession numbers, and there are more than 100 sequences. I don't know the exact number because the file was forwarded to me.
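Since the exact number isn't known, a quick way to check both the count and what the headers actually contain is to read the FASTA file directly. A minimal sketch; `nifH.fasta` in the usage comment is a placeholder file name, and `fasta_headers` is a hypothetical helper, not a library function.

```python
import sys


def fasta_headers(lines):
    """Return the header text (without the leading '>') of every FASTA record."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fasta_headers.py nifH.fasta
    with open(sys.argv[1]) as handle:
        headers = fasta_headers(handle)
    print(f"{len(headers)} sequences")
    for header in headers[:5]:  # show a few headers to see what IDs they carry
        print(header)
```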
There might be tools that can give you what you need from the sequences alone, but none come to mind off the top of my head.
I'm not sure, but I think you can use blastn to find the accession numbers or reference genomes. If you have fewer than 500 sequences, you might be able to use online blastn. Try it with 2 of the sequences and see if you like the output. You can also run blastn on the command line with the -remote option to search against the nucleotide database (nt), but -remote is slow. You can also download the nt database and search it locally, but it's huge. Alternatively, you can use blastx against the non-redundant protein database (nr) and get the protein accession numbers.
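The command-line route can be sketched in Python as below: build a `blastn -remote` call against `nt`, then keep the highest-scoring hit per query from the tabular output. This assumes BLAST+ is installed and on your PATH; the `-outfmt` column list, `-max_target_seqs 5`, and the function names are illustrative choices, not something prescribed in this thread.

```python
import subprocess
import sys

# Requested tabular columns; "sacc" is the subject accession.
OUTFMT = "6 qseqid sacc pident length evalue bitscore"


def blastn_remote_command(query_fasta, out_tsv):
    """Build a blastn command that searches NCBI's nt database remotely."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", "nt",
        "-remote",
        "-outfmt", OUTFMT,
        "-max_target_seqs", "5",
        "-out", out_tsv,
    ]


def best_hits(tsv_lines):
    """Return {query_id: (accession, pident, bitscore)}, keeping the top bitscore per query."""
    best = {}
    for line in tsv_lines:
        if not line.strip():
            continue
        qseqid, sacc, pident, _length, _evalue, bitscore = line.rstrip("\n").split("\t")
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][2]:
            best[qseqid] = (sacc, float(pident), score)
    return best


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python remote_blast.py my_seqs.fasta hits.tsv  (slow: runs on NCBI servers)
    query, out = sys.argv[1], sys.argv[2]
    subprocess.run(blastn_remote_command(query, out), check=True)
    with open(out) as handle:
        for qid, (acc, pid, score) in best_hits(handle).items():
            print(qid, acc, pid, score, sep="\t")
```

Trying it on 2 sequences first, as suggested above, keeps the remote run short while you check whether the hits look sensible.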
If you had protein sequences instead of gene sequences you could use blastp.
After you get the accession numbers, you can use efetch from Entrez Direct to get the taxonomy. If you're not familiar with these tools, you can google them or search for them on this forum.
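Once each sequence ID has a lineage (for example from efetch, or from the organism of the best BLAST hit), turning it into the table from the question is straightforward. A sketch, assuming you have already collected a `{sequence_id: {rank: name}}` dictionary; the column names follow the question, and which NCBI rank you put in the `kingdom` column (for bacteria, usually the top rank) is your choice. `taxonomy_table` is a hypothetical helper name.

```python
import csv
import io

# Columns requested in the question.
COLUMNS = ["id", "kingdom", "phylum", "class", "order", "family", "genus", "species"]


def taxonomy_table(lineages):
    """Format {sequence_id: {rank: name}} as a tab-separated table (missing ranks -> NA)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for seq_id in sorted(lineages):
        ranks = lineages[seq_id]
        writer.writerow([seq_id] + [ranks.get(rank, "NA") for rank in COLUMNS[1:]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder data only; replace with the lineages you actually retrieved.
    example = {"seq1": {"kingdom": "Bacteria", "genus": "GenusA"}}
    print(taxonomy_table(example), end="")
```

The tab-separated output can be saved to a file and opened directly in a spreadsheet program.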
If you open this link https://www.ncbi.nlm.nih.gov/nuccore/X51500.1?report=fasta, you can see Run BLAST on the right. If you click on it, you can choose blastn or blastx and compare their output. You can also access online BLAST here:
https://blast.ncbi.nlm.nih.gov/Blast.cgi
You can select blastx or blastn and copy-paste two of your sequences, or upload your FASTA file, ...
Thank you, I'll try doing as you say.
Please post an example of some of the data you have.
If you know that you have nifH genes, it should be easy to identify which bacteria they come from by downloading all nifH genes from NCBI and then searching against that set using blat or blast.
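A sketch of the local-database route with BLAST+ (rather than blat): build a nucleotide database from the nifH sequences downloaded from NCBI, then search your sequences against it. File and database names are placeholders; `-parse_seqids` keeps the NCBI accessions as subject IDs, and the `stitle` column prints each hit's full header, which for NCBI records normally includes the organism name. The helper name `local_nifh_commands` is hypothetical.

```python
import shutil
import subprocess
import sys


def local_nifh_commands(reference_fasta, query_fasta, db_name="nifH_db", out_tsv="nifH_hits.tsv"):
    """Return the makeblastdb and blastn commands for a local nifH search."""
    make_db = ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl",
               "-parse_seqids", "-out", db_name]
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6 qseqid sseqid pident bitscore stitle",
              "-max_target_seqs", "5", "-out", out_tsv]
    return make_db, search


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python local_nifh.py ncbi_nifH.fasta my_seqs.fasta
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        sys.exit("BLAST+ (makeblastdb, blastn) not found on PATH")
    for cmd in local_nifh_commands(sys.argv[1], sys.argv[2]):
        subprocess.run(cmd, check=True)
```

Because the reference set contains only nifH sequences, this is much smaller and faster to search than downloading all of nt.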