Conversion of fasta file to taxon id(?)
4.7 years ago

I have a FASTA file of nifH gene sequences from nitrogen-fixing bacteria. The file contains an ID and a sequence for each entry. I want to build a table that gives the taxonomy for each ID, for example (id: kingdom, phylum, class, order, family, genus, species). This is my first time doing this as I am still a bachelor student, so any help will be appreciated.

sequence BLAST taxon id Ncbi Taxonomy • 1.9k views

Do you have the taxIDs of these bacteria, or just the gene sequences? Do you have the names of the bacteria you are dealing with?

If you have the taxIDs, you can get the information you need by using my answer here: A: converting taxID to taxonomy


I have sequences of genes.


Do the sequence headers contain the accession numbers of the genes?

How many sequences do you have?
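
If you are not sure, a quick way to check is to read the file yourself. Below is a minimal Python sketch (standard library only) that lists the ID and length of every record. The function names and the file name `nifH.fasta` are placeholders of mine, not anything from your data.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return a list of (id, sequence_length); id is the first word of the header."""
    rows = []
    for header, seq in read_fasta(handle):
        words = header.split()
        rows.append((words[0] if words else "", len(seq)))
    return rows


# Example usage (hypothetical file name):
# with open("nifH.fasta") as fh:
#     rows = summarize(fh)
# print(len(rows), "sequences")
# for seq_id, length in rows[:5]:
#     print(seq_id, length)
```

Looking at the first few IDs also shows whether the headers hide an accession number after all.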


There are no accession numbers, and there are more than 100 sequences. I don't know the exact number because the file was forwarded to me.


There might be tools out there that can give you what you need using just the sequences, but none come to mind right now.

I'm not sure, but I think you can use blastn to find the accession numbers or reference genomes of the best matches. If you have fewer than 500 sequences you might be able to use the online blastn. Try it with two of the sequences and see if you like the output. You can also run blastn from the command line with the -remote option to search against the nucleotide database (nt), but -remote is slow. Alternatively, you can download the nt database and set it up locally, but it's huge.
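
For the command-line route, here is a rough Python sketch of what I mean. It assumes BLAST+ is installed and on your PATH. The tabular column list, the cutoff of 5 targets, and the file names are my own choices for illustration. The first function only builds the blastn command; the second picks the highest-scoring subject accession per query from the tabular output.

```python
import csv
import io


def blastn_remote_cmd(query_fasta, out_tsv):
    """Build a blastn command that searches NCBI's nt database remotely."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", "nt",
        "-remote",
        # Tabular output with a custom column list (bitscore is column 6).
        "-outfmt", "6 qseqid sacc pident length evalue bitscore",
        "-max_target_seqs", "5",
        "-out", out_tsv,
    ]


def best_hits(handle):
    """Map each query ID to the subject accession with the highest bitscore."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6:
            continue  # skip malformed or empty lines
        qseqid, sacc, bitscore = row[0], row[1], float(row[5])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sacc, bitscore)
    return {query: hit[0] for query, hit in best.items()}


# Example usage (hypothetical file names; the remote search can take a long time):
# import subprocess
# subprocess.run(blastn_remote_cmd("two_seqs.fasta", "hits.tsv"), check=True)
# with open("hits.tsv") as fh:
#     print(best_hits(fh))
```

Keep in mind that the top hit is only the closest database match, so check the percent identity before trusting the species it implies.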

You can also use blastx against non redundant proteins (nr) and get the protein accession numbers.

If you had protein sequences instead of gene sequences you could use blastp.

Once you have the accession numbers, you can use efetch from Entrez Direct to get the taxonomy.
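
To turn the efetch output into the table you asked for, something like the following Python sketch might work. It assumes you saved a taxonomy record as XML, for example with `efetch -db taxonomy -id <taxid> -format xml`, and that the XML has the usual `TaxaSet/Taxon/LineageEx` layout. The function name and rank list are mine. Bacteria may carry the rank "superkingdom" or "domain" rather than "kingdom" depending on the NCBI Taxonomy release, so the sketch falls back to those.

```python
import xml.etree.ElementTree as ET

# Column order requested in the question.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def lineage_row(taxon_xml):
    """Return the names for RANKS from one NCBI taxonomy XML record."""
    root = ET.fromstring(taxon_xml)
    taxon = root if root.tag == "Taxon" else root.find("Taxon")
    if taxon is None:
        raise ValueError("no <Taxon> element found")

    names = {}
    # Ancestors are listed under LineageEx, each with its own Rank.
    for node in taxon.findall("LineageEx/Taxon"):
        names[node.findtext("Rank")] = node.findtext("ScientificName")
    # The record itself (often the species) is the top-level Taxon.
    names[taxon.findtext("Rank")] = taxon.findtext("ScientificName")

    # Assumption: use domain/superkingdom when no kingdom rank is present.
    if "kingdom" not in names:
        names["kingdom"] = names.get("domain") or names.get("superkingdom") or ""

    return [names.get(rank, "") for rank in RANKS]


# Example usage (hypothetical file names):
# import csv
# with open("taxon.xml") as fh, open("table.csv", "w", newline="") as out:
#     writer = csv.writer(out)
#     writer.writerow(["id"] + RANKS)
#     writer.writerow(["my_seq_id"] + lineage_row(fh.read()))
```

Ranks that are missing from a lineage are left as empty cells rather than guessed.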

If you're not familiar with these tools you can google them or search them in this forum.

For example, open this record: https://www.ncbi.nlm.nih.gov/nuccore/X51500.1?report=fasta

On the right you will see Run BLAST. If you click on it, you can choose blastn or blastx and compare their output.

Also you can access online blast from here:

https://blast.ncbi.nlm.nih.gov/Blast.cgi

You can select blastx or blastn and copy paste two of your sequences, upload your fasta file, ...


Thank you, I'll try doing as you say.


Please post an example of some of the data you have.


If you know that you have nifH genes, it should be easy to identify which bacteria they come from: download all nifH genes from NCBI and then search your sequences against that set using BLAT or BLAST.
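
A rough sketch of the BLAST variant, assuming BLAST+ is installed and the nifH reference sequences have already been downloaded into a FASTA file. All file and database names are placeholders, and the functions only assemble the commands.

```python
def makeblastdb_cmd(ref_fasta, db_name):
    """Build the command that turns a reference FASTA into a nucleotide BLAST database."""
    return ["makeblastdb", "-in", ref_fasta, "-dbtype", "nucl", "-out", db_name]


def local_blastn_cmd(query_fasta, db_name, out_tsv):
    """Build a blastn search of the query sequences against the local database."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db_name,
        # Tabular output; sseqid is the ID of the matching reference sequence.
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore",
        "-max_target_seqs", "1",
        "-out", out_tsv,
    ]


# Example usage (hypothetical file names):
# import subprocess
# subprocess.run(makeblastdb_cmd("nifH_refs.fasta", "nifH_db"), check=True)
# subprocess.run(local_blastn_cmd("my_seqs.fasta", "nifH_db", "nifH_hits.tsv"), check=True)
```

Because the database only contains nifH sequences, this runs locally in a fraction of the time a search against nt would take.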
