BLASTn against all bacterial genomes: creating the database
davidmaimoun, 9 months ago

Hello, I need to run blastn against bacterial genomes to check whether my sample really is Neisseria meningitidis and has not been contaminated. I downloaded reference sequences from NCBI for each bacterium. In a data folder I have many subfolders, each containing one FASTA file (.fna) that represents a genome.

[Screenshot: subfolders]

[Screenshot: genome file]

How can I build a database from this data? I tried running makeblastdb on each genome, but that doesn't seem like the right approach.

Thank you

GenoMax, 9 months ago

> I tried the cmd makeblastdb for each genome but it doesn't seem to me the good way.

If you have already made the individual databases, use the blastdb_aliastool to create a common alias: https://www.ncbi.nlm.nih.gov/books/NBK569848/
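
A minimal Python sketch of the alias route, assuming every per-genome database already has a unique prefix (the prefixes and the alias name `all_bacteria` below are hypothetical). It writes one prefix per line and returns the `blastdb_aliastool` argument list; `-dblist_file` is used instead of `-dblist` so that ~3500 names do not have to fit on one command line, and `-dbtype nucl` is set explicitly because the tool defaults to protein.

```python
from pathlib import Path
from typing import List


def build_alias_command(db_prefixes: List[str], list_file: Path,
                        alias_name: str = "all_bacteria") -> List[str]:
    """Write one BLAST database prefix per line to list_file and return
    the argv for blastdb_aliastool (hypothetical alias name by default)."""
    if len(set(db_prefixes)) != len(db_prefixes):
        # Identically named databases cannot be told apart in one alias.
        raise ValueError("database prefixes must be unique")
    list_file.write_text("\n".join(db_prefixes) + "\n")
    return [
        "blastdb_aliastool",
        "-dblist_file", str(list_file),
        "-dbtype", "nucl",
        "-out", alias_name,
        "-title", alias_name,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed, after which `-db all_bacteria` would search every listed database.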

Otherwise, walk through the folder structure, cat the .fna files into one large FASTA file, and then create a single database from it.
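
The walk-and-concatenate step might look like this in Python, as a sketch assuming the genomes end in `.fna` (adjust `pattern` otherwise) and that the combined file name is up to you. It appends each genome to one multi-FASTA file, adding a newline where a file lacks a trailing one so the next `>` header is not glued to the previous sequence, and returns the matching `makeblastdb` argument list.

```python
from pathlib import Path
from typing import List


def concatenate_genomes(data_dir: Path, combined: Path,
                        pattern: str = "*.fna") -> int:
    """Append every FASTA file matching `pattern` under data_dir (recursively)
    to one multi-FASTA file and return how many files were merged."""
    target = combined.resolve()
    # Collect the list first and skip the output file in case it lives under data_dir.
    files = sorted(p for p in data_dir.rglob(pattern)
                   if p.is_file() and p.resolve() != target)
    with combined.open("w") as out:
        for path in files:
            text = path.read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep the next '>' header on its own line
    return len(files)


def makeblastdb_command(combined: Path, db_name: str) -> List[str]:
    """argv for building one nucleotide BLAST database from the merged file."""
    return ["makeblastdb", "-in", str(combined), "-dbtype", "nucl",
            "-out", db_name, "-title", db_name]
```

Reading one genome at a time keeps memory use to a single bacterial genome (a few MB) even with thousands of files.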

Thank you, GenoMax!

After running makeblastdb, I got this in each folder (~3500 folders):

[Screenshot: output after the command]

Would you advise using the alias, or is it better to create one big multi-FASTA file and build the database from it?

If all of your databases have the same name, you will need to rename or recreate them. At that point you may as well cat the .fna files and rebuild the database. If you used a script to make the 3500 databases, you could modify it so the output files get unique names. You will have to tell us whether blastdb_aliastool works with 3500 databases.
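
One way to give each database a unique name, sketched in Python under the assumption that every subfolder holds exactly one `.fna` file: name the `-out` prefix after the subfolder (for example an assembly accession folder). The function only builds the commands; running them is left to `subprocess.run(cmd, check=True)`.

```python
from pathlib import Path
from typing import List


def per_genome_commands(data_dir: Path, pattern: str = "*.fna") -> List[List[str]]:
    """One makeblastdb argv per genome subfolder; -out is named after the
    subfolder so every database gets a unique name."""
    commands = []
    seen = set()
    for fasta in sorted(data_dir.glob(f"*/{pattern}")):
        name = fasta.parent.name          # e.g. the assembly accession folder
        if name in seen:
            # Assumption violated: two FASTA files in one folder would collide.
            raise ValueError(f"more than one FASTA in {fasta.parent}")
        seen.add(name)
        prefix = fasta.parent / name      # e.g. data/<folder>/<folder>
        commands.append(["makeblastdb", "-in", str(fasta), "-dbtype", "nucl",
                         "-out", str(prefix), "-title", name])
    return commands
```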

Thank you for the answers, everyone. I went with the cat .fna solution and it fit what I needed to do. Again, thank you!

Please accept this answer (green check mark) to provide closure for this thread.

9 months ago

I'd recommend using some other tools:

Both can compute "signatures" from reads and genomes and analyze the microbial composition. Note that when querying with genomes, small thresholds should be used.
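
The tools are not named in the thread, so as an illustration only, here is what a k-mer "signature" can look like: a FracMinHash-style sketch in Python (my choice of technique; `k=21`, `scaled=1000` and blake2b hashing are assumptions, and real tools differ in hash function and defaults). Canonical k-mers make the signature independent of strand, and containment measures how much of a query's sampled k-mers occur in a reference.

```python
import hashlib
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def signature(seq: str, k: int = 21, scaled: int = 1000) -> Set[int]:
    """FracMinHash-style sketch: keep canonical k-mer hashes below 2**64 / scaled,
    i.e. roughly one k-mer in every `scaled`."""
    seq = seq.upper()
    max_hash = 2**64 // scaled
    kept: Set[int] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        digest = hashlib.blake2b(canonical(kmer).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h < max_hash:
            kept.add(h)
    return kept


def containment(query: Set[int], reference: Set[int]) -> float:
    """Fraction of the query's sampled hashes that also occur in the reference."""
    return len(query & reference) / len(query) if query else 0.0
```

With about one k-mer in 1000 kept, a 5 kb contig contributes only around 5 hashes, so scores for short queries are coarse.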

Based on a different thread, these are not metagenomic samples. The OP wants to check whether there is any contamination besides the one expected genome (from what I gather, contamination is not normally expected). We already discussed that none of the tools (including BLAST) are likely to be good for that purpose, especially if the "contamination" comes from a close relative.

> see if my sample really is nesseirai men, and didn't have been contaminated

The OP might have assemblies of what should be single-species data that are mixed with contigs from other species. In that case we can treat them as metagenomic data, in contigs rather than reads, where a metagenomic profiling tool might help.
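
If the assembly is BLASTed against the combined database with tabular output (`-outfmt 6`), one crude way to apply this contig-level view is to take each contig's best hit by bitscore and flag contigs assigned to another species. A minimal Python sketch; the `subject_to_species` lookup is hypothetical (you would have to build it, e.g. from the genome folders), and a best-hit rule cannot resolve close relatives.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Column order of BLAST+ `-outfmt 6` without custom fields
# (recent versions label the first two qaccver/saccver).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query contig to (subject id, bitscore) of its highest-scoring hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(COLUMNS, row))
        query, subject = rec["qseqid"], rec["sseqid"]
        score = float(rec["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def flag_contigs(best: Dict[str, Tuple[str, float]],
                 subject_to_species: Dict[str, str],
                 expected: str) -> List[str]:
    """Contigs whose best hit is not assigned to the expected species
    (subjects missing from the hypothetical lookup count as 'unknown')."""
    return sorted(q for q, (s, _) in best.items()
                  if subject_to_species.get(s, "unknown") != expected)
```

Usage: `with open("hits.tsv", newline="") as fh: best = best_hits(fh)`, where `hits.tsv` is a hypothetical output file name.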

> We already discussed that none of the tools (including blast) are likely to be good for that purpose, especially if the "contamination" is from a close relative.

Yes, close relatives are hard to tell apart.
