Using Tblastx On Denovo Assembly
11.8 years ago
Honey ▴ 200

I am working on a study to detect bacteria and viruses in patient samples. After reading various papers and posts on this forum, I performed de novo assembly and now have contigs. Several papers (e.g. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0034631 ; http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3196430/) say that after assembly I should run tblastx, and they use the command-line version to do so. They also mention that a database of viral genomes is created. I found http://nebc.nerc.ac.uk/bioinformatics/docs/tblastx.html on the web, but I am not sure whether this is the guide to follow or whether I am chasing the wrong target. I would appreciate it if someone could point me to a URL or advise on how the database used with tblastx is generated (as mentioned in almost all publications on pathogen detection in samples). Is there a workflow I can follow? Why can't we use tblastx on the NCBI site? How do we create the database and run tblastx for such studies?

Thanks

denovo assembly • 3.0k views
11.8 years ago

You could use NCBI's web BLAST as long as your viruses of interest are in NCBI's nucleotide collection (nr/nt), but if you are going to submit huge numbers of contigs, I think it is better to run the job on your own machines than to use NCBI.

If you have BLAST installed on your own system, you can easily build your own BLAST databases from multi-FASTA files with a single command.
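As a rough illustration of that single command, here is a minimal Python sketch that only builds the `makeblastdb` and `tblastx` command lines from NCBI BLAST+. The file names (`viral_genomes.fasta`, `contigs.fasta`), the database name `viral_db`, the E-value cutoff and the thread count are all placeholder assumptions, not values taken from the papers mentioned in the question.

```python
import shlex
from typing import List


def makeblastdb_cmd(fasta: str, db_name: str) -> List[str]:
    """Build a makeblastdb call that turns a multi-FASTA into a BLAST database."""
    return [
        "makeblastdb",
        "-in", fasta,
        "-dbtype", "nucl",  # tblastx searches a (translated) nucleotide database
        "-out", db_name,
    ]


def tblastx_cmd(contigs: str, db_name: str, out_path: str,
                evalue: float = 1e-5, threads: int = 4) -> List[str]:
    """Build a tblastx call that searches contigs against the database."""
    return [
        "tblastx",
        "-query", contigs,
        "-db", db_name,
        "-out", out_path,
        "-evalue", str(evalue),   # assumed cutoff, tune for your study
        "-outfmt", "6",           # tab-separated table, one hit per line
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for cmd in (
        makeblastdb_cmd("viral_genomes.fasta", "viral_db"),
        tblastx_cmd("contigs.fasta", "viral_db", "contigs_vs_viral.tsv"),
    ):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a terminal, or each list can be passed to `subprocess.run(cmd, check=True)` once BLAST+ is installed and on your `PATH`.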


I know I may sound naive, but that is precisely my question: 1. How can I download a local copy of BLAST? 2. Where can I find the workflow and commands to generate a database on my machine?

Thanks


You can download NCBI BLAST+ from NCBI's FTP site here and install it; the manual is here.

11.8 years ago
lelle ▴ 830

If you use a database containing only virus sequences, you will have a massive database bias. That is, reads that actually originate from another genome will still align to viruses, and those alignments will be reported as the best hits simply because the species the reads really come from is not in your database. So if you have the computing power, I would strongly recommend running your contigs against the full NCBI database. If not, it is a good idea to at least go back and run the contigs you have identified as viral against NCBI and check whether they get a better hit to another species.
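The re-check described here can be sketched in a few lines of Python, assuming both searches were saved in BLAST's default tabular format (`-outfmt 6`, where column 1 is the query, column 2 the subject and column 12 the bit score). The function names and the contig and subject IDs in the example are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (subject id, bit score)


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query from outfmt 6 lines."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def contigs_to_recheck(viral: Dict[str, Hit], broad: Dict[str, Hit]) -> List[str]:
    """Contigs whose best hit in the broad database beats their best viral hit."""
    return sorted(
        query
        for query, (_, viral_score) in viral.items()
        if query in broad and broad[query][1] > viral_score
    )


if __name__ == "__main__":
    # Made-up example rows: contig_1 scores higher against a non-viral subject.
    viral_rows = [
        "contig_1\tvirus_A\t45.0\t60\t33\t0\t1\t180\t10\t189\t1e-6\t52.0",
        "contig_2\tvirus_B\t98.0\t100\t2\t0\t1\t300\t5\t304\t1e-50\t410.0",
    ]
    broad_rows = [
        "contig_1\thuman_chr7\t99.0\t120\t1\t0\t1\t360\t900\t1259\t1e-80\t640.0",
        "contig_2\tvirus_B\t98.0\t100\t2\t0\t1\t300\t5\t304\t1e-50\t410.0",
    ]
    print(contigs_to_recheck(best_hits(viral_rows), best_hits(broad_rows)))
```

Comparing bit scores is only one possible criterion; contigs flagged this way are candidates for a closer look, not confirmed false positives.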


I agree with lelle. At a minimum, include sequences that may make their way into your samples. Patients = human DNA (genome) plus transcriptome. Bacterial genomes are small, so instead of hand-picking those that may be present in clinical samples, simply use all NCBI genomes. For good measure, include other organisms such as Candida, Toxoplasma, etc.
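One way to assemble such a mixed database is to merge the separate FASTA collections into a single multi-FASTA before building the BLAST database, tagging each header with its source so hits can be traced back later. The sketch below works on FASTA text already read into memory; the source labels and the `label|` header prefix are conventions assumed for this example, not something BLAST requires.

```python
from typing import Dict


def tag_fasta(text: str, source: str) -> str:
    """Prefix every FASTA header with a source label, dropping blank lines."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(f">{source}|{line[1:]}")
        elif line.strip():
            out.append(line)
    return "\n".join(out) + "\n" if out else ""


def merge_fasta(sources: Dict[str, str]) -> str:
    """Concatenate several FASTA texts into one labelled multi-FASTA."""
    return "".join(tag_fasta(text, label) for label, text in sources.items())


if __name__ == "__main__":
    # Toy sequences standing in for real genome collections.
    merged = merge_fasta({
        "human": ">chr_fragment\nACGTACGT\n",
        "bacteria": ">some_bacterium\nGGCCTTAA\n",
        "virus": ">some_virus\nTTTTAAAA\n",
    })
    print(merged, end="")
```

For genome-scale inputs you would stream the files line by line rather than hold them in memory, but the header-tagging idea stays the same.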


So, lelle, are you saying that I should filter the contigs by some parameter, such as number of bases and k value, and then BLAST them against nr? Have I understood correctly? I agree that with a local database I will have a lot of bias, but most publications have done it this way, especially for bacterial and viral genomes.
