How to construct a BLAST database for non-green plant species and fungi
Asked 10 months ago by Sony

Hi all,

I am constructing a rice pangenome using an iterative assembly approach. In brief, I extracted the reads from my sample that did not map to the Nipponbare reference and assembled these unmapped reads into de novo contigs using MaSuRCA. Now I want to detect contamination (non-green plant species and fungi) in these new contigs through a BLAST search against the NCBI nt database. Looking at the NCBI nt database, I see many files, including:

  • eukaryotic sequences (nt_euk)
  • prokaryotic sequences (nt_prok)
  • viral sequences (nt_viruses)
  • other sequences (nt_others)
  • nt.000.tar.gz to nt.124.tar.gz (I am not sure which species these files cover.)

If I want to construct a BLAST database for non-green plant species and fungi to detect contamination, which nt files should I download (nt_euk and what else)?

Thank you.

Answered 10 months ago by GenoMax

BLAST+ allows you to limit searches to specific taxIDs. That would be the most efficient approach, assuming you have the infrastructure to store the database and run the searches.

In your case, you can download the entire nt database and then limit your searches with that feature. You can try this out on web BLAST first. Fungi are taxID 4751. You can use the NCBI Taxonomy Browser to find the taxIDs you want to use for non-green plants.
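A minimal Python sketch of the taxID-restricted search, assuming BLAST+ is installed and the nt database plus the taxonomy files (taxdb) are in $BLASTDB. The file names (novel_contigs.fa, fungi_hits.tsv), the e-value and the max_target_seqs value are placeholders, not recommendations. `-taxids` and `-negative_taxids` are real blastn options but cannot be used together, so the helper refuses that combination. The Viridiplantae (green plants) taxID 33090 is included for one possible reading of "non-green plants": search everything except green plants with `-negative_taxids`. Depending on your BLAST+ version, `-taxids` may only match the exact taxIDs listed rather than their descendants; the `get_species_taxids.sh` script shipped with BLAST+ can expand a higher-level taxID such as 4751 into a species-level list for `-taxidlist`.

```python
import shlex

# Fungi taxID as given in this thread; Viridiplantae (green plants) is 33090.
FUNGI_TAXID = 4751
VIRIDIPLANTAE_TAXID = 33090


def build_blastn_cmd(query, db, out, taxids=(), negative_taxids=(), threads=4):
    """Return a blastn command (argv list) limited by taxonomy.

    taxids: search only these taxIDs (-taxids).
    negative_taxids: search everything except these taxIDs (-negative_taxids).
    The two options are mutually exclusive in blastn.
    """
    if taxids and negative_taxids:
        raise ValueError("-taxids and -negative_taxids cannot be combined")
    cmd = [
        "blastn",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        # Tabular output with taxonomy columns (needs taxdb in $BLASTDB)
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore staxids sscinames",
        "-evalue", "1e-10",          # placeholder threshold
        "-max_target_seqs", "5",     # placeholder
        "-num_threads", str(threads),
    ]
    if taxids:
        cmd += ["-taxids", ",".join(str(t) for t in taxids)]
    if negative_taxids:
        cmd += ["-negative_taxids", ",".join(str(t) for t in negative_taxids)]
    return cmd


if __name__ == "__main__":
    # Option 1: restrict hits to fungi only
    print(shlex.join(build_blastn_cmd(
        "novel_contigs.fa", "nt", "fungi_hits.tsv", taxids=[FUNGI_TAXID])))
    # Option 2: everything except green plants
    print(shlex.join(build_blastn_cmd(
        "novel_contigs.fa", "nt", "non_green_hits.tsv",
        negative_taxids=[VIRIDIPLANTAE_TAXID])))
    # To actually run a command (requires BLAST+ and nt in $BLASTDB):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` avoids shell-quoting problems with the multi-word `-outfmt` string.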

Answered 10 months ago

Depending on how many sequences we are talking about, I'd say building a custom database for non-green plants and fungi might also be a good choice.

To do that, you would need to retrieve the assemblies for your taxIDs (the NCBI Datasets approach might work for that) and then construct the custom BLAST database from the resulting files.
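A minimal Python sketch of the last step, assuming the genome FASTA files have already been downloaded (for example with `datasets download genome taxon 4751` from the NCBI Datasets CLI and unzipped) and that you already know the taxID of each assembly. The helper names and the `fasta_to_taxid` dict are hypothetical. The code concatenates the files, writes a `<SequenceId> <TaxonomyId>` map, and builds a `makeblastdb` command using its real `-parse_seqids` and `-taxid_map` options, so that hits from the custom database still carry taxonomy.

```python
import tempfile
from pathlib import Path


def merge_fastas_with_taxids(fasta_to_taxid, combined_fasta, taxid_map):
    """Concatenate FASTA files and write a makeblastdb -taxid_map file.

    fasta_to_taxid (hypothetical input) maps each FASTA path to the taxID
    of its assembly. Returns the number of sequences written.
    """
    n_seqs = 0
    seen = set()
    with open(combined_fasta, "w") as out_fa, open(taxid_map, "w") as out_map:
        for fasta, taxid in fasta_to_taxid.items():
            with open(fasta) as fh:
                for line in fh:
                    if line.startswith(">"):
                        # Sequence ID = first whitespace-delimited token of the header
                        seqid = line[1:].split()[0]
                        if seqid in seen:
                            raise ValueError(f"duplicate sequence ID: {seqid}")
                        seen.add(seqid)
                        # One "<SequenceId> <TaxonomyId>" pair per line
                        out_map.write(f"{seqid} {taxid}\n")
                        n_seqs += 1
                    # Ensure a trailing newline so adjacent files don't run together
                    out_fa.write(line if line.endswith("\n") else line + "\n")
    return n_seqs


def makeblastdb_cmd(combined_fasta, taxid_map, db_name, title):
    """Return a makeblastdb argv list for a nucleotide DB with taxonomy."""
    return [
        "makeblastdb",
        "-in", str(combined_fasta),
        "-dbtype", "nucl",
        "-parse_seqids",          # sequence IDs must be parsed for -taxid_map
        "-taxid_map", str(taxid_map),
        "-out", str(db_name),
        "-title", title,
    ]


if __name__ == "__main__":
    # Toy demo with two made-up FASTA files (placeholder sequences)
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "a.fna").write_text(">seqA1 toy\nACGT\n>seqA2\nGGCC")
        (tmp / "b.fna").write_text(">seqB1\nTTAA\n")
        n = merge_fastas_with_taxids(
            {tmp / "a.fna": 4751, tmp / "b.fna": 4751},
            tmp / "combined.fna", tmp / "taxid_map.txt")
        print(n, "sequences")
        print(" ".join(makeblastdb_cmd(
            tmp / "combined.fna", tmp / "taxid_map.txt",
            "custom_db", "custom_contaminant_db")))
```

Duplicate IDs are rejected because sequence IDs must be unique once `-parse_seqids` is used.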

Reply:

Depending on how serious the OP is about removing contamination, the assembly approach may only go so far. While there are many genomes in NCBI, I don't know what percentage of them fall into the category of "practically complete" (i.e., finishing them further would require so much work that it is not worth it).
