Dear Friends, Hi (not native in Eng...)
I used DIAMOND to create a .daa file by running blastx on my transcriptome assembly (FASTA) against the NCBI nr database with this command:
> diamond blastx -d nr -q '/home/Trinity_pathless.fasta' -o diamond-Trinity-daa -p 22 -f 100 --evalue 0.000001 --sensitive
Then I imported it into the MEGAN6 community version. I tried both approaches: (1) direct import, which creates a MEGAN6 "RMA" file, and (2) the Meganizer tool, as described in the last lines of the MEGAN manual,
but the result has no taxonomic data!
Please help me with this, and thank you in advance.
NOTE: It seems that the MEGAN6 manual did not offer any guidance about blast taxonomy parameters.
NOTE2: If you are aware of any MEGAN6 problem-solver groups, please let me know.
MEGAN has a dedicated user forum: http://megan.informatik.uni-tuebingen.de
DIAMOND doesn't put taxonomic IDs in the BLAST hits, so you have to add them yourself or use MEGAN to map them (GI/Accession to TaxID - you need the big NCBI mapping files). If you've done that (is that your RMA file?), then make sure minSupport is set to 1 and minSupportPercent is set to 0 (off) - they control the minimum number of sequences that must be assigned to a taxon for it to be displayed.
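To make those two settings a bit more concrete, here is a rough Python sketch of a min-support style filter. It is not MEGAN's code: the function names are made up, and the rule that the percentage is taken of all assigned reads and that the stricter of the two settings wins is my assumption. The taxon names and counts are invented for the example.

```python
import math


def support_threshold(min_support, min_support_percent, total_assigned):
    """Smallest read count a taxon needs in order to be shown.

    Assumption: the percentage refers to all assigned reads and the
    stricter of the two settings wins. MEGAN's exact rule may differ.
    """
    from_percent = math.ceil(min_support_percent * total_assigned / 100.0)
    return max(min_support, from_percent)


def visible_taxa(read_counts, min_support=1, min_support_percent=0.0):
    """Keep only the taxa whose read count reaches the threshold."""
    total = sum(read_counts.values())
    threshold = support_threshold(min_support, min_support_percent, total)
    return {taxon: n for taxon, n in read_counts.items() if n >= threshold}


if __name__ == "__main__":
    # Invented counts, for illustration only.
    counts = {"taxon_A": 50, "taxon_B": 45, "taxon_C": 5}
    print(visible_taxa(counts, min_support=1, min_support_percent=0))
    print(visible_taxa(counts, min_support=1, min_support_percent=10))
```

The point is only that a threshold above 1, or a non-zero percentage, can hide taxa that do have reads assigned, which is why setting them to 1 and 0 is a useful first check.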
Dear Tonor, Hi.
Thank you for telling me about the MEGAN forum.
1- After I imported my diamond.daa file into MEGAN6, it created the RMA file by itself. Does that solve anything?
2- What do you mean by "big NCBI files"? If you could kindly provide the links, I will download them.
3- After downloading the NCBI files, how can I map them using MEGAN?
The diamond blast output will typically just have the GI or Accession number in the blast hit (NCBI recently abolished GIs).
NCBI provides gi2_taxid_prot and prot_accession2taxid files from the FTP site: ftp://ftp.ncbi.nih.gov/pub/taxonomy/
You can use these in MEGAN to map your BLAST hits to the taxonomy. When you load your data, there should be options to specify the location of these taxonomy mapping files.
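If you want to sanity-check the mapping outside MEGAN, something like the Python sketch below might help. It assumes the mapping file is tab-separated with a header line and the columns accession, accession.version, taxid and gi, and that column 2 of a tabular BLAST output holds the subject ID. The function names are hypothetical, and the FAKE accessions and taxids in the demo are made up.

```python
def subject_ids(blast_lines):
    """Collect column 2 (subject ID) from tab-separated BLAST tabular lines."""
    ids = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def load_taxids(mapping_lines, wanted):
    """Return {accession.version: taxid} for the IDs in `wanted`.

    `mapping_lines` is any iterable of text lines in the assumed layout:
    accession <TAB> accession.version <TAB> taxid <TAB> gi
    """
    taxids = {}
    for line in mapping_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header row and anything that is too short to parse.
        if len(fields) < 3 or fields[0] == "accession":
            continue
        if fields[1] in wanted:
            taxids[fields[1]] = int(fields[2])
    return taxids


if __name__ == "__main__":
    # Made-up data, for illustration only.
    mapping = [
        "accession\taccession.version\ttaxid\tgi\n",
        "FAKE0001\tFAKE0001.1\t1111\t0\n",
        "FAKE0002\tFAKE0002.1\t2222\t0\n",
    ]
    hits = ["query1\tFAKE0001.1\t90.0\n", "query2\tFAKE0003.1\t80.0\n"]
    found = load_taxids(mapping, subject_ids(hits))
    missing = subject_ids(hits) - set(found)
    print("mapped:", found)
    print("not in mapping file:", sorted(missing))
```

For the real file you would pass an open text handle instead of the list (for example via `gzip.open(..., "rt")` if you kept the download compressed). Only the IDs you ask for are stored, so memory use stays small even though the mapping file is huge.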
Thank you for your help.
Have you ever tried "Diamond_BLAST_add_taxonomic_info"? If so, is it appropriate for my situation, or is the NCBI approach you suggested better?
Haven't tried it, but looks suitable, although a little old (uses GIs instead of accessions) - might be best to try the MEGAN forum as to best tools for it
Dear Tonor, Hi.
At the link you provided there is:
1- gi_taxid_prot.zip (instead of gi2_taxid_prot)
2- an "accession2taxid" directory, which contains the "prot.accession2taxid" file
Do I need to download and use both of these files?
Hi - it depends on which BLAST db you are using - an older one will give you GI numbers in your BLAST hits (so you need gi2_taxid_prot), whereas a newer one will give you accession numbers (so you need prot.accession2taxid) - does that make sense?
Thank you, Yes.
I have a local copy of the NCBI nr database, which consists of multiple files downloaded from here (NCBI FTP), and I recently downloaded nr.58.tar.gz.
Do I need only "prot.accession2taxid" in this case?
In MEGAN - you would go File -> Import from BLAST, select your diamond file in File, and then in Taxonomy tab, either click Use Accession or Use GI and then select the corresponding file. Although I've found this takes ages on my machine (not very high spec).
So the fact that the DIAMOND .daa output does not work very well for taxonomic purposes in MEGAN6 is a little disappointing.
Dear Tonor, I imported the .daa BLAST result into MEGAN6 together with the huge "prot.accession2taxid" file, but it again shows only two nodes!
Check that your minSupport and minSupportPercent are set to 1 and 0 respectively. I think this is in Options -> LCA parameters
Also double check that your DIAMOND BLAST results use accessions rather than GIs
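If you have a tabular version of the same search, a quick way to check is to look at the subject ID column. Below is a small Python sketch that does this. The two regular expressions are rough heuristics of mine (GI-style IDs starting with `gi|<number>|`, accessions like `XP_002531646.1`) and will not recognise every ID format; the function names and the GI number 12345 in the demo are made up.

```python
import re
from collections import Counter

# Heuristic patterns, not an official specification of NCBI identifiers.
GI_STYLE = re.compile(r"^gi\|\d+\|")
ACCESSION_STYLE = re.compile(r"^[A-Z]{1,3}_?\d{5,}(\.\d+)?$")


def classify_id(subject_id):
    """Label a subject ID as 'gi', 'accession' or 'other'."""
    if GI_STYLE.match(subject_id):
        return "gi"
    if ACCESSION_STYLE.match(subject_id):
        return "accession"
    return "other"


def summarize(blast_lines):
    """Count ID styles in column 2 of tab-separated BLAST tabular lines."""
    counts = Counter()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[classify_id(fields[1])] += 1
    return counts


if __name__ == "__main__":
    # Two illustrative lines; the GI number is invented.
    sample = [
        "query1\tXP_002531646.1\t81.3\n",
        "query2\tgi|12345|ref|XP_014502021.1|\t89.2\n",
    ]
    print(dict(summarize(sample)))
```

If nearly everything comes out as "accession", the accession mapping file is the one to use; a majority of "gi" would point to the GI mapping file instead.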
Dear Tonor, Hi and thank you. I will try the minSupport option you mentioned.
About accessions and GIs: the DIAMOND .daa file is a binary file, but the normal tabular blastx I ran on the same data with DIAMOND gave results like these:
TRINITY_DN212758_c0_g1_i1  XP_002531646.1  81.3  107  20  0  3    323  199  305  2.9e-41  176.4
TRINITY_DN212728_c0_g1_i1  XP_014502021.1  89.2  37   4   0  3    113  403  439  8.6e-10  71.2
TRINITY_DN212793_c0_g1_i1  XP_015200040.1  91.8  61   5   0  665  483  238  298  9.7e-23  115.9
In this situation, do I need to use "prot.accession2taxid" or some other file?
Yes, prot.accession2taxid is the one - if it still doesn't work, I reckon you should go to the MEGAN community site - the developer is pretty active there, and there are probably a few other ways to go
May I ask which version of DIAMOND you are using? I have never used MEGAN, but the FAQ in the manual of an older DIAMOND version says:
Q: Reads imported into MEGAN lack taxonomic or functional assignment. A: MEGAN requires mapping files which need to be downloaded separately at the MEGAN website and configured to be used.