It is not clear to me how MEGAN assigns SEED classes to sequences. The manual says the assignment is made through identification of the RefSeq ID (accession number?).
When I load my BLAST file using the built-in RefSeq map, none of the sequences gets an assignment, although the taxonomic identification works.
I did use your mapping file. However, I still end up with not assigned SEED hits. Which files do you import? What are your LCA parameters?
The weird thing is that I got it to work yesterday but I can't reconstruct how I got it to work.
I only set Min support to 1, since I am working with contigs. Check whether your BLAST file is in the correct format. Here is a sample of the format of my files:
This might sound stupid, but could you also give me the other LCA parameters, such as Max Expected, Top Percent, etc.?
And what do you select on import? I selected GI to Taxon mapping and RefSeq to SEED mapping using your uploaded file.
Ok, now I am a bit more confused. In your first post you said your subject_id is in this format:
gi|503216142|ref|WP_013450803.1|
But now your format only has the RefseqID, eg:
ref|ZP_10741618.1
Was this the problem? My Output has the first format you posted (GI-ID|Ref-ID)
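Since both subject ID formats above carry the RefSeq accession, a small script can confirm that the accession can be pulled out of either one. This is only a sketch: the regular expression is an assumption about what protein RefSeq accessions look like (two capital letters, an underscore, digits, an optional version), and it is not how MEGAN itself parses the IDs.

```python
import re

# Assumed shape of a RefSeq accession: two capital letters, underscore,
# digits, optional ".version" suffix (e.g. WP_013450803.1).
REFSEQ_RE = re.compile(r"\b([A-Z]{2}_\d+(?:\.\d+)?)")


def refseq_accession(subject_id):
    """Return the RefSeq accession found in a BLAST subject ID, or None."""
    match = REFSEQ_RE.search(subject_id)
    return match.group(1) if match else None


# The two subject ID formats quoted in this thread.
for sid in ("gi|503216142|ref|WP_013450803.1|", "ref|ZP_10741618.1"):
    print(sid, "->", refseq_accession(sid))
```

If the function returns None for your subject IDs, the file probably does not contain RefSeq accessions at all, which would explain unassigned SEED hits.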
Both work. For the first one I used the taxon mapping via the GI numbers. For the second one that is not necessary, since there is an additional column with the taxon name.
LCA parameters: min score 50; max expected 0.01; top percent 10; min support percent 0.1; min support 1; LCA percent 100; other options turned off.
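To make the first three thresholds concrete, here is a simplified sketch of how min score, max expected and top percent are commonly applied to the hits of a single read. It is an illustration only, not MEGAN's actual implementation; the `Hit` class, the `filter_hits` function and the subject names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    bit_score: float
    e_value: float


def filter_hits(hits, min_score=50.0, max_expected=0.01, top_percent=10.0):
    """Keep the hits of one read that pass all three thresholds."""
    # Drop hits with a low bit score or a high e-value.
    passing = [h for h in hits
               if h.bit_score >= min_score and h.e_value <= max_expected]
    if not passing:
        return []
    # Keep only hits whose bit score is within top_percent of the best one.
    best = max(h.bit_score for h in passing)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [h for h in passing if h.bit_score >= cutoff]


# Hypothetical hits for one read.
hits = [
    Hit("ref|A", 200.0, 1e-50),
    Hit("ref|B", 185.0, 1e-45),
    Hit("ref|C", 120.0, 1e-20),
    Hit("ref|D", 40.0, 1e-3),
]
print([h.subject for h in filter_hits(hits)])
```

With these values only the two best hits survive: the third is outside the top 10% of the best score and the fourth is below the minimum score.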
You didn't give us much to work with here (what are your errors, etc.). Based on your past questions (here and here), it looks like you have had this problem for a while.
Basically, your BLAST output is not "talking" to the parser that does the SEED classification. This can be due to an incorrectly formatted BLAST output, not querying a RefSeq ID, etc.
I would start at the beginning and proceed until you have an error and then dissect which step is failing you -- sounds like it is somewhere after the BLAST analysis. Have you updated the RefSeq and SEED databases or checked that they downloaded correctly in MEGAN? You didn't provide any information on your samples -- would these be easily annotated?
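One way to rule out an incorrectly formatted BLAST output is a quick sanity check of the tabular file. The sketch below assumes the standard 12-column tabular layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), possibly followed by extra columns; `check_blast_tab` and the sample lines are hypothetical.

```python
def check_blast_tab(lines, min_columns=12):
    """Return (line_number, problem) pairs for lines that do not look
    like BLAST tabular output."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < min_columns:
            problems.append(
                (number, f"expected at least {min_columns} tab-separated "
                         f"columns, found {len(fields)}"))
            continue
        try:
            float(fields[10])  # e-value column
            float(fields[11])  # bit score column
        except ValueError:
            problems.append((number, "e-value or bit score is not numeric"))
    return problems


# Made-up lines: the first is well formed, the second is space-separated.
sample = [
    "contig1\tgi|503216142|ref|WP_013450803.1|\t98.5\t200\t3\t0"
    "\t1\t200\t1\t200\t1e-100\t380",
    "contig2 ref|ZP_10741618.1 90.0 150",
]
for number, problem in check_blast_tab(sample):
    print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle instead of the list, for example `check_blast_tab(open("my_blast.tab"))`, where the file name is a placeholder.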
Firstly, thank you for the feedback. Sorry about the questions, but I am a newbie in this field, so maybe I am making a very basic mistake...
Yes, I have read the MEGAN manual, but unfortunately it is not very clear to me. According to it and the website, MEGAN is provided with built-in SEED and KEGG mapping files. However, I cannot find them in any of the program's directories, and none of them are available on the MEGAN website.
I also already have read the recommended posts, and as you can see, people are having the same problems as I am, and no solution is available.
Since my metagenome data have to be converted, I am testing the program with a test sample (a tabular BLAST file with six sequences). The manual mentions that example files are available for testing, but I could not find any.
As I mentioned previously, the taxonomic classification works, meaning that MEGAN is identifying the GI number. With respect to the SEED classification, however, no error occurs; the sequences are just "classified" as "not assigned". The same is true for the KEGG classification.
Without going to the MEGAN manual to check, it sounds like you do not have a reference database (or connection to one) in your installation. You need to figure out why this is the case so you can parse your BLAST table to KEGG, SEED, etc. You may have to actually download the SEED files and place them in the correct MEGAN folder, but perhaps you can download these through MEGAN.
I haven't used MEGAN in a few years -- with Illumina data, BLAST-based metagenomic classification methods are ridiculously slow. As a result, I don't really know many people who have used the program lately. Even though I don't use it much anymore, I will say it is, along with Dan Huson's other programs, well put together.
Speaking with Daniel Huson, I understood the problem. The builtin SEED mapping file was not up to date, therefore there was no correspondence between it and my file.
Since then MEGAN has been updated several times. I believe it would be better practice to download the latest version. The "built-in" files are contained in data.jar.
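To test whether a mapping file and a BLAST file correspond at all, one can count how many subject accessions appear in the mapping. The sketch below is built on assumptions: it treats the mapping as tab-separated lines whose first column is a RefSeq accession (the real MEGAN mapping format may differ), it compares accessions without their version suffix, and the mapping entry shown is made up.

```python
import re

# Assumed accession shape; the version suffix (".1") is ignored on purpose.
ACCESSION_RE = re.compile(r"\b([A-Z]{2}_\d+)")


def accession_base(text):
    """Return the accession without version found in text, or None."""
    match = ACCESSION_RE.search(text)
    return match.group(1) if match else None


def mapping_coverage(subject_ids, mapping_lines):
    """Return (found, total): how many distinct subject accessions
    occur in the first column of the mapping lines."""
    mapped = set()
    for line in mapping_lines:
        accession = accession_base(line.split("\t", 1)[0])
        if accession:
            mapped.add(accession)
    wanted = {a for a in map(accession_base, subject_ids) if a}
    return len(wanted & mapped), len(wanted)


# Subject IDs quoted in this thread and one hypothetical mapping entry.
subjects = ["gi|503216142|ref|WP_013450803.1|", "ref|ZP_10741618.1"]
mapping = ["WP_013450803\t1234"]
found, total = mapping_coverage(subjects, mapping)
print(f"{found} of {total} accessions are present in the mapping")
```

A result of 0 found accessions would point to the kind of outdated mapping described above.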
Hello fhsantanna,
how did you solve the problem?
My file is in the exact same format.
Today I downloaded the latest version of MEGAN. For the SEED analysis I just checked "use builtin refseq map".