Dear all,
I have a question about using MEGAN4 to parse a SAM file.
What I want to do is get taxonomic and functional annotation of my raw reads against the nr database. The raw read set is too large (11 million reads in total, each 100 bp long) to BLAST directly against nr, so I took a different approach: first assemble the reads and predict ORFs, which I could BLAST easily, and then align the raw reads to the ORFs. I then want to use MEGAN to parse the read-to-ORF alignment and so obtain the annotation of the raw reads.
Here is what I did exactly:
- I first assembled the reads into contigs.
- Then I used MetaGeneMark to predict open reading frames (ORFs), whose size is suitable for BLAST against the nr database.
- I BLASTed the ORFs against the nr database.
- I imported the ORF BLAST result into MEGAN with default parameters and successfully got the RMA file.
- I used the Export-Assignments To CSV function of MEGAN4 to generate a synonyms file with two tab-separated columns: the ORF name in the first and the taxonomy ID in the second.
- I used Bowtie to align my raw reads to the ORFs, which gave the SAM file that I want to parse.
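For reference, the read-to-taxon transfer this workflow relies on (read to ORF via the SAM RNAME column, ORF to taxon via the synonyms file) can also be sketched outside MEGAN. The following is a minimal Python sketch, assuming the two-column, tab-separated synonyms file described above and a standard SAM file from Bowtie. It is not MEGAN's algorithm; the function names and the `unmapped`/`unassigned` labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def load_synonyms(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a tab-separated file: ORF name in column 1, taxonomy ID in column 2."""
    orf_to_taxon: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            orf_to_taxon[fields[0]] = fields[1]
    return orf_to_taxon


def assign_reads(sam_lines: Iterable[str],
                 orf_to_taxon: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, taxon ID) for each primary SAM record.

    A read inherits the taxon of the ORF it aligned to (SAM column 3, RNAME).
    """
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary / supplementary alignments
        if flag & 0x4 or rname == "*":
            yield qname, "unmapped"  # read did not align to any ORF
        else:
            # hypothetical label for reads whose ORF is missing from the mapping
            yield qname, orf_to_taxon.get(rname, "unassigned")


def count_taxa(assignments: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many reads were assigned to each taxon ID."""
    return Counter(taxon for _, taxon in assignments)


if __name__ == "__main__":
    synonyms = ["orf_1\t562", "orf_2\t1280"]
    sam = [
        "@HD\tVN:1.0\tSO:unsorted",
        "read1\t0\torf_1\t10\t42\t100M\t*\t0\t0\t*\t*",
        "read2\t16\torf_2\t5\t42\t100M\t*\t0\t0\t*\t*",
        "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "read4\t0\torf_9\t1\t42\t100M\t*\t0\t0\t*\t*",
    ]
    print(count_taxa(assign_reads(sam, load_synonyms(synonyms))))
```

On real data the same functions accept open file handles, e.g. `assign_reads(open("reads.sam"), load_synonyms(open("orf_taxa.txt")))` (hypothetical file names). If the ORF names in the synonyms file contain spaces, keep in mind that Bowtie typically writes only the first whitespace-delimited word of each reference name to RNAME, so the two may need to be normalized before they match.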
My problem: the user manual says that when a SAM file is imported together with the synonyms file, MEGAN should parse it, but instead all my reads are assigned to two big groups, one called "No hits" and the other "Low complexity".
I have tried it several times and it always behaves this way. Does anyone know how to fix this? Or is there an alternative way to parse the SAM file?
Just make sure that you supply the ORF-to-taxonomy mapping (synonyms file) when importing the SAM file.
From your description it is not clear whether you actually specified the synonyms file as a parameter during the import.
Thanks for your comment. Yes, I did use the synonyms file for parsing in my attempts. I have also written to the MEGAN developers; they told me a bug was causing this problem and that they have released an updated version of MEGAN that can now parse SAM files.