Friends,
I am trying to run Oases for transcriptome assembly. The result is far from what I expected, so I would like to ask whether I am running it the right way. Thanks.
Here is the command I ran:
python scripts/oases_pipeline.py -m 25 -M 29 -o output -d " -strand_specific -shortPaired data/reads.fa" -p " -min_trans_lgth 100 -ins_length 300"
My library is strand-specific and paired-end, with 67 bp reads. The reads are interleaved (shuffled) as follows:
>0(left_mate_forwarded)
ACTC...
>1(right_mate_reverse_complemented)
TATA...
I got some transcripts, but they are far from the annotated transcripts and also far from the Trinity result. The longest contig from Oases is ~2500 bp (vs. ~10000 bp from Cufflinks and ~6000 bp from Trinity). The N50 value is also low. Oases reports only 20 contigs that cover the full length of a Cufflinks transcript (~4000 transcripts in total), while Trinity reports ~650.
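For anyone comparing the assemblies, here is a minimal sketch of how N50 is usually computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. The function name `n50` is only illustrative, and the sketch assumes the contig lengths have already been extracted from each assembler's output.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total assembly size defines the N50.
        if running >= half_total:
            return length
    return 0
```

Computing the value the same way for Oases, Trinity and Cufflinks output makes sure the numbers being compared follow one definition.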
The dataset I am using is a subset of S. pombe. Does it matter?
Could somebody help me point out whether something is wrong here? Thanks.
You didn't specify "-fasta", so if it was expecting FASTQ you'll only be using 50% of your reads. Did you reverse-complement the right mate? If the data is from Illumina, just leave it as-is. You are only trying 3 k-values, and they look pretty low. If your reads are 100bp PE and you have enough of them, I'd try higher k-values.
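Two of the points above (the file format and whether the right mate was reverse-complemented) can be checked quickly before rerunning the assembly. This is a rough sketch, not part of Oases or Velvet: `guess_format` and `reverse_complement` are hypothetical helper names, and the format check only looks at the first non-empty line of the file.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def guess_format(path):
    """Guess 'fasta' or 'fastq' from the first non-empty line of a file."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying `reverse_complement` to a right mate from the shuffled file and comparing it with the original Illumina read shows whether the mates were already flipped during preprocessing.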