Hello.
I have a miRBase database of mature miRNA sequences that I want to allign against my adapter-stripped small RNA fastq files.
I am using the Rsubread package in R, but I am getting 0% mapping on every run I attempt.
I have no problems when building my reference DB using the 'build' function like so:
buildindex(basename=file.path("filepath","miRBaseSeqs"),reference="miRBaseSeqs.fa")
However, when I try to run the 'align' function:
align("miRBaseSeqs", "AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55)
It runs fine with no errors:
//========================== subread-align setting ===========================\\
|| ||
|| Function : Read alignment ||
|| Threads : 1 ||
|| Input file : AdapterStripped.fastq ||
|| Output file : testAlgn.SAM (SAM) ||
|| Index name : miRBaseSeqs ||
|| Phred offset : 33 ||
|| ||
|| Min votes : 3 ||
|| Max indels : 1 ||
|| # of Best mapping : 1 ||
|| Unique mapping : yes ||
|| Hamming distance : yes ||
|| Quality scores : no ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//====================== Running (13-Mar-2014 11:33:58) ======================\\
|| ||
|| The input file contains base space reads. ||
|| Load the 1-th index block... ||
|| Map reads... ||
|| 0% completed, 0 mins elapsed, total=12818k reads, rate=69.8k/s ||
|| 6% completed, 0 mins elapsed, total=12802k reads, rate=74.7k/s ||
|| 12% completed, 0 mins elapsed, total=12818k reads, rate=75.4k/s ||
... etc
But the output is always this:
//================================= Summary ==================================\\
|| ||
|| Processed : 12826628 reads ||
|| Mapped : 0 reads (0.0%) ||
|| Indels : 0 ||
|| ||
|| Running time : 5.9 minutes ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
I have tried multiple argument parameters such as:
align("miRBaseSeqs", "AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55, DP_GapOpenPenalty=10, DP_GapExtPenalty=8, DP_MismatchPenalty=-5, DP_MatchScore=10)
subjunc("miRBaseSeqs", "PS24h.AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55, DNAseq=TRUE, reportAllJunctions=TRUE)
and variation on the theme, but I always get that 0% aligned output.
Does anyone have any suggestions? Anything is appreciated.
Thanks in advance!
Just to check, try mapping with bowtie1 and see do you still not get any reads mapped. You can use -n 2 -a -l 25 as options. Let me know
So I have used OSA to map a similar file; one that also contained miRNA seqs, but didn't have an option for all the mature seqs (3p, 5p arm) hence the need to use a different aligner (one that enables you to build your own reference library). Anyway, long story short, the adapter striped files are fine.
The problem is that Subread does not report those reads that have bases which were mapped outside of reference sequences. You can get around this by padding your miRNA sequences with N's and then build an index for them and do the alignments. For example, you may add 100 N's to the left side of each of your miRNA sequences and also add 100 N's to the right side. This will prevent 100bp reads from overhanging your reference sequences, and the mapped reads should then be reported by Subread.