Question

Rna Seq Short Rna Alignment With Rsubread Package In R

0

Entering edit mode

10.7 years ago

Tom A ▴ 110

Hello.

I have a miRBase database of mature miRNA sequences that I want to allign against my adapter-stripped small RNA fastq files.

I am using the Rsubread package in R, but I am getting 0% mapping on every run I attempt.

I have no problems when building my reference DB using the 'build' function like so:

buildindex(basename=file.path("filepath","miRBaseSeqs"),reference="miRBaseSeqs.fa")

However, when I try to run the 'align' function:

align("miRBaseSeqs", "AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55)

It runs fine with no errors:

//========================== subread-align setting ===========================\\
||                                                                            ||
||           Function : Read alignment                                        ||
||            Threads : 1                                                     ||
||         Input file : AdapterStripped.fastq                                 ||
||        Output file : testAlgn.SAM (SAM)                                    ||
||         Index name : miRBaseSeqs                                           ||
||       Phred offset : 33                                                    ||
||                                                                            ||
||          Min votes : 3                                                     ||
||         Max indels : 1                                                     ||
||  # of Best mapping : 1                                                     ||
||     Unique mapping : yes                                                   ||
||   Hamming distance : yes                                                   ||
||     Quality scores : no                                                    ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//====================== Running (13-Mar-2014 11:33:58) ======================\\
||                                                                            ||
|| The input file contains base space reads.                                  ||
|| Load the 1-th index block...                                               ||
|| Map reads...                                                               ||
||    0% completed,   0 mins elapsed, total=12818k reads, rate=69.8k/s        ||
||    6% completed,   0 mins elapsed, total=12802k reads, rate=74.7k/s        ||
||   12% completed,   0 mins elapsed, total=12818k reads, rate=75.4k/s        || 
... etc

But the output is always this:

//================================= Summary ==================================\\
||                                                                            ||
||          Processed : 12826628 reads                                        ||
||             Mapped : 0 reads (0.0%)                                        ||
||             Indels : 0                                                     ||
||                                                                            ||
||       Running time : 5.9 minutes                                           ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

I have tried multiple argument parameters such as:

align("miRBaseSeqs", "AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55, DP_GapOpenPenalty=10, DP_GapExtPenalty=8, DP_MismatchPenalty=-5, DP_MatchScore=10)

subjunc("miRBaseSeqs", "PS24h.AdapterStripped.fastq", input_format="FASTQ", output_file="testAlgn.SAM", indels=1, minFragLength=15, maxFragLength=55, DNAseq=TRUE, reportAllJunctions=TRUE)

and variation on the theme, but I always get that 0% aligned output.

Does anyone have any suggestions? Anything is appreciated.

Thanks in advance!

r rna-seq alignment • 4.8k views

ADD COMMENT • link updated 10.7 years ago by lavertonvic • 0 • written 10.7 years ago by Tom A ▴ 110

0

Entering edit mode

Just to check, try mapping with bowtie1 and see do you still not get any reads mapped. You can use -n 2 -a -l 25 as options. Let me know

ADD REPLY • link 10.7 years ago by Varun Gupta ★ 1.3k

0

Entering edit mode

So I have used OSA to map a similar file; one that also contained miRNA seqs, but didn't have an option for all the mature seqs (3p, 5p arm) hence the need to use a different aligner (one that enables you to build your own reference library). Anyway, long story short, the adapter striped files are fine.

ADD REPLY • link 10.7 years ago by Tom A ▴ 110

0

Entering edit mode

The problem is that Subread does not report those reads that have bases which were mapped outside of reference sequences. You can get around this by padding your miRNA sequences with N's and then build an index for them and do the alignments. For example, you may add 100 N's to the left side of each of your miRNA sequences and also add 100 N's to the right side. This will prevent 100bp reads from overhanging your reference sequences, and the mapped reads should then be reported by Subread.

ADD REPLY • link 10.6 years ago by Wei Shi • 0

Ram · Answer 1 · 2014-03-16

Hi Tom, thank for using our Subread aligner. I noticed that you required subread not to report multi-mapping reads in the SAM file ("Unique mapping : yes"). If the reference genome is highly repetitive, you will get zero mapped read.

You can add a parameter, unique=FALSE, to the align function. This will make subread to report all mapping locations; if a read can be mapped to more than one location, a random one of these locations is reported. Or, if you want all these locations, just set the number of locations you want by using the nBestLocations parameter.

If it still does not work, please contact us on liao (at) wehi.edu.au. Thank you.