Hi,
I am trying to manually align RNA-seq reads to an RNA sequence in R using Biostrings.
My reads are collapsed:
  A DNAStringSet instance of length 1400027
    width seq                    names
[1]    22 GTCTGTGATGAATTGCTTTGAC 1-1694429
[2]    22 AGTCTGTGATGAATTGCTTTGA 2-1546669
[3]    22 GCATTGGTGGTTCAGTGGTAGA 3-928598
I want to map them, using my own algorithm built on the matchPattern function, to a pre-miRNA sequence downloaded from miRBase:
72-letter "RNAString" instance
seq: UGUCGGGUAGCUUAUCAGACUGAUGUUGACUGUUGAAUCUCAUGGCAACACCAGUCGAUGGGCUGUCUGACA
How should I convert my reads so that the comparison is biologically meaningful? Given how Illumina generates the reads, is it appropriate to compare match(DNAString, DNAString(reverseComplement(RNAString_instance)))?
If not, which string should I convert, the source or the destination, to get a correct mapping?
I would recommend also aligning the reads against the reference genome, unless the protocol specifically selected for (pre-)miRNAs. Otherwise you might be seeing repeated sequences from normal transcripts, and that might go unnoticed. Then check the regions of interest in the genome for high coverage (possibly after removing duplicate hits).
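As for the conversion itself, here is a minimal sketch in R with Biostrings. It assumes a strand-preserving small RNA library, where reads are reported as DNA in the same orientation as the original RNA; whether that holds depends on your library prep. The example read is hypothetical: it is built from the hairpin in the question, not taken from your data. The sketch converts the miRBase sequence to the DNA alphabet and counts exact hits for each read and for its reverse complement, so the data can show which orientation applies.

```r
library(Biostrings)

# pre-miRNA hairpin from the question (RNA alphabet, 5' -> 3')
hairpin_rna <- RNAString("UGUCGGGUAGCUUAUCAGACUGAUGUUGACUGUUGAAUCUCAUGGCAACACCAGUCGAUGGGCUGUCUGACA")

# Convert the reference to the DNA alphabet (U -> T) so it can be compared with the reads
hairpin_dna <- DNAString(hairpin_rna)

# Hypothetical read for illustration only (not from the original data)
reads <- DNAStringSet(c(example_read = "TAGCTTATCAGACTGATGTTGA"))

# Exact hits of each read as given (sense orientation)
sense_hits <- vapply(seq_along(reads), function(i) {
  countPattern(reads[[i]], hairpin_dna)
}, numeric(1))

# Exact hits of each read after reverse-complementing it (antisense orientation)
antisense_hits <- vapply(seq_along(reads), function(i) {
  countPattern(reverseComplement(reads[[i]]), hairpin_dna)
}, numeric(1))

# Start position(s) of the first read on the hairpin, sense orientation
sense_start <- start(matchPattern(reads[[1]], hairpin_dna))

print(data.frame(read = names(reads), sense = sense_hits, antisense = antisense_hits))
```

If your real reads mostly hit only after reverse-complementing, the library is probably antisense, and the reads (rather than the reference) are the natural thing to reverse-complement before mapping. matchPattern also takes a max.mismatch argument if exact matching turns out to be too strict.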