When aligning miRNA (microRNA), which reference is better: the whole genome or miRBase?
14.1 years ago
Doctoroots ▴ 800

I have data from an Illumina run with ~30M reads, and I'm trying to infer miRNA expression. Should I align the reads to the whole-genome reference and then keep the ones that align to regions known to transcribe miRNAs?

Or is it better to use the miRBase data set?

Also, when aligning miRNA reads, do you have any recommendations regarding important parameters or steps to include (such as making sure the adapter is trimmed)?

Thanks.

mirna alignment reference • 15k views
14.1 years ago
Antonio Marco ▴ 190

I would recommend mapping to the genome first. The reason is two-fold: first, many reads that can potentially map to microRNAs also map to other regions of the genome, and should therefore be discarded; second, this permits the detection of new microRNAs not described in miRBase (which can be very useful if your species does not have a comprehensive catalog).
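The first point can be illustrated with a minimal Python sketch. It assumes the genome alignments have already been parsed into `(read_id, chrom, start, end)` tuples and that miRNA loci are available as `(chrom, start, end, name)` tuples with 0-based, half-open coordinates. The function name, the tuple layout, and all coordinates below are hypothetical and chosen only for illustration; they do not come from any particular aligner or from miRBase.

```python
from collections import defaultdict


def reads_unique_to_mirna(hits, mirna_loci):
    """Keep only reads whose every genomic hit lies inside an annotated miRNA locus.

    hits:       iterable of (read_id, chrom, start, end)
    mirna_loci: list of (chrom, start, end, name)
    Returns {read_id: set of miRNA names}.
    """
    by_read = defaultdict(list)
    for read_id, chrom, start, end in hits:
        by_read[read_id].append((chrom, start, end))

    def locus_for(chrom, start, end):
        # Linear scan; fine for a toy example, an interval index would be
        # preferable for a real annotation.
        for l_chrom, l_start, l_end, name in mirna_loci:
            if chrom == l_chrom and start >= l_start and end <= l_end:
                return name
        return None

    kept = {}
    for read_id, positions in by_read.items():
        names = [locus_for(*p) for p in positions]
        # Discard the read if any of its hits falls outside the annotation.
        if all(n is not None for n in names):
            kept[read_id] = set(names)
    return kept


# Toy data (hypothetical coordinates).
loci = [("chr1", 100, 180, "mir-A"), ("chr2", 40, 120, "mir-B")]
hits = [
    ("r1", "chr1", 105, 127),
    ("r2", "chr1", 110, 131),
    ("r2", "chr5", 9000, 9021),  # second hit outside any miRNA locus
    ("r3", "chr2", 50, 72),
]
kept = reads_unique_to_mirna(hits, loci)
print(kept)
```

Here `r2` is dropped because one of its two hits lies outside the annotation, while `r1` and `r3` are kept. Reads that map nowhere in the annotation could be collected separately as candidates for novel microRNAs.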

If you're working with a well-studied species, such as Drosophila or human, the quickest way is to map to miRBase directly.

For mapping in base-space, Bowtie works great. You should first remove the linkers from your reads. However, since linkers can also contain mismatches, I recommend trimming the 3' end sequentially, one nucleotide at a time, before mapping. This is a personal choice, but it worked quite well for us. You can find more information about this procedure in: http://www.ncbi.nlm.nih.gov/pubmed/20817720
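One possible reading of the sequential 3' trimming idea is sketched below in Python. This is only an illustration under stated assumptions, not necessarily what the cited paper implements: an exact substring search against an in-memory string stands in for the aligner, and the function name, the `min_len` cutoff, and the toy sequences are all hypothetical.

```python
def trim_until_mapped(read, reference, min_len=18):
    """Trim one nucleotide at a time from the 3' end until the read matches.

    Returns (trimmed_read, position) for the longest 5' prefix of the read
    found exactly in the reference, or None if nothing of at least min_len
    nucleotides matches.
    """
    for length in range(len(read), min_len - 1, -1):
        candidate = read[:length]
        pos = reference.find(candidate)  # stand-in for a real alignment step
        if pos != -1:
            return candidate, pos
    return None


# Toy example: a 22 nt insert followed by a few leftover linker bases.
insert = "TGAGGTAGTAGGTTGTATAGTT"
reference = "GGCC" + insert + "AACCGG"
read = insert + "TCGTAT"  # hypothetical linker remnant at the 3' end

result = trim_until_mapped(read, reference)
print(result)
```

In this toy case the six trailing linker bases are removed one at a time until the remaining 22 nt match the reference at offset 4. With a real aligner, each trimming round would be a separate mapping run on the reads that are still unmapped.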

Hope this helps!
Antonio Marco


So you just trimmed every read to 26 nt?

7.8 years ago
enxxx23 ▴ 280

I would recommend aligning the reads to all miRNAs, because the reads come from miRNAs; "miRNA" is even in the name miRNA-seq. The equivalent for RNA-seq is to align the reads to all RNAs (that is, the transcriptome), as tools such as Kallisto and Salmon do.

It is not a good idea to align miRNA reads to the genome, because miRNA sequences are very short and most aligners were not designed for aligning such short sequences to such a large reference (that is, the human genome). In practice, this means that after mapping a miRNA read to 2 or 3 places in the genome, many aligners stop looking for the 4th, 5th, 20th, and 50th place, and it may happen that only one of those genomic locations is annotated as a miRNA.
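The concern about capped hit reporting can be made concrete with a toy Python sketch. It uses exact string search on a made-up "genome" and a hypothetical `max_hits` cap; real aligners decide which hits to report in their own ways, so this shows only the general effect, not the behavior of any specific tool.

```python
def find_hits(read, genome, max_hits=None):
    """Return start positions of exact matches, optionally stopping at max_hits."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append(pos)
        if max_hits is not None and len(hits) >= max_hits:
            break  # reporting cap reached; remaining locations are never seen
        pos = genome.find(read, pos + 1)
    return hits


def overlaps_locus(hits, read_len, locus):
    """True if any hit lies fully inside the (start, end) half-open locus."""
    start, end = locus
    return any(h >= start and h + read_len <= end for h in hits)


# Toy genome: the same 22 nt sequence occurs 5 times, separated by spacers.
read = "TGAGGTAGTAGGTTGTATAGTT"
genome = "CCCCC".join([read] * 5)
annotated_locus = (108, 130)  # hypothetical: only the last copy is annotated

capped = find_hits(read, genome, max_hits=3)
all_hits = find_hits(read, genome)
print(capped, overlaps_locus(capped, len(read), annotated_locus))
print(all_hits, overlaps_locus(all_hits, len(read), annotated_locus))
```

With the cap of three, the only annotated copy is never reported, so the read would not be counted as a miRNA; with all hits reported, it is found.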


Some comments:

  1. Kallisto and Salmon do not align RNA-Seq reads directly to RNA sequences. They perform a pseudo-alignment using k-mers.
  2. Most current NGS aligners are designed to handle short sequences and to align them to a reference genome. As for the multi-mapping issue, most of them can be tuned to allow it, e.g. Bowtie with the -k parameter.
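To show what "pseudo-alignment using k-mers" means in principle, here is a deliberately simplified Python sketch: each read is assigned the set of transcripts that contain all of its k-mers, with no base-level alignment computed. This is a toy, not the actual implementation of Kallisto or Salmon, and the function names, transcript names, and sequences are made up.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        names = index.get(read[i:i + k], set())
        compatible = names if compatible is None else compatible & names
        if not compatible:
            return set()  # some k-mer is absent from all remaining candidates
    return compatible or set()  # empty set if the read is shorter than k


# Toy transcriptome: the two sequences share a prefix and differ at the 3' end.
transcripts = {"tA": "AAACCCGGGTTT", "tB": "AAACCCGGGAAA"}
K = 5
index = build_kmer_index(transcripts, K)

shared = pseudoalign("ACCCGG", index, K)     # lies in the shared region
specific = pseudoalign("CGGGTTT", index, K)  # reaches into tA's unique end
absent = pseudoalign("TTTTTTT", index, K)    # not present in either
print(shared, specific, absent)
```

The read from the shared region stays ambiguous between both transcripts, the second read is compatible only with `tA`, and the third with neither. Note that the output is still a read-to-reference assignment, which is the point debated in this thread.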

According to the articles/preprints in which Salmon and Kallisto were published, both tools map/align reads to the transcriptome. There are indeed many types of alignment/mapping, for example pseudo-alignment or pseudo-mapping, but these are sub-types of alignment/mapping rather than a different category, because one still learns that a given read maps/aligns to a given reference sequence!

Actually, most aligners out there (Bowtie, Bowtie2, STAR, BWA, etc.) do not perform optimally when aligning short reads of 16-25 bp to a genome. This is easy to check by reading the papers in which these aligners were published: the authors did not test or design them specifically for miRNAs. According to those papers, all of these aligners were optimized and tuned for reads longer than 36 bp (or even longer than 50 bp). Some aligners do not even report all possible mapping locations for a given read (a problem if, for example, a read maps to 50 places in the genome and only one of them is annotated as a miRNA). Therefore, to "help" most aligners perform well, one needs to reduce the search space (by using the transcriptome instead of the genome) and also tune the parameters. Adjusting only Bowtie's "-k" parameter does not solve the problem for miRNAs.

I would even recommend doing it the other way around: take all known miRNAs (from miRBase; there are around 5K known miRNAs) and map them to your reads (that is, use the FASTQ files as your reference). Of course, this also requires an aligner that can report absolutely all mapping positions for a query sequence. The advantage is that this approach is very fast, sensitive, and specific, and it requires no trimming of the reads.

So when one does miRNA-seq, one should use as the reference what is in the name. For example, when doing RNA-seq, align/map the reads to all known RNAs; when doing miRNA-seq, align the reads to all known miRNAs; when doing DNA-seq, align the reads to the DNA; and so on.
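A minimal Python sketch of this "reverse" idea follows, using exact substring search instead of a real aligner. It assumes the miRNA sequences and the reads are already loaded into memory; the function name, the miRNA names, the sequences, and the adapter are hypothetical placeholders. Because the miRNA is searched for inside the untrimmed read, trailing adapter bases do not prevent a match.

```python
from collections import Counter


def count_mirnas_in_reads(mirnas, reads):
    """Count, for each miRNA, the reads that contain its sequence exactly.

    mirnas: {name: sequence}; RNA sequences (with U) are converted to DNA (T).
    reads:  iterable of untrimmed read sequences.
    """
    queries = {name: seq.upper().replace("U", "T") for name, seq in mirnas.items()}
    counts = Counter()
    for read in reads:
        for name, seq in queries.items():
            if seq in read:  # exact match only; no mismatches tolerated here
                counts[name] += 1
    return counts


# Toy data (names, sequences, and adapter are illustrative only).
mirnas = {
    "miR-toy-1": "UGAGGUAGUAGGUUGUAUAGUU",
    "miR-toy-2": "UAGCUUAUCAGACUGAUGUUGA",
}
adapter = "TGGAATTCTCGG"
reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + adapter,      # full adapter still attached
    "TGAGGTAGTAGGTTGTATAGTT" + adapter[:6],  # partial adapter
    "TAGCTTATCAGACTGATGTTGA" + adapter,
    "ACGTACGTACGTACGTACGTACGTACGT",          # unrelated read
]
counts = count_mirnas_in_reads(mirnas, reads)
print(counts)
```

This nested loop is quadratic and only meant to show the logic. It also inherits a limitation of the approach itself: a read that contains a miRNA sequence is counted even if it actually derives from a longer, unrelated transcript, which is the kind of ambiguity that mapping to the genome first is meant to catch.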
