7.5 years ago
plabanbiswas96
Hello,
My project is mapping DNA reads to a reference genome (hg19). I will be working in Java and running on Hadoop. I am stuck on choosing a mapping algorithm: I have come across several, but can't figure out which would suit this purpose best. Can anyone suggest a mapping algorithm that scales to large data sets in a relatively short time? (MIT or GPL licensed is fine.)
I am new to this field. Please correct me if I am wrong; I would really appreciate any suggestions or corrections.
There are indeed many different aligners. One of the simplest and fastest is BWA (Burrows-Wheeler Aligner). Bowtie2 is also widely used. There are many reviews in the literature comparing these aligners. Give it a go; that's generally a good place to start.
NB: Most aligners do an excellent job on every type of DNA data. You just need to avoid aligners designed for RNA sequencing, because they are splice-aware. If you want more specific answers, tell us the read length, the type of DNA sequencing (Illumina?), the number of samples, and maybe the sequencing depth.
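Since you are working in Java, here is a minimal sketch of how you might assemble a `bwa mem` call from Java code. It assumes bwa is installed and on your PATH and that the reference has already been indexed with `bwa index`. The class name `BwaCommand` and the file names (`hg19.fa`, `reads_1.fq`, `reads_2.fq`) are placeholders I made up for illustration, and nothing here is Hadoop-specific.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: only builds the argument list, it does not run bwa.
class BwaCommand {

    // Builds "bwa mem -t <threads> <reference> <fastq1> [<fastq2>]".
    // One FASTQ file means single-end reads, two means paired-end.
    static List<String> memCommand(String reference, int threads, String... fastqs) {
        if (fastqs.length < 1 || fastqs.length > 2) {
            throw new IllegalArgumentException("expected one or two FASTQ files");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("bwa");
        cmd.add("mem");
        cmd.add("-t");
        cmd.add(Integer.toString(threads));
        cmd.add(reference);
        for (String fastq : fastqs) {
            cmd.add(fastq);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder paths; replace with your own files.
        List<String> cmd = memCommand("hg19.fa", 4, "reads_1.fq", "reads_2.fq");
        System.out.println(String.join(" ", cmd));

        // To actually run it (bwa writes SAM to standard output), something like:
        // new ProcessBuilder(cmd)
        //     .redirectOutput(new java.io.File("aln.sam"))
        //     .redirectError(ProcessBuilder.Redirect.INHERIT)
        //     .start()
        //     .waitFor();
    }
}
```

Building the command as a list rather than one string avoids quoting problems when paths contain spaces.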
I disagree with that statement. At least in the case of BBMap, it handles both DNA and RNA perfectly well, and "splice-awareness" does not hurt anything. Rather, it's capable of spanning long deletions (which in the case of RNA-seq mean introns, and for DNA can mean deleted genes), which allows more accurate mapping with no downside.
It's true that if you don't care about detecting whether some of your genes or exons have been completely deleted then it's fine to use an aligner that can't handle long deletions, but I don't really think that's a good idea in practice.
Incidentally, BBMap is written in Java.
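To make the point about long deletions concrete: in SAM output, an alignment's CIGAR string records deletions as `D` operations and skipped reference regions (typically introns) as `N` operations. Below is a rough sketch, in Java, of scanning a CIGAR string for the longest such gap. The class and method names are hypothetical, and whether a given aligner reports a long gap as `D` or `N` depends on the aligner and its settings, so check its documentation rather than relying on this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting SAM CIGAR strings.
class CigarGaps {

    // One CIGAR operation: a length followed by an operation code from the SAM spec.
    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    // Returns the length of the longest D (deletion) or N (skipped region)
    // operation in the CIGAR string, or 0 if there is none.
    static int longestReferenceGap(String cigar) {
        int longest = 0;
        Matcher m = OP.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'D' || op == 'N') {
                longest = Math.max(longest, Integer.parseInt(m.group(1)));
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestReferenceGap("100M"));        // no gap: 0
        System.out.println(longestReferenceGap("40M2D58M"));    // short deletion: 2
        System.out.println(longestReferenceGap("50M1200N50M")); // long skipped region: 1200
    }
}
```

A read that spans a deleted exon would show up with a large value here, whereas an aligner that cannot span long gaps would more likely soft-clip the read or leave it unmapped.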
Thanks, I will try the Burrows-Wheeler Aligner and will ping back if I have any questions.