I want to screen some aligners to pick the most sensitive one.
What are good aligners for Illumina HiSeq paired-end data? My genome is about 390 Mb. I have 100 crore reads (paired-end: 502 crore).
Is it a good idea to pick about 200 reads, first run a BLAST search to identify their positions, and then align them with each aligner? I know that 200 reads are very few for assessing an aligner's performance, but since I also have to check each of them manually by BLAST, I'm thinking of the minimum.
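If you do pull a small random subset of read pairs (for example, to try out parameter settings quickly), it is worth sampling uniformly across the whole run rather than taking the first reads in the file, which may not be representative. Below is a minimal Python sketch of one-pass reservoir sampling over two mate files. It assumes plain, uncompressed 4-line FASTQ with mates in the same order in both files; the function names are hypothetical.

```python
import random
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def sample_pairs(r1: TextIO, r2: TextIO, k: int, seed: int = 0) -> List[Tuple[Record, Record]]:
    """Uniformly sample k read pairs in a single pass (reservoir sampling, Algorithm R).

    Assumes the mate 1 and mate 2 files list the pairs in the same order.
    """
    rng = random.Random(seed)
    reservoir: List[Tuple[Record, Record]] = []
    for i, pair in enumerate(zip(read_fastq(r1), read_fastq(r2))):
        if i < k:
            reservoir.append(pair)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 choices
            if j < k:
                reservoir[j] = pair  # keep this pair with probability k / (i + 1)
    return reservoir
```

For example, `with open("R1.fastq") as f1, open("R2.fastq") as f2: picked = sample_pairs(f1, f2, 200)` (file names are placeholders). For gzipped input, `gzip.open(path, "rt")` gives a text handle that works the same way.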
I'm also looking for leads to shortlist some good aligners.
I have Illumina reads from a plant that is a hybrid of two distantly related varieties. Reference genomes of both parents are available. I need a sensitive aligner that can tell me for sure whether a read belongs to the maternal or the paternal parent.
There are plenty of published reports comparing the commonly used NGS aligners. Spend some quality time reading them; they can all be found on PubMed. And please do not start any self-made BLAST-based comparison of alignment accuracy. Read the literature first.
I agree with you! I've done a bit of literature searching, but I should do more of it. It really looks impossible to set up any sort of experiment that would actually make sense. Thanks for responding :)
With most aligners you can get what you need, provided you set the parameters properly. There are some questions you have to ask yourself:
Is speed important?
How related are the parental genomes?
Depending on the answers, you can set up your experiment. If you use, for example, bowtie2 or their newer software HISAT2, you have many options you can tweak so that reads are only mapped above a chosen alignment-score threshold, and it's very fast; it just requires some adjusting.
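To make the alignment-score threshold concrete: in Bowtie 2 it is set with `--score-min`, a function of read length written as `F,A,B` for f(x) = A + B * g(x), where `L` is linear, `S` is square root and `G` is natural log. The default in `--end-to-end` mode is `L,-0.6,-0.6`, and the default maximum mismatch penalty (from `--mp`) is 6. The Python sketch below, with hypothetical helper names, evaluates such a setting and gives a rough upper bound on how many high-quality mismatches it tolerates. It ignores gaps, Ns and quality-scaled penalties, and does not handle the constant type `C`.

```python
import math


def bowtie2_min_score(spec: str, read_len: int) -> float:
    """Evaluate a Bowtie 2 --score-min string 'F,A,B' as f(x) = A + B * g(x).

    Only L (linear), S (square root) and G (natural log) are handled here.
    """
    func, a_str, b_str = spec.split(",")
    a, b = float(a_str), float(b_str)
    transforms = {"L": lambda x: x, "S": math.sqrt, "G": math.log}
    if func not in transforms:
        raise ValueError(f"function type {func!r} not handled in this sketch")
    return a + b * transforms[func](read_len)


def max_hq_mismatches(min_score: float, mx: int = 6) -> int:
    """Rough upper bound on high-quality mismatches in --end-to-end mode.

    A perfect end-to-end alignment scores 0 and each high-quality mismatch
    costs mx, so this ignores gaps, Ns and quality-scaled penalties.
    """
    return max(0, math.floor(-min_score / mx))


for spec in ("L,-0.6,-0.6", "L,0,-0.1"):
    threshold = bowtie2_min_score(spec, 150)
    print(spec, round(threshold, 2), max_hq_mismatches(threshold))
```

For 150 bp reads the default threshold (-90.6) allows roughly 15 high-quality mismatches, which is probably far too permissive for separating two parental genomes. A stricter setting such as `L,0,-0.1` (an illustrative value, not a recommendation) allows about 2.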
If you use BLAT, you can set a minimum sequence identity required to score a match, which you can set very high if you don't want your reads to map to the other parental genome.
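In BLAT this is the `-minIdentity` option (percent identity, default 90 for nucleotide searches), and its default output format is PSL. If you want to re-filter PSL hits yourself, for example with a stricter identity plus a minimum query coverage, a sketch could look like the one below. The identity computed here is a simplified (matches + repMatches) / aligned bases that ignores inserts, so it is not the same number BLAT derives from its own scoring; the threshold values and function names are illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple


def psl_identity(fields: List[str]) -> float:
    """Simplified identity: (matches + repMatches) / (matches + repMatches + misMatches)."""
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    aligned = matches + mismatches + rep_matches
    return (matches + rep_matches) / aligned if aligned else 0.0


def filter_psl(lines: Iterable[str], min_identity: float = 0.98,
               min_query_cov: float = 0.95) -> Iterator[Tuple[str, str, float]]:
    """Yield (qName, tName, identity) for PSL hits passing both thresholds."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # PSL data lines have 21 columns and start with a number; skip header lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        q_size, q_start, q_end = int(fields[10]), int(fields[11]), int(fields[12])
        coverage = (q_end - q_start) / q_size if q_size else 0.0
        identity = psl_identity(fields)
        if identity >= min_identity and coverage >= min_query_cov:
            yield fields[9], fields[13], identity
```

Usage would be along the lines of `with open("hits.psl") as fh: kept = list(filter_psl(fh))`, where the file name is a placeholder.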
If you use GMAP, you can also choose among many different output formats that can ease your downstream analyses.
You can also try BWA, for example; it's ultimately a matter of choice.
For your type of experiment, I would focus the benchmarking more on the parameter set than on the aligner itself. As long as you can set thresholds for alignment score, sequence identity, maximum/minimum mismatches and gaps, seed mismatches, and maximum number of hits, you're fine.
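One way to turn such thresholds into a maternal/paternal call is to align each read separately to both parental references and compare the best scores: call a parent only when one score passes a minimum and beats the other by a clear margin, and label everything else ambiguous. The sketch below assumes higher-is-better scores such as Bowtie 2's `AS:i` tag (for pairs you might sum both mates' scores); `assign_parent` and the threshold values in the example are hypothetical. A margin of 12 would correspond to two high-quality mismatches under Bowtie 2's default mismatch penalty of 6.

```python
from typing import Optional

NEG_INF = float("-inf")


def assign_parent(maternal: Optional[float], paternal: Optional[float],
                  min_score: float, min_margin: float) -> str:
    """Classify a read from its best alignment score against each parental reference.

    Scores are higher-is-better (e.g. Bowtie 2 AS:i); None means no alignment.
    Returns 'maternal', 'paternal', 'ambiguous' or 'unmapped'.
    """
    m = NEG_INF if maternal is None else maternal
    p = NEG_INF if paternal is None else paternal
    if max(m, p) < min_score:
        return "unmapped"  # neither parent gives an acceptable alignment
    if m - p >= min_margin:
        return "maternal"
    if p - m >= min_margin:
        return "paternal"
    return "ambiguous"  # both fit about equally well, so do not force a call


print(assign_parent(-6, -30, min_score=-20, min_margin=12))
print(assign_parent(-6, -12, min_score=-20, min_margin=12))
```

Keeping an explicit ambiguous class, instead of forcing every read to a parent, is what makes the assigned reads trustworthy; how many reads end up ambiguous will depend on how divergent and complete the two references are.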
What kind of sequencing data are you working with? I am wondering why you want to do this.
That is a tall order for any aligner. How good are your references?
You may want to look at BBSplit from BBMap to bin/split the reads (see the thread "BBSplit syntax for generating builds for the reference genome and how to call different builds").
100 crore = 1 billion reads. Why does that equate to 502 crore PE reads?
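For reference, 1 crore is 10^7, so the figures in the question convert as follows. This is a quick Python check; if the 100 crore are read pairs, the individual read count would be 200 crore, which still does not match 502 crore. The helper name is hypothetical.

```python
CRORE = 10**7  # 1 crore = 10,000,000 (Indian numbering system)


def crore_to_count(crore: float) -> int:
    """Convert a figure given in crore to an absolute count."""
    return round(crore * CRORE)


pairs = crore_to_count(100)   # 1,000,000,000 if "100 crore" counts read pairs
reads = 2 * pairs             # each pair contributes two reads (R1 and R2)
print(f"{pairs:,} pairs = {reads:,} reads = {reads // CRORE} crore reads")
print(f"502 crore = {crore_to_count(502):,}")
```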