Hello community,
I am using bowtie2 to align sequences to a reference genome. The results are quite disappointing: 48% of the reads align exactly once and 44% align more than once.
I have single-end reads, 55-70 bp long. The reference genome is OreoNil2 (Oreochromis niloticus).
I am not sure about this, but I assume each alignment of a read that aligns multiple times has its own score, reflecting how well it matches the reference genome. I would like to extract into a new SAM file the reads that align only once (48%), plus the best-scoring alignment for each read that aligns multiple times.
Does anybody know whether this is possible and how to do it? Would I introduce any bias by picking those reads?
Thanks in advance!
It might be worth trying bwa and comparing the results. If these are paired-end reads, I would expect a smaller proportion of multiple mappings.
Once you pick a subset of reads with higher scores, yes, you will introduce bias. What is your ultimate goal?
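For reference, the selection described in the question can be sketched in Python as below. It relies on two optional SAM tags that bowtie2 writes: AS:i (score of the reported alignment) and XS:i (score of the best other alignment found, present only when more than one alignment was found). It assumes bowtie2 was run in its default reporting mode, without -k or -a, so each read has a single record that is already its best alignment. The function names and the category labels ("unique", "best", "tie") are made up for illustration, and how to treat ties is a judgment call.

```python
def tag_value(fields, tag):
    """Return the integer value of an optional SAM tag such as AS or XS, or None."""
    prefix = tag + ":i:"
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def classify(sam_line):
    """Label one SAM alignment record as unaligned, unique, best, or tie."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # bit 0x4 set: the read is unmapped
        return "unaligned"
    as_score = tag_value(fields, "AS")
    xs_score = tag_value(fields, "XS")
    if xs_score is None:  # no second alignment was found
        return "unique"
    if as_score is not None and as_score > xs_score:
        return "best"  # reported alignment beats every other one found
    return "tie"  # equally good placements: the reported position is arbitrary


def filter_sam(lines, keep=("unique", "best")):
    """Yield header lines plus the records whose label is in `keep`."""
    for line in lines:
        if line.startswith("@") or classify(line) in keep:
            yield line
```

To write the filtered file, open the input and output SAM files and pass the input handle through `filter_sam`, for example `out.writelines(filter_sam(inp))`. Reads labelled "tie" are the ones where keeping the reported position is most likely to introduce bias, since the aligner had no basis for preferring it.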
My goal is to get as many alignments as I can, but as it stands I can use fewer than 50% of my total reads. I have hydroxymethylation data and I need as much coverage as I can get. I will try different aligners to see whether I get better results. However, I think bowtie2 is quite a good aligner, so I am not hoping for a miracle. Thank you for your input!
You'll typically get a much higher alignment rate with BBMap than with Bowtie2 when the data has low identity to the reference. In particular, you can add the flag "slow" or "vslow" and use a shorter kmer length, such as 11, to increase the alignment rate even more.
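As a rough sketch of what such a run could look like, the Python helper below only assembles a bbmap.sh command line with the settings mentioned above. The file names are placeholders and the helper itself is hypothetical; ref=, in=, out= and k= are BBMap's key=value parameters, and "slow" / "vslow" are passed as bare flags.

```python
def bbmap_command(ref, reads, out_sam, kmer=11, sensitivity="slow"):
    """Build an argument list for a more sensitive BBMap run (nothing is executed)."""
    if sensitivity not in ("slow", "vslow"):
        raise ValueError("sensitivity must be 'slow' or 'vslow'")
    return [
        "bbmap.sh",
        f"ref={ref}",      # reference FASTA (placeholder path)
        f"in={reads}",     # single-end reads (placeholder path)
        f"out={out_sam}",  # SAM output (placeholder path)
        f"k={kmer}",       # shorter index kmer: slower but more sensitive
        sensitivity,       # "slow" or "vslow" sensitivity flag
    ]


cmd = bbmap_command("reference.fa", "reads.fq", "mapped.sam")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once BBMap is installed and on the PATH. Expect longer run times with these settings.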