7.2 years ago
MAPK
Hi, I was wondering if there is a way to align short reads against multiple long reads and see long stretches of aligned sequence from the same genome. I have millions of short reads that I want to align to thousands of long sequences from the same genome, and then look at the aligned regions. Thanks
That is basically what every NGS aligner does. So is there a question here?
Hehe, I just got confused. So basically I can use BWA?
Yes, you can use BWA, but if the error rate of the long reads is high (are they long reads or contigs / scaffolds?), BWA will be very slow and many reads may remain unaligned.
Or maybe you want to align short reads AND long reads to the same reference genome?
Also, there are tools that use Illumina reads to error-correct long reads.
If you're talking about reads around the 30-50bp mark, I'd use bowtie and report uniquely mapped reads only (--best -m 1; these are Bowtie 1 options, bowtie2 does not accept them). If your reads are >70bp in length, use BWA mem.
I did use bowtie, but the problem is that it pulls out lots of sequences that are not exact matches (it allows too many mismatches). My sequences are sRNA reads and I want to align them to retrotransposon LTR regions. I have about a thousand LTR sequences, each a few hundred bases long. Any reads that really come from those regions should match the LTRs very well, but I expect very few such reads in my experimental data. In any case, bowtie pulls out too many reads, even from not-so-perfectly aligned regions.
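One way to keep only perfect hits, whichever aligner produced them, is to post-filter the SAM output. The sketch below is a minimal example, not a tested pipeline step. It assumes the aligner writes the optional NM:i edit-distance tag and uses M (not = or X) in the CIGAR string; the function names are made up for illustration.

```python
import re


def is_exact_match(sam_line: str) -> bool:
    """Return True for a mapped SAM record with no mismatches, indels or clipping."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:          # a SAM alignment line has 11 mandatory columns
        return False
    flag = int(fields[1])
    cigar = fields[5]
    if flag & 4:                  # bit 0x4 set means the read is unmapped
        return False
    if not re.fullmatch(r"\d+M", cigar):
        return False              # reject indels, soft/hard clipping, etc.
    # Assumption: the aligner emits NM:i:<edit distance>; require it to be zero.
    return "NM:i:0" in fields[11:]


def filter_exact(sam_lines):
    """Yield header lines unchanged plus alignment lines that are exact matches."""
    for line in sam_lines:
        if line.startswith("@") or is_exact_match(line):
            yield line
```

Reading the aligner's SAM file line by line and writing out whatever `filter_exact` yields gives a SAM file containing only the full-length, zero-mismatch alignments.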
If you have the reference genome, I would suggest aligning the reads to the genome first and then re-mapping to the LTRs only the reads that didn't map to the genome. That should reduce the noise you're dealing with.
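The "re-map what didn't map" step can be sketched roughly as follows: pull the unmapped reads out of the genome alignment and write them back out as FASTQ for the second round. This is only an illustration, assuming single-end reads and SAM output from the first round; the function name is hypothetical. Many aligners can also write unaligned reads to a file directly, which would make this step unnecessary.

```python
def unmapped_to_fastq(sam_lines):
    """Yield one FASTQ record for every read flagged as unmapped (0x4) in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # skip malformed or truncated lines
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 4:                  # read did not align to the genome
            yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Writing the yielded records to a file gives the input for the second alignment against the LTR sequences.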