Mapping sequencing reads to a circular reference
Colaptes, 2.1 years ago

Hello,

I am looking to map sequencing reads to a circular reference genome (an animal mitochondrial genome, in my case). I have been looking for an up-to-date tool that can map reads to a circular genome but have not been able to find one. Does anyone know of a tool that can do this?

All the answers I have found so far simply recommend converting the circular reference into a linear sequence and mapping the reads as though it were a linear genome, with or without padding the ends of the reference so that reads spanning the ends can still map. However, my goal is to analyze the relative sequencing depth across the mitogenome, and this approach would underestimate depth near the ends of the sequence. I suppose I could map twice, with two different breakpoints, but that seems like a severe waste of computational resources (I have hundreds of Gb of sequencing data to map). Does anyone know of a better way to handle circular genomes?
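For reference, "padding" usually means appending the first N bp of the circular sequence to its end, so that reads spanning the origin can align contiguously. The Python sketch below (function names are hypothetical, and per-position depth is assumed to be already computed as a plain 0-based list, e.g. from `samtools depth -a`) builds such a padded sequence and then folds the depth on the padded tail back onto the start of the genome by summing the two copies. One caveat: reads lying entirely inside the duplicated region align equally well to both copies and typically get mapping quality 0, so filtering on MAPQ would drop them and bring back the dip at the ends.

```python
from typing import List


def pad_circular(seq: str, pad: int) -> str:
    """Append the first `pad` bases to the end so reads spanning the origin can align."""
    if not 0 < pad <= len(seq):
        raise ValueError("pad must be between 1 and the genome length")
    return seq + seq[:pad]


def fold_depth(padded_depth: List[int], genome_len: int) -> List[int]:
    """Fold per-position depth (0-based, on the padded reference) back onto the
    circular coordinates of the original genome by summing the two copies."""
    if len(padded_depth) < genome_len:
        raise ValueError("depth array is shorter than the genome")
    depth = [0] * genome_len
    for pos, d in enumerate(padded_depth):
        # Positions past the original end are the duplicated start of the genome.
        depth[pos % genome_len] += d
    return depth


if __name__ == "__main__":
    genome = "ACGTACGGTT"                # toy 10 bp "circular" genome
    padded = pad_circular(genome, 3)     # "ACGTACGGTTACG"
    print(padded)
    print(fold_depth([1] * len(padded), len(genome)))  # [2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
```

Folding assumes each read is counted once (primary alignments only), which is what makes summing the two copies valid.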

Thank you!

Tags: circular, mitogenome, genome, mapping
shelkmike, 2.1 years ago

I know three ways to deal with this problem:
1) As you said, make two versions of the genome. For example, move the first 1 kbp to the end and align the reads again.
2) If your read length is, say, 100 bp, simply exclude the first and last 100 bp of the genome from the analysis.
3) Use CLC Assembly Cell (https://digitalinsights.qiagen.com/products/qiagen-clc-assembly-cell-direct-download/). It is a proprietary bioinformatics toolkit capable of many things, including read alignment. As far as I know, it's free for the first 15 days. In my experience its read aligner works very well, and it is the only read aligner I'm aware of that takes the circularity of references into account: it has a special option "--circular" to indicate that a reference is circular. After alignment, you can calculate per-position coverage with another program from the same toolkit, clc_mapping_info.
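Method 1) can be sketched in Python roughly as follows, under a few assumptions: both mappings have already been converted to per-position depth lists (0-based), the rotation shift (e.g. 1000 bp) and the edge margin (e.g. the read length) are parameters you choose, and all function names are hypothetical. Depth is taken from the original mapping everywhere except within `margin` bp of its ends, where the rotated mapping, converted back to original coordinates, is used instead.

```python
from typing import List


def rotate(seq: str, shift: int) -> str:
    """Move the first `shift` bases to the end, e.g. shift=1000 for 1 kbp."""
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def to_original(pos_rotated: int, shift: int, genome_len: int) -> int:
    """Convert a 0-based position on the rotated reference to the original one."""
    return (pos_rotated + shift) % genome_len


def combine_depth(depth_orig: List[int], depth_rot: List[int],
                  shift: int, margin: int) -> List[int]:
    """Use the rotated mapping's depth within `margin` bp of the original ends."""
    genome_len = len(depth_orig)
    if len(depth_rot) != genome_len:
        raise ValueError("both depth arrays must cover the whole genome")
    # The original ends must lie at least `margin` bp away from the rotated ends.
    if not 2 * margin <= shift <= genome_len - 2 * margin:
        raise ValueError("shift must be between 2*margin and genome_len - 2*margin")
    combined = list(depth_orig)
    edges = list(range(margin)) + list(range(genome_len - margin, genome_len))
    for pos in edges:
        # Original position `pos` sits at (pos - shift) mod L on the rotated reference.
        combined[pos] = depth_rot[(pos - shift) % genome_len]
    return combined


if __name__ == "__main__":
    print(rotate("ABCDEFGHIJ", 3))   # DEFGHIJABC
    print(to_original(9, 3, 10))     # 2
```

The bounds on `shift` simply guarantee that the positions borrowed from the rotated mapping are themselves far from that mapping's own breakpoint.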


Thank you! It's too bad that the only circular mapper is proprietary.


Hello shelkmike, can you explain #2 in more detail, please? I'm also trying to learn how to map reads to a mitochondrial genome. Thank you so much!


I mean that if you want to find places with anomalously high or low coverage, you can simply ignore the borders of the contig.
For example, if you have a 100 bp Illumina read, 90 bp of which belong to one end of a circular contig and 10 bp to the other, the read aligner will not align those 10 bp (read aligners usually don't align very short segments), so coverage there will be underestimated. So, when searching for regions with anomalous coverage, just exclude short regions at the start and the end of the contig.
That said, method 1) is more accurate than method 2), though somewhat more time-consuming.
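Method 2) amounts to excluding the contig edges before looking for anomalies. A minimal Python sketch, assuming per-position depth is already available as a 0-based list; the function name and the 0.5x/2x-of-median thresholds are hypothetical choices for illustration, not something suggested in this thread:

```python
import statistics
from typing import List


def flag_anomalies(depth: List[int], read_len: int,
                   low: float = 0.5, high: float = 2.0) -> List[int]:
    """Return 0-based positions whose depth is below low*median or above
    high*median, ignoring the first and last `read_len` bp of the contig."""
    genome_len = len(depth)
    if 2 * read_len >= genome_len:
        raise ValueError("contig is too short for this read length")
    interior = range(read_len, genome_len - read_len)
    # Median over the interior only, so the artificially low edges don't bias it.
    med = statistics.median(depth[p] for p in interior)
    return [p for p in interior if depth[p] < low * med or depth[p] > high * med]


if __name__ == "__main__":
    # Toy 30 bp contig: zero depth at both edges (the mapping artifact) and one spike.
    toy = [0] * 5 + [10] * 10 + [30] + [10] * 9 + [0] * 5
    print(flag_anomalies(toy, read_len=5))   # [15]
```

With `read_len=0` the same toy contig would also flag the ten edge positions, which is exactly the artifact this method avoids.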


Thank you!
