Question

Would Tophat place 3-5 mismatches in a row if I raised the default number of allowed mismatches?

9.4 years ago

james.lloyd ▴ 100

I have some very long RNA-seq reads (250 nt), and I am thinking of raising the number of allowed mismatches. I used the default of 2 for 100 nt reads, but for 250 nt reads I thought of going to 5 (1 mismatch per 50 nt). I will be using Tophat to map these reads.

I am concerned that Tophat would place 3 or more mismatches in a row (i.e., at adjacent nucleotides), and I would like to prevent that. Would Tophat (and Bowtie) map such a read, and if so, are there any settings that would stop it, other than keeping the mismatch limit at the default of 2?

If I cannot stop this, is there an easy way to filter such reads out of the BAM/SAM file?
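One way to find such reads is the MD:Z tag of each SAM record, which encodes mismatched reference bases; adjacent mismatches show up as letters separated by `0` (for example `10A0C0G6`). The sketch below is a minimal, hypothetical helper that only flags and counts such records. It assumes the alignments carry MD tags (if they do not, `samtools calmd` can add them) and that the input is plain SAM text, e.g. from `samtools view`. MD is in reference coordinates, so soft clips and insertions in the read are not visible to it.

```python
import re
from typing import Iterable, Optional, Tuple

# One MD token: a count of matching bases, a deletion (^ + reference bases),
# or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def longest_mismatch_run(md: str) -> int:
    """Length of the longest run of adjacent mismatches encoded in an MD tag."""
    longest = run = 0
    for matches, deletion, mismatch in MD_TOKEN.findall(md):
        if mismatch:
            run += 1
            longest = max(longest, run)
        elif deletion or int(matches) > 0:
            # A matching base or a deletion ends the run; a literal "0"
            # only separates two adjacent mismatches.
            run = 0
    return longest


def md_tag(sam_line: str) -> Optional[str]:
    """Return the MD:Z value from the optional fields of a SAM record, if present."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("MD:Z:"):
            return field[5:]
    return None


def count_runs(sam_lines: Iterable[str], min_run: int = 3) -> Tuple[int, int]:
    """Return (records with >= min_run adjacent mismatches, records with an MD tag)."""
    flagged = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        md = md_tag(line)
        if md is None:
            continue  # e.g. unmapped reads carry no MD tag
        total += 1
        if longest_mismatch_run(md) >= min_run:
            flagged += 1
    return flagged, total
```

For example, `count_runs(sys.stdin)` could be run on the output of `samtools view accepted_hits.bam` piped into the script.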

Many thanks,
James

RNA-Seq Tophat • 2.2k views

updated 2.1 years ago by Ram 44k • written 9.4 years ago by james.lloyd ▴ 100

Ram · Answer 1 · 2015-07-16

9.4 years ago

Istvan Albert 102k

The proper way to think about this is that alignments are chosen to maximize a numerical score, built from positive values for matches and negative values (penalties) for mismatches, gap opens and gap extensions. The number of mismatches you allow is not relevant here: that limit may be used to filter the alignments, but it does not factor into generating them.
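A minimal sketch of the idea described above, using made-up scoring values (not Tophat's or Bowtie's actual defaults) and alignments written as strings of operations (`=` match, `X` mismatch, `I`/`D` gap). Candidates are ranked by score alone; a mismatch limit is then applied to the winner as a filter. The function names are hypothetical.

```python
from itertools import groupby

# Made-up scoring values for illustration; check your aligner's manual for real ones.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 2, -6, -5, -3


def alignment_score(ops: str) -> int:
    """Score an op string; each run of I or D is one gap costing open + extend * length."""
    total = 0
    for op, run in groupby(ops):
        n = len(list(run))
        if op == "=":
            total += MATCH * n
        elif op == "X":
            total += MISMATCH * n
        elif op in ("I", "D"):
            total += GAP_OPEN + GAP_EXTEND * n
        else:
            raise ValueError(f"unknown operation {op!r}")
    return total


def best_then_filter(candidates, max_mismatches):
    """Pick the highest-scoring candidate, then apply the mismatch cap as a filter."""
    best = max(candidates, key=alignment_score)
    return best if best.count("X") <= max_mismatches else None
```

With a cap of 1, `best_then_filter(["==XX==", "=X=X=X"], 1)` returns `None`: the best-scoring alignment is rejected rather than replaced by the lower-scoring one.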

If you don't want two or more mismatches in a row, then all you need to do is instruct the aligner that two mismatches should score worse than a gap open plus a gap extension, which is probably the default setting anyway.
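To illustrate why a gap usually beats a run of mismatches, here is a small global alignment with affine gap penalties (Gotoh's algorithm, scores only), again with made-up scoring values in which two mismatches (-12) cost more than one gap open plus extension (-8). The read is the reference with one base deleted and an extra base appended. Placed without gaps, the shifted part becomes a string of mismatches; the gapped alignment pays for one deletion and one insertion instead. Real aligners use local or end-to-end modes and heuristics, so this is only a sketch of the scoring trade-off.

```python
# Made-up scoring values for illustration; they are not Tophat/Bowtie defaults.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 2, -6, -5, -3
NEG = float("-inf")


def hamming_score(ref: str, read: str) -> int:
    """Score of the ungapped, base-for-base placement of two equal-length strings."""
    return sum(MATCH if r == q else MISMATCH for r, q in zip(ref, read))


def affine_global_score(ref: str, read: str) -> float:
    """Best global alignment score with affine gaps (Gotoh), without traceback."""
    n, m = len(ref), len(read)
    # M: ref[i-1] aligned to read[j-1]
    # X: ref[i-1] against a gap (deletion in the read)
    # Y: read[j-1] against a gap (insertion in the read)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = GAP_OPEN + GAP_EXTEND * i
    for j in range(1, m + 1):
        Y[0][j] = GAP_OPEN + GAP_EXTEND * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if ref[i - 1] == read[j - 1] else MISMATCH
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + GAP_OPEN + GAP_EXTEND,
                          X[i - 1][j] + GAP_EXTEND,
                          Y[i - 1][j] + GAP_OPEN + GAP_EXTEND)
            Y[i][j] = max(M[i][j - 1] + GAP_OPEN + GAP_EXTEND,
                          Y[i][j - 1] + GAP_EXTEND,
                          X[i][j - 1] + GAP_OPEN + GAP_EXTEND)
    return max(M[n][m], X[n][m], Y[n][m])


ref = "ACGTTGCAAGCT"
read = ref[:4] + ref[5:] + "A"  # one reference base deleted, one extra base at the end
print(hamming_score(ref, read))        # -32: 5 matches, 7 mismatches
print(affine_global_score(ref, read))  # 6: 11 matches, one deletion, one insertion
```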

9.4 years ago by Istvan Albert 102k

Thank you for your reply. I am trying to relate what you said to the options in the Tophat manual. I have selected the options below for my run, allowing 5 mismatches. Do you know whether these settings will prohibit more than 2 mismatches in a row? I think read-gap-length will do this, but I am not sure whether read-edit-dist interferes with that.

--read-mismatches 5 (default 2)
--read-gap-length (left as default 2)
--read-edit-dist 5 (default 2; I had to change it to 5 when I increased read-mismatches)

Thanks again,
James

updated 2.1 years ago by Ram 44k • written 9.4 years ago by james.lloyd ▴ 100

I think you are overly hung up on the fear of two mismatches in a row. Imagine that your sample actually has two mismatches in a row: why would you not want that to be reported correctly? It would be scientifically inappropriate to forbid this a priori. In general, multiple mismatches in a row rarely arise by accident, because mismatch penalties are typically higher than gap open + extension, so some alternative alignment will be found instead. I would recommend moving on and not worrying about something that rarely happens and, when it does happen, is probably correct anyhow.

updated 2.1 years ago by Ram 44k • written 9.4 years ago by Istvan Albert 102k

I am fine with 2 mismatches in a row. It is more than 2 mismatches in a row that troubled my lab, so I am trying to see whether the mapping approach described above would prevent 3 or more mismatches in a row. I also don't think it would be bad to have some rare cases of multiple mismatches in a row; I wanted to better understand what I had done, and to see whether my lab's concern was even real, which I am still having trouble judging.

updated 2.1 years ago by Ram 44k • written 9.4 years ago by james.lloyd ▴ 100

2, 3, 4 or 5 makes no difference. As I said, the solution is not to filter it out or forbid it from happening: if your aligner reports three mismatches in a row, then that is the most likely alignment given the parameters you have set. And that's that; the way around it is not to filter out just this one pattern while allowing all others. It would be pretty absurd (and bad science) to remove three mismatches in a row but allow three mismatches as long as each is separated by one base. The latter is a far more suspicious alignment, IMO.
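The point about spacing can be checked with a toy ungapped scoring (made-up values, hypothetical helper names). When match and mismatch scores do not depend on position, three adjacent mismatches and three mismatches spaced one base apart receive exactly the same score, so the aligner has no reason to prefer one over the other. A filter aimed only at adjacent runs would therefore treat equally scored alignments differently.

```python
from itertools import groupby

# Made-up scoring values for illustration only.
MATCH, MISMATCH = 2, -6


def ungapped_score(ops: str) -> int:
    """Score an ungapped alignment given as '=' (match) / 'X' (mismatch) columns."""
    return sum(MATCH if op == "=" else MISMATCH for op in ops)


def longest_run(ops: str, op: str = "X") -> int:
    """Length of the longest consecutive run of the given operation."""
    return max((len(list(g)) for k, g in groupby(ops) if k == op), default=0)


adjacent = "==XXX===="
spaced = "==X=X=X=="
for name, ops in [("adjacent", adjacent), ("spaced", spaced)]:
    # Same score for both layouts; only the run length differs.
    print(name, ungapped_score(ops), longest_run(ops))
```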

9.4 years ago by Istvan Albert 102k

It is a very good point and I would not want to bias my analysis in an unfair way. Thanks for the advice.

updated 2.1 years ago by Ram 44k • written 9.4 years ago by james.lloyd ▴ 100