7.9 years ago
wilfried.guiblet
Hi,
I am looking at the error rates of a particular Illumina sequencing project. I have the BAM files from the project and want to see the mismatches and indels in the reads compared to the reference genome. The data is paired-end and PCR-free. I thought I needed to call all the variants in the alignments by
(1) filtering out poor-quality alignments
(2) not filtering out poor-quality variant calls
with a regular variant caller. I like freebayes, but I guess any caller would do. I considered using the following options:
freebayes -f reference.fasta -F 0.01 --min-alternate-count 1 --min-alternate-fraction 0.01 alignment.bam
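Once freebayes has written its VCF, the calls can be tallied by type to separate mismatch-like events from indels. Note that `-F` is the short form of `--min-alternate-fraction`, so the command above sets that threshold twice. The sketch below is a minimal illustration, assuming the VCF text is already in memory and that each record carries the freebayes INFO `TYPE` field (values such as `snp`, `mnp`, `ins`, `del`, `complex`). The helper name `tally_call_types` and the example records are hypothetical.

```python
from collections import Counter


def tally_call_types(vcf_text: str) -> Counter:
    """Count freebayes calls by the INFO TYPE field (snp, mnp, ins, del, complex).

    Multi-allelic records list one TYPE per ALT allele (e.g. "snp,ins"),
    so each allele is counted separately.
    """
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        info = line.split("\t")[7]  # INFO is the 8th VCF column
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "TYPE":
                counts.update(value.split(","))
    return counts


# Hypothetical records, for illustration only
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t5.2\t.\tAO=1;TYPE=snp\n"
    "chr1\t250\t.\tAT\tA\t3.1\t.\tAO=1;TYPE=del\n"
    "chr1\t400\t.\tC\tT,CA\t8.0\t.\tAO=2,1;TYPE=snp,ins\n"
)
print(tally_call_types(example))
```

Counting per ALT allele rather than per record keeps multi-allelic sites from hiding an indel behind a SNP.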
Are my parameters correct? Does my idea even make sense?
Thank you!
How will you distinguish a genuine sequencing error from a true mutation? Do you have any PhiX spike-in?
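With reads aligned to a known reference such as a PhiX spike-in, every difference can be counted as an error, and this can be done straight from the alignments without a variant caller (a different technique from the one proposed in the question). The SAM `NM` tag is the edit distance to the reference (mismatches plus inserted and deleted bases), so mismatches = NM − I − D taken from the CIGAR. A minimal sketch, assuming each alignment has been reduced to a hypothetical `(mapq, cigar, nm)` tuple (for example from pysam's `mapping_quality`, `cigarstring` and `get_tag("NM")`); the `min_mapq=20` cutoff is an arbitrary illustrative value, and soft clips are excluded from the denominator.

```python
import re
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class ErrorTally:
    aligned_bases: int = 0  # read bases in M/=/X/I operations (clips excluded)
    mismatches: int = 0
    inserted: int = 0
    deleted: int = 0

    def rate(self) -> float:
        """Errors (mismatches + inserted + deleted bases) per aligned read base."""
        errors = self.mismatches + self.inserted + self.deleted
        return errors / self.aligned_bases if self.aligned_bases else 0.0


def tally_errors(records, min_mapq=20):
    """records: iterable of (mapq, cigar, nm); unmapped reads (cigar '*') are skipped."""
    tally = ErrorTally()
    for mapq, cigar, nm in records:
        if cigar == "*" or mapq < min_mapq:
            continue  # drop unmapped and poorly mapped reads
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
        ins = sum(n for n, op in ops if op == "I")
        dels = sum(n for n, op in ops if op == "D")
        matched = sum(n for n, op in ops if op in "M=X")
        tally.aligned_bases += matched + ins
        tally.inserted += ins
        tally.deleted += dels
        # NM = mismatches + inserted bases + deleted bases (SAM spec)
        tally.mismatches += nm - ins - dels
    return tally


# Hypothetical alignments, for illustration only
records = [
    (60, "100M", 2),      # 2 mismatches
    (60, "50M2I48M", 3),  # 2 inserted bases + 1 mismatch
    (60, "30M1D70M", 1),  # 1 deleted base, no mismatch
    (5, "100M", 10),      # filtered out: MAPQ below threshold
]
t = tally_errors(records)
print(t, t.rate())
```

This only measures differences from the reference, so on a non-spike-in genome true mutations are still counted as errors.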
Short answer: I won't but it doesn't matter too much.
Long answer: I am comparing two technologies on the same biological sample. I can assume that the shared calls are mutations plus errors common to both machines, and I won't be able to tell those apart. It is not great, of course, but I am mostly interested in the calls that differ, which should contain no mutations, only sequencing errors.
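The comparison between the two technologies reduces to set operations on call keys. A minimal sketch, assuming each technology's calls have already been parsed into `(CHROM, POS, REF, ALT)` tuples and normalized (left-aligned, multi-allelic records split, e.g. with `bcftools norm`) so that the same event gets the same key in both sets. The helper `compare_calls` and the example calls are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

Call = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


class Comparison(NamedTuple):
    shared: Set[Call]  # mutations + errors common to both technologies
    only_a: Set[Call]  # assumed to be technology-A-specific errors
    only_b: Set[Call]  # assumed to be technology-B-specific errors


def compare_calls(calls_a: Set[Call], calls_b: Set[Call]) -> Comparison:
    """Split two call sets into shared and technology-specific calls."""
    return Comparison(
        shared=calls_a & calls_b,
        only_a=calls_a - calls_b,
        only_b=calls_b - calls_a,
    )


# Hypothetical calls, for illustration only
tech_a = {("chr1", 100, "A", "G"), ("chr1", 250, "AT", "A"), ("chr2", 10, "C", "T")}
tech_b = {("chr1", 100, "A", "G"), ("chr2", 10, "C", "T"), ("chr3", 5, "G", "GA")}
result = compare_calls(tech_a, tech_b)
print(len(result.shared), len(result.only_a), len(result.only_b))
```

Exact key matching is strict: without normalization, the same indel represented at different positions by the two pipelines would land in both `only_*` sets and inflate the apparent error counts.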