Question

Calculating sequencing errors

0

Entering edit mode

6.3 years ago

khanirfanbioinformatics • 0

There is a circular genome of length L. You have a sequencing machine which reads a consecutive subsequence of length exactly n of this genome starting from a position chosen uniformly at random. However, there is a problem: the machine makes errors -- each nucleotide is replaced with an incorrect one with probability p.

You have a budget for k reads. After the sequencing is done you count nucleotides read from each genome position. After that for each genome position you decide which nucleotide is located at this position in the genome by the following rules:

The desired nucleotide is the one with the largest number of occurrences at this position; If there are several suitable nucleotides you choose one of them uniformly at randomly. In particular, if a position has never been covered by a read, then every nucleotide occurred 0 times at this position and a nucleotide is chosen uniformly at random.

You are given several tests. Each test is represented by the length of the genome L, the length of a read n, the probability of an error p and the total number of reads k. Find the expected number of positions with incorrect nucleotides.

The first line of the input contains the number of test cases t. Each test case is specified by four numbers L, n, p and k.

SAMPLE INPUT
4 2 0.000 1
4 2 0.000 2
10 3 0.050 4


SAMPLE OUTPUT
1.50000000000
0.75000000000
2.14491825000

In the first test case, the length of the genome is 4. The sequencer gives us only one read and it always makes no mistakes, so it provides us with true information about 2 nucleotides. For each other nucleotide the probability of the correct choice is 0.25, so the answer is 2⋅(1−0.25)=1.5.

In the second test case, we can use the same sequences twice, so we can know true information about 2, 3 or 4 nucleotides.

Knowing information only about 2 nucleotides happens with the probability 0.25 and the expected number of mistakes is 1.5. Knowing information about exactly 3 nucleotides happens with the probability 0.5 and the expected number of mistakes is 0.75. We know information about all nucleotides with the probability 0.25 and the expected number of mistakes is 0. So the answer for the second test case is 0.25⋅1.5+0.5⋅0.75+0.25⋅0= 0.75.

CAN YOU EXPLAIN HOW WAS THIRD SAMPLE OUTPUT ACHIEVED FROM THIRD SAMPLE INPUT?

sequencing • 2.1k views

ADD COMMENT • link 6.3 years ago by khanirfanbioinformatics • 0

0

Entering edit mode

You know your TA is probably in this forum, right?

ADD REPLY • link 6.3 years ago by Asaf 10k

0

Entering edit mode

I graduated 2 years ago. Besides, I have asked for an explaination for an example, I have not uploaded my test files.

ADD REPLY • link 6.3 years ago by khanirfanbioinformatics • 0

0

Entering edit mode

Sorry, it just sounded like homework

ADD REPLY • link 6.3 years ago by Asaf 10k

0

Entering edit mode

It's question 2 from the [Stepik] Bioinformatics Contest 2019 Qualification Round which started 7 days ago .. The deadline passed 4 hours ago today.