How to calculate genome coverage for integrated genomes?

6.9 years ago

marongiu.luigi ▴ 730

Hello,

I am having trouble calculating the number of reads that I should expect for a target sequence (for instance a transposon) integrated into the human genome. That is: how many reads should I expect to map to my target sequence and confirm the presence of the target?

Assuming 1) a pre-calculated coverage of 20, 2) a target region of 1000 bp and 3) a fixed read length of 150 bp, and using the formula C = NL/G (C = coverage, N = number of reads, L = read length, G = length of the sequence), I get:

N = CG/L = 20 × 1000 / 150 ≈ 133 reads

This looks like a few too many reads. Or should I calculate using the whole human genome, since the target is integrated into it? In that case, I get:

N = 20 × 3 000 000 000 / 150 = 400 000 000 reads

That is clearly wrong.

My question is, therefore: how do I calculate the coverage in general, and for integrated sequences in particular? Thank you.
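
For reference, the two calculations above can be reproduced with a short Python sketch. It only rearranges C = NL/G into N = CG/L; the function name `reads_needed` and its parameter names are made up for illustration, and the 3 Gb genome size is the rounded figure used above, not an exact value.

```python
def reads_needed(coverage: float, region_length: int, read_length: int) -> float:
    """Solve C = N * L / G for N, i.e. N = C * G / L."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return coverage * region_length / read_length


# Case 1: G is the 1000 bp target only.
target_reads = reads_needed(coverage=20, region_length=1_000, read_length=150)

# Case 2: G is the whole human genome (rounded to 3 Gb, as in the post).
genome_reads = reads_needed(coverage=20, region_length=3_000_000_000, read_length=150)

print(round(target_reads), round(genome_reads))  # 133 400000000
```

The first number is the count of reads landing on the target; the second is the total number of reads in the whole run, so the two figures answer different questions.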

Assembly genome sequencing • 2.3k views


But where do you get this pre-calculated coverage of 20 from?

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

Probably shooting for a 20x coverage.

ADD REPLY • link 6.9 years ago by GenoMax 148k

The data was given with this coverage, but based on the human genome. I would like to estimate how many reads I should expect for the transposon.

ADD REPLY • link 6.9 years ago by marongiu.luigi ▴ 730

Is the sequence for that transposon so specific that you don't expect to get any alignments outside that 1 kb?

ADD REPLY • link 6.9 years ago by GenoMax 148k

Well, the sequence is not human as such, but otherwise there is nothing special about the target; the reads should align at more or less the same average depth for both human and transposon. So should I expect 20x coverage for the transposon as well?

ADD REPLY • link 6.9 years ago by marongiu.luigi ▴ 730

If you are sure the transposon is only in that one location (seems a bit implausible), then that may be a reasonable assumption, as long as there is no strange bias in the transposon sequence compared to the human genome.
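
As a rough sketch of what that assumption implies: if reads are sampled uniformly along the genome, the expected number landing on the target is the total read count times the fraction of the genome the target occupies. The function name `expected_target_reads` and the `copies` parameter are hypothetical, and the sketch ignores reads that only partly overlap the insertion boundaries, ploidy, and any mapping or sequence bias.

```python
def expected_target_reads(total_reads: int, target_length: int,
                          genome_length: int, copies: int = 1) -> float:
    """Expected reads on a target under uniform sampling (an assumption).

    `copies` is the number of insertion sites; edge effects are ignored.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_reads * copies * target_length / genome_length


# 400 million 150 bp reads is what 20x over a ~3 Gb genome works out to.
single_copy = expected_target_reads(400_000_000, 1_000, 3_000_000_000)
three_copies = expected_target_reads(400_000_000, 1_000, 3_000_000_000, copies=3)

print(round(single_copy), round(three_copies))  # 133 400
```

Under these assumptions a single-copy 1 kb insertion gets about 133 reads, the same figure as N = CG/L on the target alone, and the count scales linearly if the transposon sits in more than one location.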

ADD REPLY • link 6.9 years ago by GenoMax 148k

OK. Otherwise, is the formula correct? Should I expect 133 reads covering the transposon?

ADD REPLY • link 6.9 years ago by marongiu.luigi ▴ 730
