Question

Reduce coverage of available genome

0

10.2 years ago

MMK ▴ 20

Hi!!

Let's say I have whole-genome sequence data and I want to reduce the coverage to imitate ancient DNA data. How would you recommend doing it?

Cheers

Manuel

genome low coverage • 2.9k views

updated 3.0 years ago by Ram 45k • written 10.2 years ago by MMK ▴ 20

1

If you have real ancient DNA data as a reference, I would determine its gaps or its average coverage, and then use that information to downsample your genome sequence data to the same coverage or to reproduce the same gaps.
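As a rough illustration of the coverage-matching idea, the fraction of reads to keep is just the target coverage divided by the current coverage. The sketch below assumes you have already measured both mean coverages; the function name and the 30x / 0.5x figures are made up for the example and do not come from any real dataset.

```python
def downsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Return the fraction of reads to keep to go from current to target mean coverage.

    Hypothetical helper: assumes reads are removed uniformly at random, so that
    mean coverage scales linearly with the number of reads kept.
    """
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    if target_coverage > current_coverage:
        raise ValueError("cannot downsample to a higher coverage")
    return target_coverage / current_coverage


# Example with made-up numbers: a 30x modern genome reduced to 0.5x.
fraction = downsample_fraction(30.0, 0.5)
print(f"keep about {fraction:.4f} of the reads")
```

This only matches the average coverage. Reproducing the exact gaps of an ancient sample would need a position-aware approach, which this sketch does not attempt.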

10.2 years ago by alec_djinn ▴ 390

0

Are you trying to replicate the types of damage typical of ancient DNA that would appear in NGS? Or are you just looking to randomly reduce coverage?

10.2 years ago by heather.bouzek • 0

0

Whole-genome ancient DNA data can be recovered, but the coverage is very low. I want to take modern data and reduce the coverage to imitate this condition. Imitating the typical damage could also be interesting. I'll think about it and let you know.

updated 3.0 years ago by Ram 45k • written 10.2 years ago by MMK ▴ 20

0

You'd better take real ancient DNA sequence data and then downsample it. It is nontrivial to simulate DNA damage and the shorter fragment lengths.

10.1 years ago by lh3 33k

score 1 · Answer 1 · 2015-04-02

1

10.1 years ago

Pierre Lindenbaum 166k

Did you try DownsampleSam? http://broadinstitute.github.io/picard/command-line-overview.html#DownsampleSam

"Randomly down-sample a SAM or BAM file to retain a random subset of the reads. Mate-pairs are either both kept or both discarded. Reads marked as not primary alignments are all discarded. Each read is given a probability P of being retained - results with the exact same input in the same order and with the same value for RANDOM_SEED will produce the same results."
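To make the quoted behaviour concrete, here is a small pure-Python sketch of the same idea: each read name is kept with probability P, both mates share one decision, and a fixed seed makes the result repeatable. This is only an illustration of the principle, not Picard's actual implementation; the function name and the read names are hypothetical.

```python
import random


def downsample_names(read_names, probability, seed=1):
    """Keep each template (read name) with the given probability.

    Illustrative only: mates share a read name, so caching one decision per
    name keeps or discards both mates together. Same input order plus same
    seed gives the same output.
    """
    rng = random.Random(seed)
    decisions = {}  # read name -> True (keep) / False (discard)
    kept = []
    for name in read_names:
        if name not in decisions:
            decisions[name] = rng.random() < probability
        if decisions[name]:
            kept.append(name)
    return kept


# Hypothetical paired-end input: every name appears twice (one per mate).
names = [f"read{i // 2}" for i in range(2000)]
subset = downsample_names(names, probability=0.25, seed=42)
print(len(subset), "of", len(names), "records kept")
```

Because the decision is made per read rather than per position, the resulting coverage is only approximately P times the original.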

10.1 years ago by Pierre Lindenbaum 166k

2

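I usually use

    samtools view -h -s 0.56 INPUT.bam > OUTPUT_56.sam

Here 0.56 means each read pair is kept with probability 0.56, so roughly 56 percent of the reads are retained. The -h option keeps the header in the SAM output; use -b instead if you want BAM output.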
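One detail worth knowing about samtools view -s: the integer part of the value seeds the random number generator and the part after the decimal point is the fraction to keep, so 42.56 means seed 42 and 56 percent. The helper below is a hypothetical convenience for building that argument and the matching command line; the file names are placeholders, and the command is only printed, not executed.

```python
def samtools_subsample_arg(seed: int, fraction: float) -> str:
    """Build the SEED.FRACTION string expected by `samtools view -s`.

    Hypothetical helper: the fraction is rounded to 6 decimal places.
    """
    if seed < 0:
        raise ValueError("seed must be non-negative")
    rounded = round(fraction, 6)
    if not 0 < rounded < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    digits = f"{rounded:.6f}".split(".")[1].rstrip("0")
    return f"{seed}.{digits}"


def samtools_command(in_bam: str, out_bam: str, seed: int, fraction: float) -> list:
    """Return an argument list for a BAM-to-BAM subsampling run."""
    arg = samtools_subsample_arg(seed, fraction)
    return ["samtools", "view", "-b", "-s", arg, "-o", out_bam, in_bam]


# Placeholder file names; print the command rather than running it.
print(" ".join(samtools_command("INPUT.bam", "OUTPUT_56.bam", 42, 0.56)))
```

Changing the seed gives a different random subset at the same fraction, which helps if you want several independent low-coverage replicates from one genome.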