Hi!!
Let's say I have whole-genome sequence data and I want to reduce the coverage to imitate ancient DNA data. How would you recommend I do it?
Cheers
Manuel
Did you try DownsampleSam? http://broadinstitute.github.io/picard/command-line-overview.html#DownsampleSam
"Randomly down-sample a SAM or BAM file to retain a random subset of the reads. Mate-pairs are either both kept or both discarded. Reads marked as not primary alignments are all discarded. Each read is given a probability P of being retained - results with the exact same input in the same order and with the same value for RANDOM_SEED will produce the same results."
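To make the behaviour in that description concrete, here is a minimal Python sketch of the same idea: each read name is kept with probability P, both mates of a pair share one decision, and the same seed always gives the same result. This is not Picard's actual implementation. The function names `keep_read` and `downsample` are hypothetical, and hashing the read name is just one simple way I am assuming to get a reproducible per-pair decision.

```python
import hashlib


def keep_read(read_name: str, probability: float, seed: int = 1) -> bool:
    """Decide deterministically whether to keep all records with this read name.

    Hypothetical helper: hashes the seed and read name so that both mates
    of a pair (which share a name) always get the same decision.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < probability


def downsample(records, probability: float, seed: int = 1):
    """Yield the (read_name, payload) records whose read name is retained."""
    for read_name, payload in records:
        if keep_read(read_name, probability, seed):
            yield read_name, payload


if __name__ == "__main__":
    # Toy input: 1000 read pairs, each represented as two records.
    toy = [(f"read{i}", mate) for i in range(1000) for mate in (1, 2)]
    kept = list(downsample(toy, probability=0.1, seed=42))
    print(f"kept {len(kept)} of {len(toy)} records")
```

The number of reads kept varies around P times the input, so the resulting coverage is only approximately the target. For real BAM files you would still use DownsampleSam itself; the sketch only shows why the output is reproducible and why pairs stay together.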
If you have real ancient DNA data as a reference, I would determine its average coverage (or where its gaps are) and then use that information to downsample your modern genome data to the same coverage, or to reproduce the same gaps.
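As a rough sketch of the coverage-matching arithmetic: if reads are dropped at random, mean coverage scales roughly linearly with the fraction of reads kept, so the retention probability is about the target coverage divided by the current coverage. The function names and the 30x and 0.5x figures below are hypothetical, chosen only for illustration.

```python
def mean_coverage(total_aligned_bases: int, genome_length: int) -> float:
    """Mean depth of coverage: aligned bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_aligned_bases / genome_length


def retention_probability(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so the expected coverage matches the target.

    Assumes reads are removed uniformly at random, so coverage scales
    linearly with the fraction retained. Capped at 1.0 because
    downsampling cannot add coverage.
    """
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    if target_coverage < 0:
        raise ValueError("target_coverage must not be negative")
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    # Hypothetical example: a 30x modern genome reduced to 0.5x.
    p = retention_probability(current_coverage=30.0, target_coverage=0.5)
    print(f"retain each read with probability {p:.4f}")
```

The resulting value is what you would supply as the probability P of a random downsampler. Note that this matches only the average coverage; reproducing the specific gaps of an ancient sample would need a per-region approach.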
Are you trying to replicate the types of damage typical of ancient DNA that would appear in NGS? Or are you just looking to randomly reduce coverage?
Whole-genome ancient DNA data can be recovered, but the coverage is very low. I want to take modern data and reduce its coverage to imitate this condition. Imitating the typical damage could also be interesting; I'll think about it and let you know.
You'd be better off taking real ancient DNA sequence data and downsampling that. It is nontrivial to simulate DNA damage and the shorter fragment lengths.