Hi all,
I have a set of samples of ancient DNA, highly degraded and with a lot of bacterial contamination.
We did shallow sequencing just to check whether there is any human DNA in them.
Now we want to select the best samples for deep sequencing and possibly whole-genome enrichment.
The question is: based on my alignment results (how many reads aligned, i.e. are putatively human, and how many unique positions in the genome they map to), can I come up with a rough estimate of how deeply I should sequence a sample to get at least one read from each human molecule present in the library?
The reason for doing this: if there are only that many human template molecules, there is no point in going really deep with the sequencing, since I would only be adding coverage to sites I have already sequenced.
So: how deep do I need to sequence to see everything that is there at least once?
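My rough back-of-the-envelope model (a simplification: it assumes every read is an independent, uniform draw from L distinct template molecules, ignoring amplification bias) would be that the expected number of distinct molecules seen after n reads is

    E[U(n)] = L (1 - (1 - 1/L)^n) \approx L (1 - e^{-n/L})

and, by the coupon-collector argument, seeing every molecule at least once takes roughly

    n \approx L \ln L

reads.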
This is what I have been thinking so far:
I have my alignments, so I can count how many unique alignment positions I have. I can then randomly subsample the reads several times at the same depth (the same number of randomly selected reads), map each subsample, and compare how many unique mapping positions I get from one subsample to the next.
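A minimal sketch of that subsampling in R (assuming I dump one mapped read per line as "chrom pos" into a text file; the name positions.txt is made up here):

    # One line per mapped read: chromosome and leftmost mapping position.
    pos <- read.table("positions.txt", col.names = c("chrom", "pos"))
    ids <- paste(pos$chrom, pos$pos)      # one ID per mapped read position

    # Subsample at increasing depths, a few replicates each, and count
    # the unique mapping positions recovered at each depth.
    depths <- round(seq(0.1, 1.0, by = 0.1) * length(ids))
    uniq <- sapply(depths, function(n)
      mean(replicate(5, length(unique(sample(ids, n))))))

    plot(depths, uniq, xlab = "reads sampled", ylab = "unique positions")

The shape of that curve (still climbing vs. flattening out) is what I would want to extrapolate from.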
Does it make any sense to try to look for an answer this way? Any suggestions or alternative approaches are warmly welcome!
Kind Regards,
me
Hi, a few questions about your problem: is this paired-end or single-end sequencing? Which tool did you use to align the reads to your reference human genome? Did you filter out non-uniquely mapped reads?
Single-end,
bwa aln -n 0.02 -l 1000
(seeding disabled for aDNA). No filtering besides adapter trimming and a >25 bp length cutoff on the reads before alignment. I seem to be getting close to the answer, but my model-fitting skills in R aren't top notch :x
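In case it helps anyone with the same problem, this is the kind of fit I mean, as a self-contained sketch with made-up numbers (using the saturation model U(n) = L (1 - e^{-n/L}) from the uniform-sampling assumption in my question):

    # Toy rarefaction data (made-up numbers, just for illustration):
    depths <- c(1e4, 2e4, 4e4, 8e4, 1.6e5, 3.2e5)          # reads subsampled
    uniq   <- c(9500, 18200, 33500, 57000, 88000, 120000)  # unique positions

    # Fit U(n) = L * (1 - exp(-n/L)); L is the estimated number of
    # distinct (human) template molecules in the library.
    fit <- nls(uniq ~ L * (1 - exp(-depths / L)),
               start = list(L = 2 * max(uniq)))
    L.hat <- coef(fit)[["L"]]
    L.hat               # rough library complexity estimate
    L.hat * log(L.hat)  # coupon-collector depth to see each molecule once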