I'm currently working on a deletion caller that finds deletions in sequencing data. To evaluate this caller, I generated a simulated dataset by introducing a number of known deletions into the reference genome. It is not rare for some bases to be inserted when a deletion occurs. To simulate such deletions, I want to know how the inserted bases are generated. Are they related to the reference sequence?
Thank you in advance!
Zhen
Are N-nucleotides added randomly by TdT?
Generally, yes. The pattern is of course not 100% uniform; have a look at Fig. 3A here: http://www.pnas.org/content/109/40/16161.long.
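For the simulation side of the question, here is a rough sketch of how a deletion with non-templated inserted bases could be introduced into a reference string, assuming the inserted bases are drawn independently of the reference and from a non-uniform base distribution. The function names (random_insert, delete_with_insert) and the values in DEFAULT_WEIGHTS are hypothetical placeholders, not measured frequencies; ideally they would be replaced with frequencies estimated from real data.

```python
import random

# Placeholder base weights (an assumption, not measured values).
# Replace them with frequencies estimated from real junction sequences.
DEFAULT_WEIGHTS = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}


def random_insert(length, rng, weights=DEFAULT_WEIGHTS):
    """Draw `length` bases independently from the given base weights."""
    bases = list(weights)
    probs = [weights[b] for b in bases]
    return "".join(rng.choices(bases, weights=probs, k=length))


def delete_with_insert(ref, start, del_len, ins_len, rng, weights=DEFAULT_WEIGHTS):
    """Delete ref[start:start + del_len] and put random bases in its place.

    Returns the mutated sequence and the inserted bases, so the inserted
    sequence can be recorded in the truth set for the caller evaluation.
    """
    if start < 0 or del_len < 0 or ins_len < 0 or start + del_len > len(ref):
        raise ValueError("deletion does not fit inside the reference")
    inserted = random_insert(ins_len, rng, weights)
    mutated = ref[:start] + inserted + ref[start + del_len:]
    return mutated, inserted


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulated set is reproducible
    ref = "ACGTACGTACGTACGTACGT"
    mutated, inserted = delete_with_insert(ref, start=5, del_len=8, ins_len=3, rng=rng)
    print(mutated, inserted)
```

Passing in a seeded random.Random keeps the dataset reproducible, and returning the inserted bases alongside the mutated sequence makes it easy to write both the deletion and its insertion to the truth file.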