How is a deletion with inserted bases simulated?
10.7 years ago
zhangz.cs ▴ 300

I'm currently working on a deletion caller that finds deletions from sequencing data. To evaluate this caller, I generated a simulated dataset by introducing a certain number of known deletions into the reference genome. It is not uncommon for some bases to be inserted when a deletion occurs. To simulate such deletions, I want to know how the inserted bases are generated. Are they associated with the reference?

Thank you in advance!

Zhen

sequence genome • 2.1k views
10.7 years ago

It depends on the molecular mechanism that resulted in such a deletion. There could be a simple random nucleotide addition, insertion of a palindromic sequence (see this example for physiological TCR recombination), trinucleotide repeats, etc. Here is another nice paper on the diverse composition of translocation junctions, including inserted/deleted nucleotides.
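
To make the difference between these mechanisms concrete, here is a minimal Python sketch of two insert types that do depend on context: a palindromic insert derived from the bases flanking the breakpoint, and a short tandem repeat. The function names (`reverse_complement`, `palindromic_insert`, `repeat_insert`) and the choice to take the palindrome from the left flank are illustrative assumptions, not a model taken from the papers mentioned above.

```python
# Hypothetical helpers; a simplified model of reference-associated inserts.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindromic_insert(left_flank, n):
    """Insert that is the reverse complement of the last n bases of the
    left flank, so flank end + insert forms a palindrome (assumed model)."""
    if n <= 0:
        return ""
    return reverse_complement(left_flank[-n:])


def repeat_insert(unit, copies):
    """Insert made of a repeated unit, e.g. a trinucleotide repeat."""
    return unit * copies


left_flank = "ACGTTG"
p_insert = palindromic_insert(left_flank, 2)
junction = left_flank[-2:] + p_insert
cag_insert = repeat_insert("CAG", 3)
```

In this toy model the palindromic insert is determined by the reference at the breakpoint, while the repeat insert is not, which is one way to read the question of whether inserted bases are associated with the reference.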

But I would suggest you first test your algorithm on fully random data, then select the cases where it fails and study them. Unless you have a specific optimization goal, or some idea of how to utilize characteristic features of mutations to improve precision/recall for a specific mechanism, I would suggest starting with random (uniform) data.
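
A minimal sketch of the random (uniform) starting point, assuming a 0-based deletion start and an insert drawn independently and uniformly from A, C, G and T. The function name `simulate_deletion_with_insertion` and its parameters are hypothetical and only meant to illustrate the idea.

```python
import random


def simulate_deletion_with_insertion(reference, del_start, del_len, ins_len, rng=None):
    """Delete del_len bases starting at 0-based del_start and put ins_len
    uniformly random bases in their place. Returns (mutated, inserted)."""
    rng = rng or random.Random()
    if del_start < 0 or del_len < 0 or ins_len < 0:
        raise ValueError("positions and lengths must be non-negative")
    if del_start + del_len > len(reference):
        raise ValueError("deletion extends past the end of the reference")
    # Assumption: inserted bases are independent of the reference.
    inserted = "".join(rng.choice("ACGT") for _ in range(ins_len))
    mutated = reference[:del_start] + inserted + reference[del_start + del_len:]
    return mutated, inserted


reference = "ACGTACGTACGTACGTACGT"
rng = random.Random(42)  # fixed seed so the simulated truth set is reproducible
mutated, inserted = simulate_deletion_with_insertion(reference, 5, 8, 3, rng)
```

Recording `del_start`, `del_len` and `inserted` for every simulated event gives the truth set against which the caller's output can be compared.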

Moreover, I believe that in your task it is more important to distinguish your indels from indel errors, e.g. the homopolymer indel errors of Illumina (See here) or the plethora of indel errors generated by 454 (See here). So I would recommend starting by analyzing the patterns of those.
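
One simple way to start looking at such patterns is to measure the homopolymer run around each called indel position, so that calls inside long runs can be inspected separately. The sketch below is only an illustration; the function name `homopolymer_run_length` and the idea of using run length as a flag are assumptions, not part of any specific tool.

```python
def homopolymer_run_length(reference, pos):
    """Length of the run of identical bases that contains 0-based pos."""
    base = reference[pos]
    left = pos
    # Extend the run to the left while the base stays the same.
    while left > 0 and reference[left - 1] == base:
        left -= 1
    right = pos
    # Extend the run to the right while the base stays the same.
    while right < len(reference) - 1 and reference[right + 1] == base:
        right += 1
    return right - left + 1


reference = "ACGTTTTTAC"
run_at_call = homopolymer_run_length(reference, 4)   # inside the T run
run_at_start = homopolymer_run_length(reference, 0)  # isolated base
```

A run-length threshold for treating a call as suspicious would have to be chosen from the error profile of the sequencing platform in question.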

Are N-nucleotides added randomly by TdT?

Well, generally yes. The pattern is of course not 100% uniform; have a look at Fig. 3A here: http://www.pnas.org/content/109/40/16161.long
