I am currently using whole-genome sequencing to identify structural variation (SV),
and I wonder whether I need to normalize between samples.
For example,
2-micron 10 1 N <dup> . . SVTYPE=DUP;STRANDS=-+:155;SVLEN=6307;END=6317;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=155;PE=0;SR=155 GT:SU:PE:SR ./.:111:0:111 ./.:44:0:44
There are 155 split reads (SR) supporting this SV.
But it appears in both treatment and control. Looking at the per-sample counts, there are 111 SR for treatment and 44 for control.
Does Lumpy perform any normalization when calling SVs, such as normalization by total read count?
Thanks,
Lumpy takes care of that internally. What you have to do is determine whether an SV is germline or somatic, and filter accordingly. See here for example.
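As a rough illustration of that filtering step, here is a minimal Python sketch that reads the per-sample SU/PE/SR counts from the record in the question and labels the call by which samples support it. The function names, the sample labels, and the thresholds (`min_case`, `max_control`) are hypothetical choices for this example, not Lumpy options. It only compares the raw counts in the FORMAT columns, and it assumes the first sample column is treatment and the second is control.

```python
# The record from the question (whitespace-separated here; real VCFs use tabs).
record = (
    "2-micron 10 1 N <dup> . . "
    "SVTYPE=DUP;STRANDS=-+:155;SVLEN=6307;END=6317;CIPOS=0,0;CIEND=0,0;"
    "CIPOS95=0,0;CIEND95=0,0;SU=155;PE=0;SR=155 "
    "GT:SU:PE:SR ./.:111:0:111 ./.:44:0:44"
)


def parse_sample_support(vcf_line, sample_names):
    """Return {sample: {"SU": int, "PE": int, "SR": int}} for one VCF record."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")  # column 9 is FORMAT, e.g. GT:SU:PE:SR
    support = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        values = dict(zip(format_keys, sample_field.split(":")))
        support[name] = {key: int(values[key]) for key in ("SU", "PE", "SR")}
    return support


def classify(support, case, control, min_case=4, max_control=0):
    """Label a call by which samples carry supporting reads.

    min_case and max_control are illustrative thresholds (assumptions);
    choose values appropriate for your own sequencing depth.
    """
    case_su = support[case]["SU"]
    control_su = support[control]["SU"]
    if case_su >= min_case and control_su <= max_control:
        return "case-only"
    if case_su >= min_case and control_su > max_control:
        return "shared"
    return "unsupported"


# Assumed sample order: treatment first, control second.
support = parse_sample_support(record, ["treatment", "control"])
label = classify(support, case="treatment", control="control")
print(support)
print(label)
```

For this record the call has supporting reads in both samples, so it is labelled as shared rather than specific to the treatment.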