I have patient tumor/normal HiSeq paired-end data: two BAM files, one for tumor and one for normal, which have been cleaned, sorted, and had duplicates removed. These BAMs are very large.
I am trying to use SVDetect to find structural variants and CNVs. I started SVDetect more than a day ago and the software seems to have slowed down dramatically. I am not sure how long processing one BAM file should take.
One issue I came across concerns mu_length and sigma_length: the sigma seems very high. I get mu and sigma by running the script "BAM_preprocessingPairs.pl" provided with the software. The script outputs a BAM file, but before it converts the SAM to BAM I can download the SAM and view the mu and sigma.
///////////////////////////////////////////////////////////////////////////////////////////
//Tumor
///////////////////////////////////////////////////////////////////////////////////////////
<detection>
read1_length=104
read2_length=104
window_size=405
step_length=102
mates_file=tumor.ab.bam
cmap_file=....hs18.len
</detection>
<filtering>
strand_filtering=1
order_filtering=1
insert_size_filtering=1
nb_pairs_threshold=2
nb_pairs_order_threshold=2
indel_sigma_threshold=3
dup_sigma_threshold=2
singleton_sigma_threshold=4
final_score_threshold=0.8
***mu_length=141
sigma_length=132***
</filtering>
//////////////////////////////////////////////////////////////////////////////////////////
My read length for paired end is 104.
- Are the window size and step length correct?
- Is sigma supposed to be that large, and does it affect the analysis algorithm a lot? I tried another tool (Breakway: http://sourceforge.net/apps/mediawiki/breakway/index.php?title=The_Breakway_Compendium#How_ReadClusters_works) and ran its Perl script to find the mean and standard deviation of the paired-end distance; it gave a mean of ~140 and an SD of ~18.
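One way to see why two tools can report such different sigmas from the same data is to compute the insert size statistics directly from SAM records, with and without discarding extreme pairs. The sketch below is only an illustration under stated assumptions: it is not how BAM_preprocessingPairs.pl or the Breakway script compute their values, the function name `insert_size_stats` and the `max_tlen` cutoff are hypothetical, and the toy records are made up. It uses the standard SAM columns (FLAG in column 2, RNEXT in column 7, TLEN in column 9).

```python
import statistics


def insert_size_stats(sam_lines, max_tlen=None):
    """Return (mean, stdev, n) of template lengths from SAM text lines.

    Only pairs with both mates mapped to the same reference are used, and
    each pair is counted once (the mate carrying a positive TLEN).
    max_tlen is a hypothetical cutoff for discarding extreme pairs.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag, rnext, tlen = int(fields[1]), fields[6], int(fields[8])
        if not flag & 0x1:                # read is not paired
            continue
        if flag & 0x4 or flag & 0x8:      # read or its mate is unmapped
            continue
        if rnext != "=":                  # mate is on another reference
            continue
        if tlen <= 0:                     # keep one mate per pair
            continue
        if max_tlen is not None and tlen > max_tlen:
            continue
        sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two usable pairs")
    return statistics.mean(sizes), statistics.stdev(sizes), len(sizes)


def toy_record(name, flag, pos, pnext, tlen):
    """Build a minimal made-up SAM record for the demonstration."""
    return "\t".join([name, str(flag), "chr1", str(pos), "60", "104M",
                      "=", str(pnext), str(tlen), "*", "*"])


toy_sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    toy_record("r1", 99, 100, 140, 144),
    toy_record("r1", 147, 140, 100, -144),
    toy_record("r2", 99, 200, 232, 136),
    toy_record("r3", 99, 300, 336, 140),
    toy_record("r4", 97, 100, 4996, 5000),   # one distant pair
]

print(insert_size_stats(toy_sam))                 # all pairs
print(insert_size_stats(toy_sam, max_tlen=1000))  # distant pair discarded
```

In the toy data a single distant pair is enough to inflate both the mean and the standard deviation, so a large gap between two reported sigmas may simply reflect whether extreme pairs were excluded before the statistics were computed.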
The SVDetect manual says: "To detect large SVs, a window_size value of 2σ from the mean has to be set ("µ+2σ" for a confidence interval of ~95%). To identify balanced translocations, a window size equal to at least "2µ+2√2σ" should be set." By the way, how many reads are there in your processed *ab.bam file?
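As a quick check of the two formulas quoted from the manual, the sketch below evaluates them for both sets of values mentioned in this thread (mu=141, sigma=132 from the preprocessing script, and mean ~140, SD ~18 from the Breakway script). The function name `window_sizes` is just an illustrative helper, not part of SVDetect.

```python
import math


def window_sizes(mu, sigma):
    """Evaluate the two window-size formulas quoted from the manual."""
    large_sv = mu + 2 * sigma                        # mu + 2*sigma
    balanced = 2 * mu + 2 * math.sqrt(2) * sigma     # 2*mu + 2*sqrt(2)*sigma
    return large_sv, balanced


for label, mu, sigma in [("preprocessing script", 141, 132),
                         ("Breakway script", 140, 18)]:
    large_sv, balanced = window_sizes(mu, sigma)
    print(f"{label}: mu+2*sigma = {large_sv:.0f}, "
          f"2*mu+2*sqrt(2)*sigma = {balanced:.0f}")
```

With mu=141 and sigma=132 the first formula gives 405, which is the window_size in the configuration above, and the second gives about 655. With a mean of 140 and an SD of 18 they give 176 and about 331, so the choice of sigma changes the window size considerably.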
Yes, I looked at the manual and changed my window size to 1000 with a step of 500, but the program is still too slow. After four days it crashes.
read 773192631 test reads
read 815356938 ref reads
Sorry, but may I ask how you obtained the insert size after running the BAM_preprocessingPairs.pl script? It only gave me counts of mapped and unmapped reads and produced a BAM file at the end; it did not output anything about the mu or SD length. I would really appreciate your help.
Thank you