Hi,
I am trying to do the following exercise (still playing with indels):
1. simulate short reads (with indels) from a human chromosome,
2. align the short reads simulated in (1),
3. call indels using an SV detection tool,
4. compare the results of (3) with the indels generated in (1).
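Step (4) comes down to matching called indel positions against the simulated ones with some positional tolerance, because the same indel in a repeat can be reported at slightly different coordinates. Below is a minimal Python sketch of that matching. The function `match_indels`, the `(chrom, pos)` tuple representation and the 5 bp default window are my own assumptions. Parsing wgsim's output and dindel's results into that form is left out, since I haven't checked their exact formats here.

```python
from bisect import bisect_left
from collections import defaultdict


def match_indels(truth, calls, window=5):
    """Greedily match called indels to simulated (truth) indels.

    truth, calls: iterables of (chrom, pos) tuples (hypothetical format).
    A call matches an unused truth indel on the same chromosome whose
    position is within `window` bp. Each truth indel is matched at most once.
    Returns (true_positives, false_negatives, false_positives).
    """
    # Index truth positions per chromosome, sorted for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in truth:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    used = {chrom: [False] * len(p) for chrom, p in by_chrom.items()}

    tp = fp = 0
    for chrom, pos in sorted(calls):
        positions = by_chrom.get(chrom, [])
        # First truth position that could be within the window.
        i = bisect_left(positions, pos - window)
        matched = False
        while i < len(positions) and positions[i] <= pos + window:
            if not used[chrom][i]:
                used[chrom][i] = True
                matched = True
                break
            i += 1
        if matched:
            tp += 1
        else:
            fp += 1

    # Truth indels never matched by any call are false negatives.
    fn = sum(flags.count(False) for flags in used.values())
    return tp, fn, fp
```

Greedy matching is a simplification. An optimal one-to-one assignment could differ slightly when several calls cluster around several nearby true indels.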
For this workflow, I'm using wgsim for (1), bowtie2 for (2) and dindel for (3). The complete list of commands is at the end of this question.
So far I haven't been able to detect a single indel, and I'm wondering what I'm doing wrong. Is it because I haven't simulated enough reads (1M)? Or is it something else: have I misunderstood a concept, or is it simply a matter of adjusting a parameter?
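A quick back-of-the-envelope check on whether 1M reads is enough. This Python sketch assumes wgsim's defaults as listed in its usage message: -N 1000000 read pairs, -1/-2 70 bp reads, -r 0.001 mutation rate and -R 0.15 indel fraction. It also assumes the GRCh37 chromosome 20 length of 63,025,520 bp. That length includes N gaps, so the real numbers differ somewhat. With those inputs, it estimates mean depth, the fraction of bases reaching a given depth under a Poisson (Lander-Waterman) model, and the rough number of simulated indels. The function names are hypothetical.

```python
import math

# Assumed inputs: wgsim defaults (per its usage message) and GRCh37 chr20 length.
CHR20_LEN = 63_025_520   # bp, includes N gaps, so the effective length is smaller
N_PAIRS = 1_000_000      # wgsim -N (read pairs)
READ_LEN_1 = 70          # wgsim -1
READ_LEN_2 = 70          # wgsim -2
MUT_RATE = 0.001         # wgsim -r
INDEL_FRAC = 0.15        # wgsim -R


def mean_coverage(n_pairs, len1, len2, genome_len):
    """Average depth: total sequenced bases divided by genome length."""
    return n_pairs * (len1 + len2) / genome_len


def frac_depth_at_least(k, c):
    """Poisson model: fraction of bases covered by at least k reads at mean depth c."""
    return 1 - sum(math.exp(-c) * c**i / math.factorial(i) for i in range(k))


def expected_indels(genome_len, mut_rate, indel_frac):
    """Rough expected number of simulated indel events."""
    return genome_len * mut_rate * indel_frac


cov = mean_coverage(N_PAIRS, READ_LEN_1, READ_LEN_2, CHR20_LEN)
print(f"mean coverage: {cov:.2f}x")
for k in (1, 3, 5):
    print(f"fraction of bases with depth >= {k}: {frac_depth_at_least(k, cov):.2f}")
print(f"expected indels: {expected_indels(CHR20_LEN, MUT_RATE, INDEL_FRAC):.0f}")
```

At around 2x mean depth, a large share of positions are covered by only a few reads. That is likely thin for reliable indel calling, but with thousands of simulated indels it would not, on its own, explain getting zero calls.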
I tried making the simulator extend the indels (with the -X parameter), but it didn't help.
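To see what -X does to indel sizes, here is a simplified Python model. It assumes, based on my reading of wgsim's usage text for -X ("probability an indel is extended", default 0.30), that each extra base is added independently with probability X. Under that assumption the length is geometric with mean 1/(1 - X): -X 0.95 gives indels of about 20 bp on average, against about 1.4 bp at the default. wgsim's actual implementation may treat insertions and deletions differently. The function names are hypothetical.

```python
import random


def draw_indel_length(extend_prob, rng):
    """Simplified model: start at 1 bp and add 1 bp while a uniform
    draw falls below extend_prob (a geometric length distribution)."""
    length = 1
    while rng.random() < extend_prob:
        length += 1
    return length


def mean_indel_length(extend_prob, n=50_000, seed=1):
    """Monte Carlo estimate of the mean indel length for a given -X value."""
    rng = random.Random(seed)
    total = sum(draw_indel_length(extend_prob, rng) for _ in range(n))
    return total / n


for x in (0.30, 0.95):
    analytic = 1 / (1 - x)  # mean of a geometric distribution starting at 1
    print(f"-X {x:.2f}: simulated mean {mean_indel_length(x):.2f} bp, "
          f"analytic mean {analytic:.2f} bp")
```

If this model is roughly right, raising -X produces longer indels. Those are, if anything, harder to place for an aligner working with 70 bp reads, so it is not surprising that it didn't help.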
Thanks in advance for any hint!
samtools faidx human_g1k_v37.fasta 20 > human_g1k_v37_chr20.fasta
wgsim -X 0.95 human_g1k_v37_chr20.fasta out.read1.fq out.read2.fq > wgsim.out
bowtie-build human_g1k_v37_chr20.fasta homo_chr20
bowtie -t homo_chr20 -X 700 -1 out.read1.fq -2 out.read2.fq -S homo_chr20.sam
samtools view -bS homo_chr20.sam > homo_chr20.bam
./dindel_x86-64 --ref human_g1k_v37_chr20.fasta --outputFile 1 --bamFile homo_chr20.bam --analysis getCIGARindels
=> no variants are detected here (the 1.variants.txt file is empty), so I can't go any further :-(
Pascal, as Wolf noted, bowtie (version 1) cannot be used here: it does not perform gapped alignment, so it never reports indels. To verify this, check whether even a single CIGAR string in your SAM output contains an "I" or a "D".
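The CIGAR check above can be scripted. Below is a minimal stdlib-only Python sketch that reads SAM text and counts mapped reads whose CIGAR (column 6) contains an I or a D. The function name `count_indel_reads` is my own and hypothetical. With bowtie 1 output, I would expect both indel counts to be zero. A roughly equivalent shell one-liner is `samtools view homo_chr20.bam | awk '$6 ~ /[ID]/' | wc -l`.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec ops).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indel_reads(sam_lines):
    """Count mapped reads whose CIGAR contains an insertion (I) or deletion (D).

    sam_lines: iterable of SAM text lines (e.g. an open .sam file, or the
    output of `samtools view`). Header lines starting with '@' are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a valid alignment line
        cigar = fields[5]
        if cigar == "*":  # no alignment (e.g. unmapped read)
            counts["unmapped"] += 1
            continue
        counts["mapped"] += 1
        ops = {op for _, op in CIGAR_OP.findall(cigar)}
        if "I" in ops:
            counts["with_I"] += 1
        if "D" in ops:
            counts["with_D"] += 1
    return counts
```

Usage, for example: `with open("homo_chr20.sam") as sam: print(count_indel_reads(sam))`.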