Hello,
I have population sequencing (125bp SE) of an organism, and I have managed to find large indels spanning several kb with bbmap tools. See here how the data looks when the .sam file is visualized.
http://imagebin.ca/v/2KVNpMvDWJCw
It looks like some might even be supported by 2 reads or more (same start same stop).
Is there anyway to call these? What software would you recommend? I tried freebayes, but I can't get any calls for some reason.
Adrian
EDIT:
I tried the following commands with samtools:
samtools mpileup -d 250 -m 1 -E --BCF --output-tags DP,DV,DP4,SP -f RefAnnotated_SPades_HXE-3.fasta -o mapped_AdaptTrim_trial1.bcf mapped_AdaptTrim.sort.bam
bcftools index mapped_AdaptTrim_trial1.bcf
bcftools call --skip-variants snps --multiallelic-caller --variants-only -O v mapped_AdaptTrim_trial1.bcf -o mapped_AdaptTrim_trial1_vcf.vcf
However, the detected indels are not too long, maximum 50-100bp. The ones I am seeing in the image posted are a few kb in length.
Do those long thin lines represent deletions? If so, they mostly look like false mappings. Having just two supporting reads amongst those many false hits is too weak to be useful.
Yes they are deletions. In one case, there are 14 reads supporting one 11kb deletion (different sample). I got freebayes to call long deletions, however, the problem that I am running into is that if freebayes called a deletion from position X to position Y, it will not call a smaller deletion that is located between positions X and Y, even though that one might have more reads supporting it. I just need something to call them, then I can try PCR.
You'd better try lumpy/delly first, using their preferred mappers. They look at split reads as well as read pairs. They are peer-reviewed and independently evaluated. I know too little about bbmap split read to give any useful advice.
Will do, however, at least lumpy takes as input bwa-mem. I tried it, and it only find deletions that are maximum 20bp in 125bp reads under default parameters. Is it safe to tweak them, or do lump/delly take into account unmapped reads as well?
bwa-mem will produce two hits for split reads. At least lumpy uses this information. Giving two hits is more general. You can't put inversions or translocations in one sam line.
Ok fair point. I just successfully ran lumpy, and it identified 2 candidates over 10kb, one supported by 4 split reads, and one supported by 20 or so split reads and a few PE reads.
Is there any way to run it only on SE reads (as to only look for split reads)? My fragment length was ridiculously low, I ended up merging most of my reads and treating them as SE. I suppose I could trick the software by creating fake PE reads (reverse compliment my SE reads and call them R2), but I hate doing that.
Delly does support SE reads for InDel detection since version v0.7.1.