Question

multiple sequence alignment for validation of large indels

7.9 years ago

thomas.welch ▴ 50

Hi there,

I will soon have 180 Illumina paired-end sequenced genomes (approx. 13x coverage) for a GWAS experiment. The genomes are approximately 36 Mbp long and carry very long segmental indels (up to 770 kb). As a separate analysis from the GWAS, I intend to genotype my samples for these indels using CNVnator (with a reference genome) to detect and visualise them, and I would then like to validate these genotypes for each individual using multiple sequence alignment. Is there software or a package capable of this final step that gives output that is easy to work with?

Could I perhaps use the BEDtools coverage command for this?

multiple-sequence-alignment cnvnator indels • 1.7k views

updated 19 months ago by Ram 44k • written 7.9 years ago by thomas.welch ▴ 50

score 0 · Answer 1 · 2017-02-14

This is an indirect answer, but when mapping the reads, if you want to capture long deletion events within reads so that they can be called by standard variant callers (rather than CNV/SV programs), I suggest you map with BBMap, like this:

(once, for indexing)

bbmap.sh ref=ref.fa k=14

(for each sample)

bbmap.sh in1=read1.fq in2=read2.fq out=mapped.sam k=14 maxindel=770000 minratio=0.4 bs=bs.sh; sh bs.sh
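With 180 samples you will probably want to loop rather than type this out each time. Below is a minimal sketch of one way to do that. The file layout (reads/<sample>_R1.fq and reads/<sample>_R2.fq), the output directory, and the function names bbmap_cmd and map_all are my assumptions for illustration, not anything BBMap requires. By default it only prints the commands so you can check them first.

```shell
# Sketch only: the reads/<sample>_R1.fq naming and the function names are assumptions.

# Print the bbmap.sh command for one sample (same settings as above).
bbmap_cmd() {
    # $1 = sample name, $2 = reads directory, $3 = output directory
    printf 'bbmap.sh in1=%s/%s_R1.fq in2=%s/%s_R2.fq out=%s/%s.sam k=14 maxindel=770000 minratio=0.4 bs=%s/%s_bs.sh\n' \
        "$2" "$1" "$2" "$1" "$3" "$1" "$3" "$1"
}

# Print one command per sample; with RUN=1, also execute it and the BAM script.
map_all() {
    reads_dir=${1:-reads}
    out_dir=${2:-mapped}
    mkdir -p "$out_dir"
    for fwd in "$reads_dir"/*_R1.fq; do
        [ -e "$fwd" ] || continue          # no matching files: skip the literal glob
        sample=$(basename "$fwd" _R1.fq)
        cmd=$(bbmap_cmd "$sample" "$reads_dir" "$out_dir")
        echo "$cmd"
        if [ "${RUN:-0}" = 1 ]; then
            eval "$cmd"                    # breaks on paths containing spaces
            sh "$out_dir/${sample}_bs.sh"  # sort and index, as in the single-sample case
        fi
    done
}
```

Calling `map_all reads mapped` prints the per-sample commands as a dry run; `RUN=1 map_all reads mapped` would actually run them, assuming bbmap.sh and samtools are on your PATH.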

The maximum indel length that can be detected this way depends on the read length. For insertions, the maximum is around 50% of the read length (roughly 75 bp for 150 bp reads). For deletions, 150 bp reads should be sufficient for a 770 kbp event. Once you have mapped the reads like this, you can visualize the long indels in IGV from the sorted, indexed BAM file (created by the "bs.sh" shell script, if samtools is installed). You can subsequently call the events like this:

callvariants.sh in=mapped.sam ref=ref.fa vcf=vars.vcf ploidy=1
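To pull just the long events out of the resulting VCF, something like the following awk filter may help. It is a rough sketch that assumes indels are written with explicit REF and ALT sequences (not symbolic alleles such as <DEL>) and one ALT per record; long_indels is a hypothetical helper name, and the size threshold is whatever you choose.

```shell
# Sketch only: long_indels is a hypothetical helper, not part of BBTools.
# Prints CHROM, POS, and the length change (ALT length minus REF length;
# negative = deletion) for records whose size change is at least min_len bases.
long_indels() {
    vcf=$1
    min_len=${2:-1000}
    awk -F'\t' -v min="$min_len" '
        /^#/ { next }                        # skip header lines
        $5 !~ /^[ACGTNacgtn]+$/ { next }     # skip symbolic or multi-allelic ALTs
        {
            d = length($5) - length($4)
            if (d >= min || -d >= min) print $1 "\t" $2 "\t" d
        }' "$vcf"
}
```

For example, `long_indels vars.vcf 10000` would list only events that change the sequence length by at least 10 kb, which you could then compare against the CNVnator calls.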

You can increase read length, if needed, with Tadpole or BBMerge (both ship in the same package as BBMap), like this:

tadpole.sh in1=r1.fq in2=r2.fq out1=e1.fq out2=e2.fq mode=extend extendleft=50 extendright=50

or, for paired reads with a long insert size:

bbmerge.sh in1=r1.fq in2=r2.fq out=merged.fq outu=unmerged.fq rem extend2=50

These require sufficient coverage, though, because kmers from other reads are used for extension. 13x is barely sufficient, but should work.