multiple sequence alignment for validation of large indels
1
0
Entering edit mode
7.9 years ago
thomas.welch ▴ 50

Hi there,

I will soon have 180 illumina paired end sequenced genomes (approx 13x) with which I am conducting a GWAS experiment, these genomes are approximately 36Mbp long and have very long segmental indels (up 770kb). I intend to genotype my samples for these indels as a separate analysis from the GWAS using CNVnator (with a reference genome) to detect and visualise the indels, and I would then like to validate these genotypes for each individual using multiple sequence alignment. Is there a software or package capable of this final step, that can give me an easy to work with output?

Could I perhaps use the BEDtools coverage command for this?

multiple-sequence-alignment cnvnator indels • 1.7k views
ADD COMMENT
0
Entering edit mode
7.9 years ago

This is an indirect answer, but when mapping the reads, if you want to capture long deletion events within reads so that they can be called by standard variant callers (rather than CNV/SV programs), I suggest you map with BBMap, like this:

(once, for indexing)

bbmap.sh ref=ref.fa k=14

(for each sample)

bbmap.sh in1=read1.fq in2=read2.fq out=mapped.sam k=14 maxindel=770000 minratio=0.4 bs=bs.sh; sh bs.sh

The actual maximum length of indels that can be detected this way is related to the read length. For insertions, the maximum length is around 50% of the read length. 150 bp reads should be sufficient for a 770kbp deletion. once you map the reads like this, you can visualize the long indels in IGV from the sorted indexed bam file (created by the "bs.sh" shell script, if samtools is installed). You can subsequently call the events like this:

callvariants.sh in=mapped.sam ref=ref.fa vcf=vars.vcf ploidy=1

You can increase read length, if needed, with BBMap's Tadpole or BBMerge, like this:

tadpole.sh in1=r1.fq in2=r2.fq out1=e1.fq out2=e2.fq mode=extend extendleft=50 extendright=50

or, for paired reads with a long insert size:

bbmerge.sh in1=r1.fq in2=r2.fq out=merged.fq outu=unmerged.fq rem extend2=50

These require sufficient coverage, though, because kmers from other reads are used for extension. 13x is barely sufficient, but should work.

ADD COMMENT

Login before adding your answer.

Traffic: 2089 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6