Question

Very Large Insertion Detection Methodology Query

1

11.1 years ago

rob234king ▴ 610

I've been given a project to find a plasmid insertion in a small genome that has been sequenced using Illumina and has a reference sequence available. The plasmids used are large (10 kb).

I'm used to mapping with Novoalign3 and calling SNPs and INDELs with GATK, but I'm not sure that such a large INDEL will be detected using this method. I'll attempt it, but I was wondering if there is a more appropriate method. I was thinking of possibly doing a de novo assembly and comparing it with the reference using MUMmer. Any thoughts on the best method to detect plasmid insertions that are going to be 1-10 kb?

gatk • 4.9k views

updated 11.1 years ago by Rohit ★ 1.5k • written 11.1 years ago by rob234king ▴ 610

Ram · Answer 1 · 2013-10-26

2

11.1 years ago

Pierre Lindenbaum 164k

Use bwa-mem to find the regions where reads have one part mapping to the plasmid and another part mapping to the genome.

11.1 years ago by Pierre Lindenbaum 164k

0

Thanks for the quick response, though I'm not sure I understand this. Do you mean map the reads to the plasmid sequence rather than the reference, then take the overhang sequences on either side of the plasmid from the reads that mapped, join them, and search for the result in the reference?

11.1 years ago by rob234king ▴ 610

0

I meant: use both sequences (plasmid + genome) as your bwa reference, and bwa will tell you where some reads overlap a junction.
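
A minimal sketch of that setup, assuming the genome and plasmid are each available as a FASTA file. The file names and the two helper functions are hypothetical, and the bwa commands are only built as strings here, not executed:

```python
from pathlib import Path


def build_combined_reference(genome_fa, plasmid_fa, combined_fa):
    """Concatenate the genome and plasmid FASTA files into one reference."""
    with open(combined_fa, "w") as out:
        for fasta in (genome_fa, plasmid_fa):
            text = Path(fasta).read_text()
            # Make sure each file ends with a newline, so the next header
            # never fuses with the last sequence line of the previous file.
            out.write(text if text.endswith("\n") else text + "\n")
    return combined_fa


def bwa_commands(combined_fa, reads_1, reads_2, out_sam):
    """Return the shell commands that index the reference and map the reads."""
    return [
        f"bwa index {combined_fa}",
        f"bwa mem {combined_fa} {reads_1} {reads_2} > {out_sam}",
    ]
```

For example, `build_combined_reference("genome.fa", "plasmid.fa", "combined.fa")` followed by running the two strings from `bwa_commands("combined.fa", "reads_1.fq", "reads_2.fq", "aln.sam")` in a shell would give a SAM file in which the plasmid appears as one more contig next to the genome.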

updated 5.1 years ago by Ram 44k • written 11.1 years ago by Pierre Lindenbaum 164k

0

Ah, Pierre, I replied assuming that the sequence of the plasmid was unknown. I'm sure Rob234 can clarify.

11.1 years ago by zam.iqbal.genome ★ 1.9k

0

Thanks, it's still good to know that it could be done without the plasmid sequence, though. Yes, I know what the sequence should be, but the plasmid is inserted at a random position in the genome. I'm not sure what is meant by "overlap a junction". If I add the plasmid to the reference genome, it's a separate contig, so doesn't the mapper avoid mapping across the two contigs? And the plasmid isn't attached to either end of the genome; it's inserted within it. Most likely I don't understand how BWA-MEM works: does it use a special flag to report reads that can be split and mapped to both contigs (junctions)?

updated 5.1 years ago by Ram 44k • written 11.1 years ago by rob234king ▴ 610

1

If you are using paired-end reads, then the mapper will tell you if one end maps to a chromosome and the other maps to the plasmid.
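
As a rough illustration, such pairs can be pulled out of the SAM text by comparing each read's contig (RNAME) with its mate's contig (RNEXT). This sketch assumes the plasmid contig in the combined reference is literally named "plasmid"; the function name is made up for the example:

```python
def plasmid_bridging_pairs(sam_lines, plasmid_name="plasmid"):
    """Yield (read_name, contig, pos) for genome reads whose mate maps to the plasmid."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, rnext = int(fields[3]), fields[6]
        # Keep paired reads only (0x1), and drop records that are unmapped
        # (0x4), have an unmapped mate (0x8), or are secondary (0x100) or
        # supplementary (0x800) alignments.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # RNEXT is "=" when the mate maps to the same contig as the read.
        mate_contig = rname if rnext == "=" else rnext
        if rname != plasmid_name and mate_contig == plasmid_name:
            yield qname, rname, pos
```

The genome positions of the reads it yields should pile up on either side of the insertion site, which narrows down where to look for the exact junction.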

11.1 years ago by zam.iqbal.genome ★ 1.9k

1

BWA will tell you in the SAM output if ONE read maps to two regions: the best hit is in the regular record (say chr1:12345 cigar:50M50S) and the 2nd hit is in the metadata (plasmid:6789 cigar:50S50M).
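
A small sketch of reading that information back out, assuming a bwa-mem version that writes the second hit in the `SA:Z:` optional tag (entries of the form `rname,pos,strand,CIGAR,mapQ,NM;`, as described in the SAM optional-fields specification). The function names are hypothetical:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases consumed by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def parse_sa_tag(sa_value):
    """Parse 'rname,pos,strand,CIGAR,mapQ,NM;...' into a list of dicts."""
    hits = []
    for entry in sa_value.rstrip(";").split(";"):
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        hits.append({"rname": rname, "pos": int(pos), "strand": strand,
                     "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return hits


def split_read_hits(sam_line):
    """Return the primary hit and any SA-tag hits for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    primary = {"rname": fields[2], "pos": int(fields[3]), "cigar": fields[5]}
    # Optional tags start at column 12; the SA value follows the "SA:Z:" prefix.
    sa_values = [f[5:] for f in fields[11:] if f.startswith("SA:Z:")]
    return primary, (parse_sa_tag(sa_values[0]) if sa_values else [])
```

With the example above, the primary hit chr1:12345 with CIGAR 50M50S covers 50 reference bases, so the last aligned genome base is 12345 + 50 - 1 = 12394, and the SA entry places the rest of the read on the plasmid starting at 6789.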

updated 5.1 years ago by Ram 44k • written 11.1 years ago by Pierre Lindenbaum 164k

score 1 · Answer 2 · 2013-10-26

Have a go with Cortex:

- Webpage: http://cortexassembler.sourceforge.net/index_cortex_var.html
- Docs: http://cortexassembler.sourceforge.net/cortex_var_user_manual.pdf
- Paper on microbes: http://bioinformatics.oxfordjournals.org/content/29/2/275.full.pdf+html
- The original paper: http://www.nature.com/ng/journal/v44/n2/full/ng.1028.html

I've used it to look at plasmids before. Use run_calls (described in the manual) to automatically assemble and error-clean, and then you can:

1. Try the "Bubble Caller".
2. If that fails, dump contigs using --output_supernodes, and see if any of them are plasmids. Once you identify a plasmid contig, add that to your reference, and then remap your reads.

score 0 · Answer 3 · 2013-10-27

Why don't you try out Segemehl? http://www.bioinf.uni-leipzig.de/Software/segemehl/

I don't mean to add another one to the long list of available software for read mapping, but in my personal experience Segemehl has worked quite well for detecting splicing with high selectivity, and its sensitivity is good. However, I have no data on the largest gap it can recognize when mapping to a reference.