We observed a band on a PCR gel that is not being detected by DNA sequencing.
Here is the photo of the gel. Basically, we are trying to see whether a mutant still has a 1 kb operon that we are trying to truncate. The same primers amplify two products: a 4 kb band, which is the region with the inserted plasmid, and a 1 kb band, which is the region as it was before plasmid insertion.
What I have tried so far is to check whether the 1 kb region is in the mutant by sequencing the mutant genome and then assembling it. I tried two assemblers (SPAdes and the A5 pipeline), and after plasmid transformation neither recovers the full operon, only pieces of it. How is that possible? Is it a PCR artefact?
The alternative approach to check whether the 1 kb region is present is the one proposed here: A: SPAdes alternative for double check.
Pier proposes mapping reads to the sequence of the 1 kb region. I think this is not a good strategy, since the full 1 kb operon is present in the raw data but not as a continuous sequence (as revealed by the assemblers).
I'd re-run that gel with the same amount of DNA loaded in each well for a start :)
Is this plasmid supposed to integrate in the exact same place in the genome for every cell sequenced? If so, can't you use a known bit of DNA from the inserted fragment to sequence out of the insert (to see where it integrated, if that's all you're interested in)? To be honest, I had difficulty understanding the question, so maybe I'm not going to be much help anyway.
You may need to Sanger sequence those two bands to confirm that there is a relationship/homology. If the amplification turns out to be off-target then ...
Maybe your 1 kb region is a genomic island which is not present in all members of the species? I would recommend mapping your reads to the sequence of this 1 kb region with 'bwa mem'. See also: how to locate a gene sequence among fastq files containing short reads (genomax2 would opt for bbmap).
If you get very low coverage, the region is absent altogether. If you see very high coverage, the region is a repeat and SPAdes has not assembled across it for good reasons.
Can the plasmid be maintained as an episome without integration? If so, then the problem may be due to differences in copy number between the plasmid and the genome. Assemblers like SPAdes take read depth into account during assembly. Conceptually, an episome plus an integrated plasmid is the same as a repetitive element, and is treated as such (i.e., assembled independently of the flanking sequences).
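The coverage rule of thumb above can be sketched in a few lines of Python. This is only an illustration, assuming you already have per-base depths for the 1 kb region (for example from `samtools depth -a` on the `bwa mem` alignment, which prints chromosome, position and depth separated by tabs) and a median depth for the rest of the genome. The function names and the 0.1x / 2x thresholds are hypothetical choices, not values from this thread.

```python
import statistics


def parse_depth_lines(lines):
    """Return per-base depths from `samtools depth`-style lines: chrom<TAB>pos<TAB>depth."""
    depths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    return depths


def summarize_region(region_depths, genome_median_depth, low_ratio=0.1, high_ratio=2.0):
    """Compare the region's median depth with the genome-wide median depth.

    low_ratio and high_ratio are arbitrary illustrative cut-offs (assumptions).
    """
    if not region_depths:
        raise ValueError("no depth values for the region")
    if genome_median_depth <= 0:
        raise ValueError("genome median depth must be positive")
    ratio = statistics.median(region_depths) / genome_median_depth
    # Fraction of positions in the region covered by at least one read.
    breadth = sum(1 for d in region_depths if d > 0) / len(region_depths)
    if ratio < low_ratio:
        label = "absent-like"
    elif ratio > high_ratio:
        label = "repeat-like"
    else:
        label = "single-copy-like"
    return {"ratio": ratio, "breadth": breadth, "label": label}
```

The breadth value is reported alongside the label because a region that is only partly covered (pieces of the operon, as described in the question) can look different from one that is uniformly covered at low depth.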
Check your coverage data for the plasmid contig (which should be present in your assembly) compared to the genomic contigs to see if that is the source of your problem.
If true, then the 1kb sequence should be flanking the plasmid sequence. Is it not? If not, then what sequences flank the plasmid in your assembly? A diagram would help to clarify the issue.
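The coverage comparison suggested above can be roughed out from the contig names alone, since SPAdes writes headers of the form `NODE_<n>_length_<len>_cov_<cov>`. The following is a minimal sketch under that assumption; note that the `cov` value in SPAdes headers is k-mer coverage, not read depth, so only ratios between contigs are meaningful here. The function names, the 10 kb baseline cut-off and the 2-fold threshold are hypothetical.

```python
import re
import statistics

# Matches SPAdes-style contig names such as NODE_4_length_4000_cov_150.0
HEADER_RE = re.compile(r"(NODE_\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (name, length, coverage) from a SPAdes-style header, or None."""
    match = HEADER_RE.search(header)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), float(match.group(3))


def flag_high_coverage(headers, min_baseline_length=10000, fold=2.0):
    """List contigs whose coverage is at least `fold` times the baseline.

    The baseline is the median coverage of contigs at least
    min_baseline_length long, taken here as a proxy for the chromosome.
    """
    contigs = [c for c in map(parse_header, headers) if c is not None]
    baseline = [cov for _, length, cov in contigs if length >= min_baseline_length]
    if not baseline:
        raise ValueError("no contigs long enough to estimate baseline coverage")
    median_cov = statistics.median(baseline)
    if median_cov <= 0:
        raise ValueError("baseline coverage must be positive")
    return [
        (name, round(cov / median_cov, 2))
        for name, _, cov in contigs
        if cov / median_cov >= fold
    ]
```

A short contig flagged this way would be a candidate for an episomal or multi-copy element; if nothing is flagged, the copy-number explanation becomes less likely.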
I don't have a plasmid contig; the assembly reveals that the plasmid is integrated in the genome.
Do you find a complete plasmid integrated into the chromosome, or only a cluster of genes which you think is usually plasmid-borne?
Since you spend so much effort on this plasmid, it must comprise something very interesting. Antibiotic resistance genes often reside on plasmids. Unfortunately, they are often parts of integrons (repeat cassettes) or accompanied by repeat elements such as transposase genes or their remnants. These elements may be found at several sites of the plasmid but also on the chromosome. These repeats are a challenge for assembly, and may give rise to wrong contigs. You should evaluate the read coverage very carefully, especially at the putative insertion site of your plasmidic genes into the chromosomal contig.
The 4 kb plasmid region is interesting, but I think knowing all details about what's there is not essential to my goal. I inspected this region manually and it seems to be the expected integrated plasmid. In any case, what I want to know is whether there is a full 1 kb operon in my assembled mutant data, and I can't find it, either inside the genome or as a small 1 kb contig. I should expect at least this 1 kb contig if the episome scenario is happening, right?
"knowing all details about what's there is not essential to my goal"
It is essential to know whether there are any repeats in that 4 kb region. With repeats present, the assembly of short reads may become unreliable, and you may also see strange concatemers in PCR.
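A quick way to screen a short sequence such as the 4 kb region for exact repeats is to look for k-mers that occur more than once on either strand. The sketch below is a simplified illustration, not a substitute for a proper repeat finder: it only detects exact matches, and the function names and the default k of 20 are hypothetical choices.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeated_kmers(seq, k=20):
    """Map each canonical k-mer seen more than once to its start positions.

    A k-mer and its reverse complement are counted together by storing the
    lexicographically smaller of the two (the "canonical" form).
    Positions are 0-based.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        canonical = min(kmer, revcomp(kmer))
        positions[canonical].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}
```

Any k-mer reported here that is longer than the read overlap the assembler can resolve marks a place where short-read contigs may be broken or mis-joined.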
Have you tried Velvet?