Hello, I am new to bioinformatics and am having trouble with an assembly. I have Illumina MiSeq paired-end short reads from an organism that has multiple yeast chromosomes and a single bacterial chromosome with portions of a plasmid inside. I have references for all three (yeast, bacterium, and plasmid).
I want to know how I can get the bacterial chromosome assembled entirely. I need it fully assembled and then annotated, because my PI is not versed enough in sequencing data to work with any other format.
Is there an open-source package that will allow me to assemble against a reference when I expect my sample not to be exactly the same? Or is there a combination of packages I can use to do this? I have contigs and scaffolds from a SPAdes assembly, but I am not sure where to go from there.
What is your final goal? Is it genome assembly or something else? This question feels like an XY problem, where you're asking for help with genome assembly when what you really want is something else. It's possible that there is a better way to answer your actual scientific question that doesn't involve genome assembly.
For example, if a similar/ancestral bacterium has already been sequenced and annotated, you could perform variant calling against that to find the differences instead of doing a genome assembly.
The ultimate goal with this data is to determine the sequence of the Yeast Artificial Chromosome of the sample. The yeast artificial chromosome should be the bacterial chromosome with the genes of interest from the plasmid. That is why I have been focused on assembling the data. I was able to align the data to my references; however, I was unable to create an alignment that showed how my plasmid had inserted into the bacterial chromosome. I am having trouble moving forward from here, but am going to give variant calling a try.
We had asked our sequencing facility for long reads, and were advised to begin with short reads. We were advised to pursue a hybrid assembly if short reads did not answer the question.
It is not essential for me to do whole-genome assembly. My PI just wants the information regarding SNPs and to know where the genes from the plasmid have inserted into the chromosome. Is this something I can learn from doing variant calling on my alignment? I have alignments from Bowtie2 and CLC, made separately against my bacterial chromosome, yeast chromosome and plasmid. I want my plasmid alignment and bacterial chromosome alignment to be combined.
This is my first experience with BioStars, so I did not post my question with enough detail. Is there any information I can provide to resolve the XY issue?
If you already have a reference genome for the ancestral (i.e. pre-plasmid) strain, then yes, you can get away with variant calling. When I did this sort of analysis I ran an SNV caller, an SV caller, and a CNV caller. I created a reference genome containing the ancestral strain + plasmid, aligned the reads with bwa mem, used GRIDSS to detect the breakpoints, then manually inspected and QCed the detected breakpoints with IGV (to ensure that a) the plasmid was inserted as intended, and b) there were no off-target structural changes). CNV calling was used to verify the number of copies of the plasmid inserted (e.g. in case I had 3.5 tandemly duplicated copies of the plasmid inserted, so that while it looked like one of the plasmid genes was only partially inserted, there were additional full-length copies in there as well) and that there were no losses elsewhere. Ideally, you sequenced the yeast before and after plasmid insertion so you can tell whether a difference (e.g. an SNV) between the reference and your sample was already present in your unmodified strain.
Alternatively, there may be tools and workflows that are targeted to your specific application. A quick literature search found this paper that looks very close to what you want to do (but does require short + long reads).
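To make the "ancestral strain + plasmid" reference step above more concrete, here is a minimal Python sketch of how the combined FASTA could be built and which alignment commands would typically follow. The function names and all file names are hypothetical placeholders, and the script only prints the bwa/samtools command lines rather than running them. The GRIDSS call is deliberately left out, since its exact invocation depends on the installed version; check its documentation for that step.

```python
import shlex


def build_combined_reference(fasta_paths, out_path):
    """Concatenate FASTA files into one reference and return the sequence names.

    Raises ValueError if two records share a name, because duplicate names
    would make alignments to the combined reference ambiguous.
    """
    names = []
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # drop blank lines between records
                    if line.startswith(">"):
                        parts = line[1:].split()
                        name = parts[0] if parts else ""
                        if name in names:
                            raise ValueError(f"duplicate sequence name: {name!r}")
                        names.append(name)
                    out.write(line + "\n")
    return names


def alignment_commands(reference, reads_1, reads_2, out_bam, threads=4):
    """Return shell command strings for indexing, aligning and sorting.

    Nothing is executed here; the commands are only assembled as text.
    """
    ref, r1, r2, bam = (shlex.quote(str(p)) for p in (reference, reads_1, reads_2, out_bam))
    return [
        f"bwa index {ref}",
        f"bwa mem -t {threads} {ref} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    for cmd in alignment_commands("ancestral_plus_plasmid.fa",
                                  "sample_R1.fastq.gz",
                                  "sample_R2.fastq.gz",
                                  "sample.sorted.bam"):
        print(cmd)
```

Aligning once against the combined reference, rather than separately against the chromosome and the plasmid, is what lets reads spanning an insertion junction show up as split or discordant alignments between the two sequences, which is the signal an SV caller and a manual IGV check rely on.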