I have two similar genomes (one "known", one "unknown"). The "known" genome has already been determined to contain a plasmid. Both Prokka and plasmidspades have unexpectedly shown no results for plasmids in either genomes. In the case of plasmidspades, I ran the following commands:
plasmidspades.py - 9,21,33,55,77,127 --careful -1 read1.fastq -2 read2.fastq
plasmidspades.py - 9,21,33,55,77,127 -1 read1.fastq -2 read2.fastq
In both cases, there was no contigs.fasta in the main output directory, the assembly_graph.fastg in the main output directory was empty, the warnings.log in the main output directory said "No putative plasmid contigs found!", and inside the directories (such as K127), the final_contigs.fasta was empty. Unless I should check elsewhere in the Prokka outputs, I interpret this to mean no plasmids were found.
So, I then downloaded the plasmid sequence fasta file on GenBank (that had already been discovered for the "known" genome). Using BWA, I aligned the raw reads from the genomes to this plasmid reference fasta file. In the "unknown" genome, I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).
*My question is - *
What approach should I take to solidify the evidence for the presence/absence of plasmid(s) in this "unknown" genome? If there is a plasmid present, how can I determine its sequence with confidence? (Side note: For the "unknown" genome, I have the raw reads, the assembled contigs from running SPAdes, and the reordered contigs from running MAUVE against the "known" genome as a reference).
I would not consider 180 reads solid evidence for presence of plasmid (what kind of data do you have BTW, short or long reads?). How many total reads are in this dataset? Have you checked the alignments to see if they look solid (not many mismatches) and which region of the known plasmid the reads are aligning to?
Total reads are about 480,610. Reads are about 150 base pairs. I do not have much experience at all with this subject and am unsure how to even examine which region of the known plasmid the reads are aligning to. I simply ran code I found online as follows: bwa index plasmid.fasta; bwa mem plasmid.fasta mySample_r1.fastq mySample_r2.fastq > mySample.sam; samtools view -c mySample.sam (total number of reads); samtools view -cF4 mySample.sam (total number of reads in my sample that mapped to plasmid). This is where I am stuck, trying to visualize how these 180 reads align to the plasmid.
You can use IGV to visualize your aligned data. You will need to create a custom genome with your plasmid reference. Follow the directions for custom genomes here. You will need to sort and index your alignment using samtools.
180 map to the plasmid? That's barely nothing (I assume Illumina reads since you used spades), you won't be able to assemble the plasmid with it. How is the genome assembly? Can you compare it to the known genome? If it contains the same plasmid you will be able to find it in the assembly regardless of the assembly method.
I compared the genome assembly to the "known" genome using MAUVE. They were quite similar (only about 20 LCBs at the default settings). In terms of your last sentence, how can I find the plasmid in the assembly? What tools should I use? This may be a simple question but I am new to this type of analysis.
I meant by comparing the assembly with the reference genome, do you see a match between the assembly and the plasmid?
or
Use fastqscreen to check for aligned reads with reference and plasmid
Fastqscreen allows screening against 100000 reads as default from input read files. @ clapin
How was the DNA for the sequencing run prepped?
I agree with the others, 180 reads is not that much (since no doubt at least a few of those reads will be junk) and certainly not enough to get an assembly.