Hi everyone,
I've been looking around and I can't seem to find a good solution to my problem, so I will ask here, in hopes that you may have some insight (of course, if there's a better thread that I've missed, please point me in that direction).
I have sequences for a few different strains of bacteria that I've manipulated in the lab, and each of these strains carries a single plasmid. I've sequenced the whole genome of each strain, plasmid and everything. I didn't have an issue with this part - I trimmed the reads with Trimmomatic, aligned them with bowtie2 to a known reference for the original bacterial strain (no plasmid), and asked for an output file of the reads that did not align to the reference (so, hopefully, all of my plasmid reads).
From there, I did a de novo assembly with SPAdes, which worked well, but here is my problem: after assembly, the contigs and scaffolds files output by SPAdes each contain many sequences (1000+). I used BLAST to determine the likely origin of each sequence and found leftover chromosomal sequences of various lengths (which I don't want) mixed in with longer contigs that match plasmids. I'm only interested in assembling and characterizing my plasmids, so I'd like to remove the chromosomal sequences while keeping the plasmid ones, which I then hope to use in further steps to possibly build a complete map of each plasmid. I naively thought I could simply delete the unwanted sequences by hand in a word processor, but there has to be a better way. From the BLAST results, I know exactly which contigs/scaffolds contain plasmid sequences (or what are thought to be), but I don't know how to filter out the rest of the sequences that I don't need.
I'd really appreciate any suggestions about what to do from here. I'm still fairly new to bioinformatics, so I'm not entirely sure what other types of software/tools are available in this situation.
Many thanks,
Amanda
Hi Brian,
Thank you very much for the suggestion - this sounds like it'll solve my problem, so I will try this out!
Amanda
Edit: I've just tried it out and it works great! Thanks again!
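For anyone who finds this thread with the same question: one general way to keep only the contigs that BLAST flagged as plasmid is to put their names in a text file, one per line, and pull just those records out of the SPAdes FASTA file. Below is a minimal Python sketch of that idea. It is not necessarily the approach suggested above, and the function names and the file names in the usage note are hypothetical. It assumes the names in the list match the first word of each FASTA header (for SPAdes, something like NODE_1_length_1000_cov_12.5).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, keep_ids):
    """Yield only the records whose ID (first word of the header) is in keep_ids."""
    for header, seq in read_fasta(lines):
        fields = header.split()
        if fields and fields[0] in keep_ids:
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def filter_fasta_file(fasta_path, ids_path, out_path):
    """Copy to out_path only the contigs of fasta_path that are named in ids_path."""
    with open(ids_path) as fh:
        # One contig name per line; blank lines are ignored.
        keep_ids = {line.split()[0] for line in fh if line.strip()}
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        write_fasta(filter_contigs(fin, keep_ids), fout)
```

With a hypothetical list file plasmid_ids.txt, the call would be filter_fasta_file("contigs.fasta", "plasmid_ids.txt", "plasmid_contigs.fasta"). The original file is left untouched, so the list can be adjusted and the filter rerun if a contig turns out to be misclassified.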