Hello,
I ran blastn with specific bacterial genes as queries against Klebsiella plasmid sequences.
I need to check whether the matches cover the entire gene sequence, so that I can filter out partial gene matches.
The first thing I thought of was using blastx, but it's been a while since I last did such an analysis and I wish to check if there's a better recent approach. What do you think about blastx as the next step?
Another approach I thought of was comparing the query genes' lengths with the matches' lengths, but that way, in case of partial length matches, I don't know if the genes on the plasmids are functional or not.
Thanks.
What exactly is your goal? Do you want to find genes in plasmid sequences that you sequenced? There are tools to do that (e.g. prokka)
I'm looking for specific genes in all plasmids of a given bacterial species. The species changes from run to run; right now I'm working on Klebsiella.
Can such tools be restricted to specific genes only?
You can run Prokka to get the full list of genes on the plasmid, then look for your gene of interest.
What do you think about using blastx?
Can you clarify what type of sequence is in your query? Fasta/fastq/full length?
Sure.
The queries are fasta files of specific genes.
If you know where the genes are on your plasmids, then you could use BLAT, or perhaps even minimap2, to look for end-to-end hits.
What do you think about using blastx?
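blastx cannot be used here because it searches a translated nucleotide query against a protein database, and your plasmids are nucleotide sequences. You could use tblastx, which translates both the query and the nucleotide database, in case the sequences are very divergent. However, if you already get good hits with blastn, then you probably don't need this.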
Please clarify whether you know which plasmids you are working with and which genes they carry. If you do, that would simplify things to a large extent.
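If the blastn hits are already good, the length comparison mentioned in the question can be done directly on the tabular output. Below is a minimal sketch, assuming blastn was run with -outfmt "6 qseqid sseqid pident length qlen qstart qend sstart send evalue bitscore". The function name and the 95% coverage and 90% identity cutoffs are illustrative assumptions, not recommended values.

```python
import csv
import io

# Assumed column order; it must match the -outfmt string given to blastn.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def full_length_hits(handle, min_cov=0.95, min_ident=90.0):
    """Keep hits whose alignment spans at least min_cov of the query gene.

    Coverage is computed per HSP as (qend - qstart + 1) / qlen, so a gene
    that is split across several HSPs is not rescued by this filter.
    The thresholds are hypothetical defaults.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        span = int(rec["qend"]) - int(rec["qstart"]) + 1
        cov = span / int(rec["qlen"])
        if cov >= min_cov and float(rec["pident"]) >= min_ident:
            rec["qcov"] = round(cov, 3)
            kept.append(rec)
    return kept


# Two made-up hits: geneA is covered end to end, geneB only over its first half.
example = (
    "geneA\tpKP1\t99.5\t900\t900\t1\t900\t5000\t5899\t0.0\t1650\n"
    "geneB\tpKP1\t98.0\t600\t1200\t1\t600\t9000\t9599\t0.0\t1000\n"
)
for hit in full_length_hits(io.StringIO(example)):
    print(hit["qseqid"], hit["sseqid"], hit["qcov"])
```

For real data, pass an open file handle of the blastn output instead of the in-memory example. A full-length, high-identity nucleotide match still says nothing about whether the gene is expressed or functional.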
I know what plasmids I'm working with. I have their fasta and GenBank files.
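With the plasmid FASTA sequences at hand, one rough way to address the concern about partial or broken genes is to extract each hit region and check that it still forms an intact open reading frame. The sketch below assumes the query genes are complete coding sequences (start codon through stop codon) and that the hit coordinates are the sstart and send values from blastn. All function names are hypothetical, and an intact ORF is only a hint of functionality, not proof.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT characters are left as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hit_region(plasmid_seq, sstart, send):
    """Extract the subject region of a hit (1-based, inclusive coordinates).

    blastn reports sstart > send for minus-strand hits, in which case the
    region is reverse-complemented so that it reads in the gene's direction.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    region = plasmid_seq[lo - 1:hi].upper()
    return reverse_complement(region) if sstart > send else region


def is_intact_orf(seq):
    """True if seq starts with a start codon, ends with a stop codon,
    has a length divisible by three, and contains no internal stop codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] in START_CODONS
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


# Toy plasmid carrying a 12 bp "gene" at positions 3..14 on the plus strand.
toy_plasmid = "CC" + "ATGAAACCCTAA" + "GG"
print(is_intact_orf(hit_region(toy_plasmid, 3, 14)))
```

A frameshift or premature stop inside the hit makes the check fail even when the nucleotide alignment is nearly full length, which is the case a pure length comparison would miss.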