Check if blastn gene matches are of entire genes or of parts of genes
17 months ago
langziv ▴ 70

Hello,
I ran blastn with specific bacterial genes as queries against Klebsiella plasmid sequences.
I need to check if the matches are of entire gene sequences, in order to filter out partial gene matches.
The first thing I thought of was using blastx, but it's been a while since I last did such an analysis and I wish to check if there's a better recent approach. What do you think about blastx as the next step?
Another approach I thought of was comparing the lengths of the query genes with the lengths of the matches, but for partial-length matches that alone would not tell me whether the genes on the plasmids are functional.
Thanks.

blastx blastn bacteria • 1.6k views

What exactly is your goal? Do you want to find genes in plasmid sequences that you sequenced? There are tools to do that (e.g. prokka)


I'm looking for specific genes in all plasmids of a given bacterial species. The species changes from one analysis to the next; right now I'm working on Klebsiella.
Can such tools be used only for specific genes?


You can run Prokka and get the full list of genes in the plasmid, then look for your gene of interest


What do you think about using blastx?


Can you clarify what type of sequence is in your query? Fasta/fastq/full length?


Sure.
The queries are fasta files of specific genes.


If you know where the genes are on your plasmids, then you could use BLAT, or perhaps even minimap2, to look for end-to-end hits.
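If you would rather stay with the blastn output you already have, an end-to-end check can be sketched roughly as below. This is only a minimal sketch: it assumes blastn was run with `-outfmt "6 qseqid sseqid pident length qstart qend sstart send qlen"`, and the function name and both thresholds are made up for illustration. It looks at one HSP at a time, so a gene split across several HSPs is not detected, and full-length nucleotide coverage alone does not show that the gene is functional (a frameshift or premature stop would still pass).

```python
import csv
import io

# Assumed column order; it must match the -outfmt string used for blastn.
COLUMNS = ["qseqid", "sseqid", "pident", "length",
           "qstart", "qend", "sstart", "send", "qlen"]


def full_length_hits(tabular_text, min_coverage=1.0, min_identity=90.0):
    """Keep hits whose aligned query span covers >= min_coverage of the query.

    Both thresholds are illustrative defaults, not recommended values.
    """
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        qstart, qend, qlen = int(rec["qstart"]), int(rec["qend"]), int(rec["qlen"])
        # Fraction of the query gene covered by this single HSP.
        coverage = (abs(qend - qstart) + 1) / qlen
        identity = float(rec["pident"])
        if coverage >= min_coverage and identity >= min_identity:
            hits.append((rec["qseqid"], rec["sseqid"], round(coverage, 3), identity))
    return hits


# Toy input: a 900 nt gene matched over its full length on one plasmid
# and over only its first 450 nt on another.
example = (
    "geneA\tplasmid1\t99.5\t900\t1\t900\t5000\t5899\t900\n"
    "geneA\tplasmid2\t98.0\t450\t1\t450\t120\t569\t900\n"
)
print(full_length_hits(example))
```

With the toy input, only the plasmid1 hit is kept, because the plasmid2 hit covers half of the query.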


What do you think about using blastx?


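blastx does not fit here: it translates a nucleotide query and searches it against a protein database, whereas your queries and your plasmids are both nucleotide sequences. You could use tblastx if the sequences are very divergent. However, if you already get good hits with blastn, you probably don't need it.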


Please clarify if you know what plasmids you are working with and also know what genes they have. If you know this information then it would simplify things to a large extent.


I know what plasmids I'm working with. I have their fasta and GenBank files.

17 months ago
Asaf 10k

What you want to do is predict open reading frames in the plasmid sequence and then compare them, by global alignment, to the gene or genes of interest. This path is preferable because you want to check whether the encoded proteins are the same. If you only run a local alignment tool like BLAST, the alignment may clip the ends of the protein where the match is poor.

The simplest way to do this (as it has been done for more than 20 years) is with two tools from EMBOSS: getorf and needle. You can either install EMBOSS locally or run it on one of the publicly available servers like this.

More automated and up-to-date tools are also worth considering, such as prokka, which will give you the full annotation of your plasmid; you can then look for the gene of interest by homology group or sequence.
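To make the ORF-then-global-alignment idea concrete, here is a toy sketch in plain Python. It is not how getorf or needle work internally, and every name and scoring value in it is an assumption chosen for illustration. It only accepts ATG as a start codon (bacteria also use GTG and TTG), it compares ORFs at the nucleotide level rather than as translated proteins, and the quadratic alignment is only practical for short sequences. For real data the EMBOSS tools or prokka are the better choice.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Yield ATG..stop ORFs (stop codon included) from all six reading frames."""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]
                    if len(orf) >= min_len:
                        yield orf
                    start = None


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with linear gap penalties.

    Returns identical columns divided by alignment length.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, counting identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            if a[i - 1] == b[j - 1]:
                identical += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns if columns else 1.0


# Toy data: a short made-up "gene" embedded in a made-up "plasmid".
gene = "ATGAAACCCGGGTTTTAA"
plasmid = "TTCC" + gene + "CCTT"
for orf in find_orfs(plasmid, min_len=9):
    print(orf, global_identity(gene, orf))
```

Because the alignment is global, a truncated copy of the gene is penalised for the missing part instead of being reported as a perfect local match, which is the point of preferring it over BLAST for this check.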
