Blastn - unexpected behaviour
1
1
Entering edit mode
5.3 years ago
bvm ▴ 20

I ran into an unexpected feature of blastn. After extracting some gene sequences from a genome, creating a blast database and blasting back to the reference, lot of extracted genes are not found in the blast result, while they are certainly there in the genome (as they were extracted from there) What can be the cause? Some details (I cannot upload the whole files):

My command is:

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6

The fasta file is downloaded from https://www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521.

The database is gained from extracting the feature table belonging to the assembly above.

A missing gene from the blast is e.g. aaaD. However, if blasting only this gene, it is naturally found.

blastn • 947 views
ADD COMMENT
1
Entering edit mode

Think you should blast the genes against the genome. So you index the genome first and then blast your genes in fasta format against it. This makes more sense and you can get the location of the gene.

ADD REPLY
1
Entering edit mode

Using a genome as search query against a list of genes is probably not a great idea (unless you have a specific reason for it). Have you considered doing the search in reverse?

Also trying adding -task blastn to your command line to see if it makes a difference. Default is megablast.

ADD REPLY
0
Entering edit mode

I thought of this approach because the goal is to find the same genes in a lot of genomes, but you're right to do it in the other way - I'll use the set of genes as query and the specific genomes as subjects.

If adding -task blastn there was a difference, but still not all genes occurred.

ADD REPLY
2
Entering edit mode
5.3 years ago
bvm ▴ 20

After doing some research, I found the answer for my question. The value for max_target_seqs is 500 by default. If raising max_target_seqs to some irrationally high value, all genes are shown.

Hence I used

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6 -max_target_seqs 100000000

to obtain all genes.

ADD COMMENT

Login before adding your answer.

Traffic: 1777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6