Concatenating fasta files for BLAST
10.5 years ago

I have two fasta files from WormBase. One is the coding transcripts, the other is ncRNA. When I BLAST (using NCBI Blast+) 44k 60-mers against just the transcripts, I get 37,216 hits. When I concatenate the ncRNA fasta file to the transcripts file (and do makeblastdb again) I get 33,572. I know that several of these 60-mers hit both the transcripts and the ncRNA so I need to BLAST them together. How do I do that without losing ~4k hits?

10.5 years ago

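I also asked NCBI, who wrote: "By doing the concatenation, you increase the search space [and with it] the expect value of the hit, which may fall below the cutoff and thus [be] dropped from the report." In other words, the E-value of a given alignment grows with the size of the database, so a hit that passed the threshold against the transcripts alone can exceed it once the ncRNA sequences are added.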
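To make the scaling concrete, here is a rough sketch using the standard bit-score relation E = m * n * 2^(-S'). The database sizes and the bit score below are made-up illustrative numbers, not values from the WormBase files, and real BLAST uses effective (edge-corrected) lengths, so treat this as an approximation only.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) are raw lengths here; BLAST itself uses
    effective lengths, so the absolute numbers will differ somewhat.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes in bases, chosen only for illustration.
QUERY_LEN = 60
TRANSCRIPTS_DB = 50_000_000
NCRNA_DB = 10_000_000
BIT_SCORE = 30.0  # hypothetical score of one borderline hit

e_transcripts = expect_value(BIT_SCORE, QUERY_LEN, TRANSCRIPTS_DB)
e_combined = expect_value(BIT_SCORE, QUERY_LEN, TRANSCRIPTS_DB + NCRNA_DB)

print(f"transcripts only: E = {e_transcripts:.3g}")
print(f"transcripts + ncRNA: E = {e_combined:.3g}")
print(f"ratio: {e_combined / e_transcripts:.2f}")
```

The same alignment with the same score gets a larger E-value in the larger database, in proportion to the database size, which is why a borderline hit can drop out of the report.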


For short matches such as these, you will need to increase the E-value cut-off (the -evalue option in BLAST+), and possibly make the search more stringent by increasing the gap costs and the match/mismatch scores.
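As a minimal sketch of the first suggestion, the commands could be assembled like this. The file names (combined.fa, probes.fa, hits.tsv) and the E-value of 1000 are placeholders I made up, not recommended settings. The flags shown (-in, -dbtype, -out, -query, -db, -evalue, -outfmt) are standard BLAST+ options; any scoring options would go in the hypothetical `extra` argument.

```python
import shlex


def makeblastdb_cmd(fasta, db_name):
    """Build the argv list that indexes a nucleotide FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db, evalue, out, extra=()):
    """Build a blastn argv list with an explicit E-value cut-off.

    `extra` is a placeholder for scoring options such as gap costs.
    """
    cmd = ["blastn", "-query", query, "-db", db,
           "-evalue", str(evalue), "-outfmt", "6", "-out", out]
    cmd.extend(extra)
    return cmd


# Placeholder file names; substitute your own.
db_cmd = makeblastdb_cmd("combined.fa", "combined")
search_cmd = blastn_cmd("probes.fa", "combined", evalue=1000, out="hits.tsv")

# Print the commands for inspection; pass the lists to subprocess.run to execute.
print(shlex.join(db_cmd))
print(shlex.join(search_cmd))
```

With a looser cut-off you will get more spurious hits in the report, so you would probably filter the tabular output afterwards, for example on percent identity and alignment length.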

Switching to an alternative algorithm may also help in this case. I suggest looking at the FASTA suite programs (see here); GLSEARCH and FASTM in particular may be of interest.

Alternatively, if you are not interested in statistics but only want to know whether a match occurs, pattern matching methods such as those provided by the EMBOSS programs dreg and fuzznuc may be more appropriate.
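To illustrate the presence/absence idea without any statistics, here is a small sketch in plain Python. It is not dreg or fuzznuc, only an exact substring search on both strands, and the sequences are toy data invented for the example. Unlike fuzznuc it allows no mismatches.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact_hits(probes, targets):
    """List (probe, target, 0-based start, strand) for every exact match.

    A probe equal to its own reverse complement is reported on both strands.
    """
    hits = []
    for pname, probe in probes.items():
        for strand, pattern in (("+", probe), ("-", reverse_complement(probe))):
            for tname, tseq in targets.items():
                start = tseq.find(pattern)
                while start != -1:
                    hits.append((pname, tname, start, strand))
                    start = tseq.find(pattern, start + 1)
    return hits


# Toy data: one "transcript" and one "ncRNA", both invented.
targets = read_fasta(">tx1\nACGTTGCAAGGC\n>nc1\nTTGCCTTGCAAC\n")
probes = read_fasta(">p1\nTTGCAAGG\n")

for hit in find_exact_hits(probes, targets):
    print(hit)
```

Because both target sets are searched in one pass, a probe that matches a transcript and an ncRNA is reported for both, and no hit depends on the size of the combined database.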
