Question

Presence/absence of proteins blastp nr database

0

Entering edit mode

7.4 years ago

freekwilly • 0

Hi all,

I am a molecular biologist in the first place, but came into touch with bioinformatic challenges more often recently. For simple tasks I've been using command line blast+ already and now I am wondering whether it is possible to do the following with blast+:

Let's say I have a random protein sequence in a file named RP1.faa and I want to check each bacterial strain in the refseq nr database on whether RP1 ist present or absent by an e-value threshold of e.g. 1e-05. And I want a table which shows me only the best hit for each strain in the first column and e.g. e-value, bit-score and so on in the following columns. And in each row I want the next strain to be shown, even if no blast hit was found In the end I am imagining a matrix which shows me whether a certain protein is present or absent.

Do you know if it is possible? Thanks a lot!

blast genome alignment • 2.0k views

ADD COMMENT • link 7.4 years ago by freekwilly • 0

1

Entering edit mode

You can show the presence of each straing by, doing a tab output with -outfmt 6, and provide the e-value threshold to the blast command, as wells as use -max_target_seqs 1 to show the best alignment for each strain. What you will have to do is, conclude how against many/which strains RP1 is absent. Someone may have a better approach.

ADD REPLY • link 7.4 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

In addition you will have to filter/restrict the results for bacteria (in theory, taxid:2).

ADD REPLY • link 7.4 years ago by GenoMax 148k

0

Entering edit mode

I am trying to obtain a similar output but using the rpsblast function in the command line. Do you know if there is a way to not only display just the best hit but also only "specific hits" as is given as a result in the web-based version?

ADD REPLY • link 7.4 years ago by lauren.manck ▴ 20

0

Entering edit mode

Hey, thanks for the fast answers. But unfortunately it is not yet what I was looking for. When using -max_target_seqs 1 I'll get only one hit in total, not one hit for every entry in my database.

ADD REPLY • link 7.4 years ago by freekwilly • 0

0

Entering edit mode

not one hit for every entry in my database

Is the reference to nr or to your own sequence data? You can always flip the search around. It is not necessary that there will be a hit found for every sequence being searched in both of those cases.

ADD REPLY • link 7.4 years ago by GenoMax 148k

0

Entering edit mode

For now it still refers to the "nr" database. I'm am currently trying to subset it into a more precise and useful database

ADD REPLY • link 7.4 years ago by freekwilly • 0