Hi!
I have a list of bacterial genomes, and I want to find the proteins that are common to ALL of the organisms (where "common" means they match by at least 60%). What's the best way to do it?
I'm using Biopython with standalone BLAST+.
P.S. Any suggestions would be great. I tried doing this with a series of local databases, but it wasn't a success.
cd-hit is great, thanks! Can you help me out with the output, though?
I'm running the command
and getting output like
...and so on
This means that protein >gi|156742169|ref| is from db1 and is similar to >gi|148655222|ref| from db2 at 84%, right? And the next two lines (Cluster 1 and Cluster 2) contain proteins that don't have any similarities?
I'm a little confused by these clusters.
Also, why, when I test this program on 2 files, each consisting of one fake protein,
is the result empty, when it should be a 90% match?
You're correct about the output. In my example I used sed to modify all the headers, so that clusters with members from different species would be easier to interpret. I don't know about your fake input; maybe it's because the peptides are too short, or because B is not a valid letter.
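If it helps, here is a rough Python sketch of how the `.clstr` output could be read to pick out the clusters that contain a protein from every genome. It assumes the usual cd-hit cluster layout (a `>Cluster N` line followed by one member line per sequence) and that each header was prefixed with a genome tag such as `db1|` beforehand. That prefix convention, the function names, and the sample text (including the sequence lengths and the `hypothetical_1` entry) are made up for illustration, so adjust them to your own files.

```python
import re
from collections import defaultdict

# One member line of a .clstr file, e.g.
# "1<TAB>250aa, >db2|gi|148655222|ref|... at 84.00%"
MEMBER_RE = re.compile(r"^\d+\s+\d+aa,\s+>(\S+?)\.\.\.\s+(?:\*|at\s+.*)$")


def parse_clstr(text):
    """Return {cluster_id: [header, ...]} parsed from .clstr text."""
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif current is not None:
            match = MEMBER_RE.match(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def shared_clusters(clusters, genomes):
    """Keep only clusters that have at least one member from every genome.

    Assumes each header starts with 'genome_tag|' (a hypothetical convention).
    """
    wanted = set(genomes)
    return {
        cid: members
        for cid, members in clusters.items()
        if wanted <= {header.split("|", 1)[0] for header in members}
    }


# Illustrative input only: lengths and the second cluster are placeholders.
SAMPLE = (
    ">Cluster 0\n"
    "0\t250aa, >db1|gi|156742169|ref|... *\n"
    "1\t250aa, >db2|gi|148655222|ref|... at 84.00%\n"
    ">Cluster 1\n"
    "0\t120aa, >db1|hypothetical_1... *\n"
)

if __name__ == "__main__":
    print(shared_clusters(parse_clstr(SAMPLE), ["db1", "db2"]))
```

With more than two genomes, the same check works as long as every genome has its own tag: a cluster is kept only if the set of tags among its members covers the whole list.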
ok, thanks a lot!