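# All three files are named protein.fa.gz on the server, so save each under its own name (-O).
wget -O human.protein.fa.gz ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens/protein/protein.fa.gz
wget -O pig.protein.fa.gz ftp://ftp.ncbi.nih.gov/genomes/Sus_scrofa/protein/protein.fa.gz
wget -O gorilla.protein.fa.gz ftp://ftp.ncbi.nih.gov/genomes/Gorilla_gorilla/protein/protein.fa.gz
# Prefix every FASTA header with a species tag.
gunzip -c pig.protein.fa.gz | sed 's/^>/>pig_/' > pig
gunzip -c gorilla.protein.fa.gz | sed 's/^>/>gor_/' > gor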
cat pig gor > piggor.fa
makeblastdb -in piggor.fa -dbtype prot -hash_index -parse_seqids -out piggor
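Subsample 1,000 human proteins.
Run blastp with the human subsample as the query against the piggor database, with tabular output (-outfmt 6).
Keep only the best hit per query (highest bit score first, then lowest e-value) with: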
sort -k1,1 -k12,12gr -k11,11g blast_output | sort -u -k1,1 --merge > best_hits
grep -c 'pig_gi' best_hits
grep -c 'gor_gi' best_hits
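The subsampling and blastp steps above are only described in words. A minimal sketch of how they could be done follows; the helper names subsample_fasta and run_blast and the file names in the usage note are my own placeholders, not part of the original post, and shuf is assumed to be available (GNU coreutils).

```shell
# Hypothetical helper: pick N random records from a FASTA file.
# Usage: subsample_fasta N input.fa > output.fa
# Assumes headers contain no tab characters.
subsample_fasta() {
    n="$1"
    fasta="$2"
    # Put each record on one line (header, tab, sequence),
    # sample N lines at random, then split header and sequence again.
    awk '/^>/ { if (hdr != "") print hdr "\t" seq; hdr = $0; seq = ""; next }
         { seq = seq $0 }
         END { if (hdr != "") print hdr "\t" seq }' "$fasta" \
      | shuf -n "$n" \
      | tr '\t' '\n'
}

# Hypothetical wrapper: blastp a query FASTA against the piggor database
# and write tabular output to blast_output.
# Usage: run_blast query.fa
run_blast() {
    blastp -query "$1" -db piggor -outfmt 6 -out blast_output
}
```

For example, with human.fa standing in for the uncompressed human protein file: subsample_fasta 1000 human.fa > human_sub.fa, then run_blast human_sub.fa. Each sampled sequence comes out on a single line, which blastp reads without trouble.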
About 10% of the best hits are against pig rather than gorilla. Why?
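To put an exact number on that share, the two grep counts can be folded into one tally. This is only a sketch: the function name species_share is hypothetical, and it assumes the subject ID in column 2 of best_hits starts with the pig_ or gor_ prefix added earlier.

```shell
# Hypothetical helper: count best hits per species prefix and print
# species, count, and percentage of all best hits (tab-separated).
# Usage: species_share best_hits
species_share() {
    awk -F'\t' '
        # Column 2 is the subject ID; the text before the first "_" is the species tag.
        { split($2, parts, "_"); count[parts[1]]++; total++ }
        END {
            for (sp in count)
                printf "%s\t%d\t%.1f%%\n", sp, count[sp], 100 * count[sp] / total
        }
    ' "$1" | sort
}
```

Running species_share best_hits prints one line per species tag, so the pig and gorilla fractions can be read off directly.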
You may want to show a few examples, as I doubt anyone wants to reproduce this without seeing results.
It takes about 5-10 minutes on a modern laptop. Subsampling just 100 human proteins gave me pretty much the same outcome in two runs (with and without -use_sw_tback).