Hi all, I'm searching about 50 proteins from a particular species (sub_proteins.fa) against all proteins for that species (proteome.fa) using HMMER. I did this with phmmer, using the following command:
phmmer --tblout results.table sub_proteins.fa proteome.fa
However, I noticed that in the resulting file (results.table) there are multiple lines where the same protein is being compared to itself (I guess because the 50 query proteins are also found in the proteome file). See below for an example:
# target name        accession  query name           accession    E-value  score  bias   E-value  score  bias exp reg clu  ov env dom rep inc description of target
#------------------- ---------- -------------------- ---------- --------- ------ ----- --------- ------ ----- --- --- --- --- --- --- --- ---
YLL_767              -          YLL_767              -           8.1e-177  580.6   1.4    1e-175  582.3   1.4 1.0   1   0   0   1   1   1   1
Is there any way to prevent the HMMER programs from doing this? I am concerned because, after this step, I am going to combine the query and its significant targets into a model that will then be used to search other species' proteomes. Will having redundant proteins affect my results?
Any help is greatly appreciated!
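One way to deal with the self-hits after the search, rather than inside phmmer, is to drop them from the table before building the model. Below is a minimal Python sketch, assuming the whitespace-delimited --tblout layout shown above (target name in the first column, query name in the third, comment lines starting with #). The function name filter_self_hits is hypothetical and not part of HMMER.

```python
def filter_self_hits(lines):
    """Yield tblout data lines whose target name differs from the query name."""
    for line in lines:
        # Skip the '#' header/footer lines and any blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # The last column (description of target) may contain spaces,
        # so split at most 18 times to keep it in one field.
        fields = line.split(maxsplit=18)
        target_name, query_name = fields[0], fields[2]
        # Assumption: a self-hit is a row where both names are identical.
        if target_name != query_name:
            yield line.rstrip("\n")
```

To use it, open results.table, pass the file handle to filter_self_hits, and write the yielded lines to a new file. This only removes rows where the names match exactly; it does not detect identical sequences stored under different names.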
Thank you for this very thorough and helpful explanation! This is exactly what I was wondering about.