Hello there,
I am trying to pull out homologous genes from contigs/scaffolds that I have assembled from a metagenomics dataset. I have already built a FASTA file from UniProt containing the gene family I am interested in. I have read that PSI-BLAST with a low e-value cutoff can be used to pull out homologous genes from the contigs/scaffolds, using the gene family as the reference database. However, PSI-BLAST is very slow because my contig file is quite large.
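Whichever search tool ends up being used, the "low e-value" step usually comes down to filtering a table of hits. Here is a minimal Python sketch, assuming the search was run with the standard 12-column BLAST tabular output (`-outfmt 6`), that your predicted genes are the queries and the UniProt family is the database. The 1e-5 cutoff is only an illustrative assumption, not a recommended value. For each query it keeps the highest-scoring hit that passes the cutoff.

```python
from dataclasses import dataclass

# Column order of the standard 12-column BLAST tabular format (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Yield Hit records from tab-separated 12-column BLAST output lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        row = dict(zip(FIELDS, line.split("\t")))
        yield Hit(row["qseqid"], row["sseqid"],
                  float(row["evalue"]), float(row["bitscore"]))


def best_hits(hits, max_evalue=1e-5):
    """For each query, keep the highest-bitscore hit with evalue <= max_evalue."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too weak to count as a likely homologue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

Typical use would be `best_hits(parse_tabular(open("hits.tsv")))`, where `hits.tsv` is a hypothetical file name for your search output.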
Can anyone recommend software that could do this faster?
I also have another question: do I need to perform gene prediction after assembly, before the homology search?
Thank you.
You should predict/annotate your genes first IMO. How will you know which 'genes' to compare otherwise?
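To illustrate what "predicting genes first" gives you, here is a deliberately naive Python sketch that scans both strands of a contig in all three reading frames and returns ATG-initiated open reading frames as protein sequences, which could then be compared against the UniProt family. It assumes the standard genetic code (NCBI table 1), and the 100-residue minimum is an arbitrary assumption. Dedicated gene callers (Prodigal is one commonly used example) model start sites, coding statistics and partial genes far better, so treat this only as a conceptual sketch.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(contig, min_aa=100):
    """Return (strand, frame, protein) for ATG-initiated ORFs of >= min_aa residues.

    ORFs that run off the contig end without a stop codon are kept too,
    because assembled metagenomic contigs often contain partial genes.
    """
    contig = contig.upper()
    orfs = []
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Each stop-free stretch is a candidate; start it at the first Met.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs
```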
You may want to take a look at, for example, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526259/
In general, searching for COG (Clusters of Orthologous Groups) identification strategies should turn up plenty of options.
Dear @jrj.healey,
Thanks for the reply and the recommendation. The Pan-genome Ortholog Clustering Tool (PanOCT) does not seem to be intended for metagenomics, though?
That isn't its specific use case, but defining a pan-genome/core genome is conceptually a very similar procedure: it is still a process of clustering likely orthologues.
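As a rough illustration of "clustering likely orthologues", here is a minimal Python sketch of single-linkage clustering with a union-find structure. It assumes you already have a list of sequence-ID pairs that passed some similarity filter (for example an e-value cutoff); the IDs in the usage line are hypothetical. Real orthology clustering tools apply more refined criteria than plain single linkage, which can chain unrelated sequences together through one spurious hit.

```python
def cluster_pairs(pairs):
    """Group sequence IDs into single-linkage clusters from (id_a, id_b) hit pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), []).append(x)
    # Sorted output so the result does not depend on input order.
    return sorted(sorted(members) for members in clusters.values())
```

For example, `cluster_pairs([("g1", "g2"), ("g2", "g3"), ("g4", "g5")])` groups g1, g2 and g3 together and g4 with g5.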
That was just the first example I found anyway (and not one I've used or even heard of before), so it probably isn't the right tool for you, but continuing to search for COG identification methods should help you find alternatives.