Hey guys,
Something that should be simple has been difficult for me to resolve over the past few weeks. I am using phmmer to search my ~12,000 fungal gene predictions against the MEROPS database. The issue is that for some of my query genes, over 1,000 hits are returned, which makes it impractical to go through all the queries and pick out each one's top hit. There are so many hits that they do not all fit in one Excel spreadsheet.
Edit: Here is a subset of what my table looks like. As you scroll down, you can see that for just the first gene there are almost 1,000 hits.
After I contacted the developer, he suggested redirecting the main output to /dev/null so that only the top hit of each query remains. He said the command should look like this:
phmmer --tblout 1371E_merops5.tbl /work/Geomicrobiology/msobol/IODP_329_SPG/1371E14H2/maker/1371E_uni_snap.maker.output/1371E_uni_snap.renamed.maker.proteins.fasta /work/Geomicrobiology/msobol/databases/pepunit.lib > /dev/null; head -4 1371E_merops5.tbl
However, this still does not work, and the developer says the problem is with the command line, not the program. Does anyone here have experience with this?
Thanks in advance! Morgan
What exactly are you left with? What does your *.tbl file look like before and after head -4? It's not clear whether you're writing the result to the .tbl file, to STDOUT, or to both.
I am directing the output to the .tbl file only, using --tblout.
Before
After
Has anyone solved this issue? I am also getting multiple hits for the same ID instead of a single best hit. I now have results in tbl format and want to keep only the best (top) hit for each query and delete the other hits from the file. Please share your suggestions or a script for this.
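One possible approach is to post-process the --tblout file rather than rely on head. As far as I can tell, head -4 only prints the first four lines of the file (the comment header plus the first hit of the first query), so it cannot give one hit per query. Below is a minimal Python sketch, not an official HMMER tool. It assumes the standard whitespace-separated tblout layout, where column 3 is the query name and column 6 is the full-sequence bit score, and it keeps the highest-scoring row for each query. The function name best_hit_per_query and the sample rows are made up for illustration.

```python
def best_hit_per_query(lines):
    """Return one row per query: the row with the highest full-sequence score.

    Assumes HMMER --tblout layout: fields[2] is the query name and
    fields[5] is the full-sequence bit score. Comment lines start with '#'.
    """
    best = {}  # query name -> (score, row); dict keeps first-seen query order
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/footer comments and blank lines
        fields = line.split()
        query, score = fields[2], float(fields[5])
        # Keep the first row seen on ties, replace only on a strictly better score
        if query not in best or score > best[query][0]:
            best[query] = (score, line.rstrip("\n"))
    return [row for _, row in best.values()]


# Hypothetical, truncated rows just to show the behaviour
sample = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "MER_A  -  gene_1  -  1e-50  170.2  0.1",
    "MER_B  -  gene_1  -  1e-20   75.4  0.0",
    "MER_C  -  gene_2  -  1e-08   35.0  0.2",
]
for row in best_hit_per_query(sample):
    print(row)
```

To run it on a real file, something like `with open("1371E_merops5.tbl") as fh: rows = best_hit_per_query(fh)` should work, and the resulting list can then be written back out to a new file. With one row per query, ~12,000 queries should fit comfortably in a spreadsheet.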