Hi,
I'd like to find all possible hits for a certain NCBI conserved domain (which is expressed as a PSSM) in a chromosome or scaffold of a certain genome. I know that the common way to search for conserved domain hits in a nucleotide sequence is to translate it in all six reading frames and RPS-BLAST each translation against the Conserved Domain Database (CDD). That seems reasonable for short nucleotide queries, but not for a chromosome-length sequence.
I'm thinking NCBI might have already done this, in which case my question would be whether anyone knows which files have that info. But if I have to do it myself, I'm guessing some variation of TBLASTN (where the query is the PSSM) will do. In that case, any suggestions on how to achieve that?
Thanks
Seems that rpstblastn handles queries up to 200 kb in length, so I think that I'll need to PSI-BLAST the chromosomes/scaffolds with the CDD profiles.

Not sure where you came up with this 200 kb limit. Maybe when submitting to the CD-Search web site?
Standalone rpstblastn has no such limit. I just searched using a 2.8 Mb bacterial genome and it went fine.
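For anyone who wants to script this, here is a minimal sketch of running standalone rpstblastn on a whole chromosome and keeping only the hits for one domain. It assumes BLAST+ is installed and on your PATH, and that you have a local RPS database built from the CDD profiles. The file names, the database name and the helper functions are hypothetical, and the form of the subject ID in the output depends on how the database was built, so check a few output lines before filtering. The parser assumes the default `-outfmt 6` column order and treats qstart > qend as a minus-strand hit.

```python
import subprocess
from typing import Iterable, Iterator, NamedTuple, Optional


class Hit(NamedTuple):
    query: str      # chromosome/scaffold ID
    domain: str     # subject ID as reported by rpstblastn
    start: int      # leftmost nucleotide coordinate on the query
    end: int        # rightmost nucleotide coordinate on the query
    strand: str     # "+" or "-"
    evalue: float


def build_rpstblastn_cmd(query_fasta: str, db: str, out_path: str,
                         evalue: float = 0.01) -> list:
    """Build the rpstblastn command line with tabular (outfmt 6) output."""
    return ["rpstblastn", "-query", query_fasta, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


def parse_hits(lines: Iterable[str], domain: Optional[str] = None) -> Iterator[Hit]:
    """Parse default outfmt 6 lines, optionally keeping a single subject ID."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if domain is not None and f[1] != domain:
            continue
        qstart, qend = int(f[6]), int(f[7])
        strand = "+" if qstart <= qend else "-"
        yield Hit(f[0], f[1], min(qstart, qend), max(qstart, qend),
                  strand, float(f[10]))


def search(query_fasta: str, db: str, out_path: str,
           domain: Optional[str] = None) -> list:
    """Run rpstblastn (must be installed) and return the parsed hits."""
    subprocess.run(build_rpstblastn_cmd(query_fasta, db, out_path), check=True)
    with open(out_path) as handle:
        return list(parse_hits(handle, domain))
```

Usage would be something like `search("chr1.fasta", "Cdd", "chr1_hits.tsv", domain="<subject ID of your domain>")`, with the placeholder replaced by whatever ID your database reports for the domain of interest.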