Searching for conserved domains in a chromosome/scaffold
3.7 years ago
rubic ▴ 270

Hi,

I'd like to find all possible hits for a certain NCBI conserved domain (which is expressed as a PSSM) in a chromosome or scaffold of a certain genome. I know that the common way to search for conserved domain hits in a nucleotide sequence is to translate it in all six reading frames and RPS-BLAST each translation against the Conserved Domain Database (CDD). But that seems reasonable only for short nucleotide queries, not for a chromosome-length sequence.

I'm thinking NCBI might already have done this, in which case my question is whether anyone knows which files hold that information. If I have to do it myself, I'm guessing some variation of TBLASTN (where the query is the PSSM) will do. In that case, any suggestions on how to achieve that?

Thanks

blast domain CDD PSSM rpsblast • 1.3k views
3.7 years ago
Mensur Dlakic ★ 28k

The BLAST+ suite includes rpstblastn, which is a combination of rps-blast and tblastn. That means it takes a DNA query, translates it in all six reading frames, and compares the translations to a database of profiles.
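To make "translates it in all reading frames" concrete, here is a minimal Python sketch of six-frame translation. It is only an illustration of the idea, not how rpstblastn is implemented internally; the function names are made up for this example, it assumes the standard genetic code, and any codon containing a character other than A/C/G/T is written as X.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(CODONS.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to the translated protein string."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames
```

Each of the six strings is then what gets compared to the profiles; stop codons appear as `*` inside the translation rather than splitting it.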

If you go to the CD-search web site, either protein or DNA sequence is acceptable as input.


It seems that rpstblastn handles queries only up to 200 kb in length, so I think I'll need to PSI-BLAST the chromosomes/scaffolds with the CDD profiles.


I'm not sure where this 200 kb limit comes from. Maybe it applies when submitting to the CD-search web site?

Standalone rpstblastn has no such limit. I just searched using a 2.8 Mb bacterial genome and it went fine.
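For anyone wanting to try the standalone search, a rough Python sketch of how the call could be assembled is below. The file names (`genome.fna`, `hits.tsv`), the database name `Cdd`, and the E-value cutoff are placeholders chosen for illustration, not the settings used in the search above; it assumes BLAST+ is installed and a pre-formatted RPS database is available locally. The helper function names are hypothetical.

```python
import shutil
import subprocess


def build_rpstblastn_cmd(query_fasta, rps_db, out_tsv, evalue=0.01):
    """Build an rpstblastn command line that writes tabular (outfmt 6) hits."""
    return [
        "rpstblastn",
        "-query", query_fasta,   # nucleotide FASTA, e.g. a chromosome or scaffold
        "-db", rps_db,           # pre-formatted RPS-BLAST profile database
        "-evalue", str(evalue),  # placeholder cutoff, adjust as needed
        "-outfmt", "6",          # tab-separated hit table
        "-out", out_tsv,
    ]


def run_rpstblastn(query_fasta, rps_db, out_tsv, evalue=0.01):
    """Run rpstblastn, failing early if the executable is not on PATH."""
    if shutil.which("rpstblastn") is None:
        raise RuntimeError("rpstblastn not found; is BLAST+ installed?")
    subprocess.run(build_rpstblastn_cmd(query_fasta, rps_db, out_tsv, evalue), check=True)
```

Calling `run_rpstblastn("genome.fna", "Cdd", "hits.tsv")` would then leave one line per hit in `hits.tsv`, which can be filtered for the domain of interest.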
