I'd like to preface this post by saying I do not have much experience in structural or molecular biology.
I'm interested in running AlphaFold2 (or RoseTTAFold) to generate high-confidence predicted structures for a database of ~200,000 viral proteins, approximately 100 of which are unique. The proteins are generally short and don't exceed 1,000 AA. Running AlphaFold2 comprehensively for each protein is not possible. Ideally, I would like to limit database searching (both genetic and structural) as much as possible. I am considering the following protocol:
1). Run AlphaFold2 on a single reference protein and retain the database search output.
2). Identify other proteins within ~2% genetic similarity to the reference (note: some proteins can be wildly diverse).
3). Use the reference database search output to jump-start processing of the proteins from step 2.
The remainder of processing would be the same (and likely prohibitively expensive in time/resources). Is there a more efficient way to meaningfully predict protein structure (i.e., output that a structural or molecular biologist might find useful)?
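To make steps 2 and 3 concrete, here is a rough sketch of the grouping I have in mind. It is plain Python and does not call AlphaFold2 at all. The function names (`deduplicate`, `divergence`, `split_by_reference`) are my own placeholders, and it assumes the sequences are already aligned to the same length, which a real pipeline would need a proper aligner for. I am also reading "~2%" as at most 2% mismatched positions, which is an assumption.

```python
from collections import defaultdict


def deduplicate(records):
    """Map each unique sequence to the IDs of all records that share it."""
    groups = defaultdict(list)
    for record_id, seq in records:
        groups[seq.upper()].append(record_id)
    return dict(groups)


def divergence(a, b):
    """Fraction of mismatched positions.

    Assumes a and b are pre-aligned to equal length; sequences of
    different length are treated as fully divergent (1.0).
    """
    if len(a) != len(b) or not a:
        return 1.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def split_by_reference(reference, unique_seqs, max_divergence=0.02):
    """Split sequences into those close enough to reuse the reference's
    database search output and those that need their own search."""
    reuse, rerun = [], []
    for seq in unique_seqs:
        if divergence(reference, seq) <= max_divergence:
            reuse.append(seq)
        else:
            rerun.append(seq)
    return reuse, rerun
```

Deduplicating first means the expensive steps only run once per unique sequence, and the results can then be copied back to every record ID in that group.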
Are you affiliated with this company?
Are you working in a coordinated manner with Sherry to promote Tamarind Bio?