In a large Y2H screen to find potential interactions for the HIV-1 MA protein, a number of the clone sequences identified cannot be ascribed to a protein using a preliminary BLAST search. Are there any methods that can shed light on these potential hits?
There is not enough information in your question to know exactly what you have done, but my guess is that you have used a 'blastx' query, since you are looking for a protein.
If you have not done so already, you may want to try 'tblastn' and also a simple nucleotide BLAST ('blastn') from the same NCBI BLAST page. Both could give you results that help you find the protein these sequences are linked to. Note that you can also search different databases, which will give you different output. This may also help you find the appropriate protein name for your sequences.
Please do update your question with information about why you are doing this project, what you are trying to accomplish, and what you have tried so far. The more specific information you put in, the more likely you are to get a useful answer.
The questioner writes: Eric, yes, the initial approach for searching the sequences was blastx. I will try tblastn; however, which exact databases do you want me to search? Could you also elaborate on how using other databases could aid me?
I am only trying to help someone associate proteins with 5p and 3p sequence fragments. What is intended to be done with these proteins is beyond my scope, and I cannot reveal data here.
The first thing to do is to check whether there is a 'real' protein-coding region fused in-frame to the prey-specific DBD or TAD. If there is something (and it is longer than a few amino acids), it should be good for BLAST searches (assuming that you are screening for human proteins).
If nothing is found, my best bet is that there is no meaningful in-frame fusion.
Make sure that your BLAST searches include the vector. If there is some material fused in-frame and it is not derived from the vector (and yet does not match the proteome), perform a BLAST search against the genome (you can use the ENSEMBL server http://www.ensembl.org/Multi/blastview for that purpose). If you still don't find anything, there is something wrong with your library :-)
Edit: In response to your response, here is the extended version of my comment:
I don't see how the fact that this is someone else's project impinges on the analysis. I assume that you have the sequences of the prey constructs and take it from there. I stand by my suggestions: first identify the sequence of the TAD/DBD part of the prey construct and then see if there is a protein fused in-frame. If there is not, you can discard this clone (or use it for troubleshooting). If there is a fused protein, use this sequence (remove the DBD/TAD first) for blastp searches in a protein database. Only if this fails, use the cDNA sequence of the fused construct (remove the DBD/TAD first) and run it as a blastn search against a genome database (I recommend Ensembl). If this still doesn't give you anything, there is something wrong with the clone (or the entire library). Make sure to also compare the prey construct with the empty vector sequence used for the construction (this information is available from the provider or elsewhere on the internet).
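The in-frame check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `translate_fused_insert` and the `junction` argument are hypothetical, and `junction` is assumed to be the last few bases of the vector-encoded DBD/TAD coding sequence, ending exactly on a codon boundary. The real junction sequence depends on the vector used for the library.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_fused_insert(read, junction):
    """Translate the insert that follows the DBD/TAD junction, in frame.

    read     -- sequencing read of the prey construct (hypothetical input)
    junction -- assumed end of the vector-encoded domain, ending on a
                codon boundary (hypothetical; vector-specific)

    Returns the peptide up to the first stop codon, or None if the
    junction is not found in the read.
    """
    read = read.upper()
    junction = junction.upper()
    pos = read.find(junction)
    if pos == -1:
        return None
    start = pos + len(junction)
    peptide = []
    for i in range(start, len(read) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as X.
        amino_acid = CODON_TABLE.get(read[i:i + 3], "X")
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)
```

A peptide of only a few residues suggests there is no meaningful in-frame fusion, while a longer peptide is what you would submit to blastp.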
As to your question about expected hits: this depends on a number of things. Were the experiments done properly, or was the bait expressed at exaggerated levels? Is the bait truly interacting with something? Does your prey library contain the interaction partner? In a good Y2H experiment, you expect below 10% artefacts (empty prey clones etc.) and 5-50 different hits. It is always a good sign if you identify a certain prey protein multiple times from independent prey clones (different construct borders).
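One way to look for the "multiple independent clones" signal is to group the identified hits by protein and count distinct construct borders. The sketch below assumes a hypothetical input format, a list of `(protein, start, end)` tuples where `start` and `end` are the insert borders on the matched sequence. Nothing about this format comes from the thread.

```python
from collections import defaultdict


def independent_clone_counts(hits):
    """Count distinct construct borders per prey protein.

    hits -- iterable of (protein, start, end) tuples (hypothetical format).
    Clones with identical borders are counted once, since they may be
    copies of the same library clone rather than independent isolates.
    """
    borders = defaultdict(set)
    for protein, start, end in hits:
        borders[protein].add((start, end))
    return {protein: len(spans) for protein, spans in borders.items()}


example_hits = [
    ("GENE_A", 10, 300),
    ("GENE_A", 10, 300),   # same borders, likely the same clone
    ("GENE_A", 55, 420),   # different borders, independent clone
    ("GENE_B", 1, 200),
]
counts = independent_clone_counts(example_hits)
```

Proteins with a count of two or more were recovered from clones with different borders, which is the pattern described above as a good sign.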
The questioner writes: Lyco, The data I have is from someone else's experiment.
However, this raises another question. What percentage of sequences from a Y2H screen usually get ascribed to proteins? What is the usual number of positive hits for any given experiment (10s, 100s, 1000s)?
Lyco replied: I don't see how the fact that this is someone else's project impinges on the analysis. I assume that you will have the sequences of the prey constructs and take it from there. My full reply is too long for a comment, so I will edit my original answer to include the extended version of this comment. See above (or below?). MOVED their comment here.
What kind of library did you use for the prey?
The questioner writes: Pierre, I believe it's a cDNA library of human origin, but I am not 100% sure about this.
Moved their text here from their own (inappropriate) answer.