Hello, I have around 400 protein sequences which have no sequence similarity, no identifiable protein domains and no identifiable motifs. What steps should I take in order to characterize these proteins, both in function and structure? My initial thoughts were as follows:
1) Compute physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY).
2) Predict secondary structure
3) Determine subcellular localization
4) Ab initio modelling
Apart from these methods, what else could I employ to broaden the analysis of these sequences?
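For step 1, a minimal sketch of what I have in mind, in plain Python. It covers only three of the properties listed (amino acid composition, GRAVY using the Kyte-Doolittle scale, and the aliphatic index using Ikai's formula). The function names and the example sequence are made up for illustration, and residues outside the 20 standard amino acids (e.g. X or *) are simply ignored, which is an assumption on my part rather than a standard convention.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(seq):
    """Return the standard residues of seq in upper case (others are dropped)."""
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def composition(seq):
    """Amino acid composition as mole percent, keyed by one-letter code."""
    residues = _standard_residues(seq)
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    residues = _standard_residues(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    comp = composition(seq)
    return (
        comp.get("A", 0.0)
        + 2.9 * comp.get("V", 0.0)
        + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0))
    )


if __name__ == "__main__":
    # Hypothetical sequence, not one of the actual transcripts.
    example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print("GRAVY:", round(gravy(example), 3))
    print("Aliphatic index:", round(aliphatic_index(example), 2))
```

For several hundred sequences this would be run in a loop over a parsed FASTA file. Properties such as pI, extinction coefficient and instability index need additional parameter tables, so an established tool is probably the safer choice for those.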
You said no sequence similarity. Did you use BLAST to do your search? Which organism is this? And how much similarity was there?
Hi Jordan, yes these are protein sequences from the salamander species N. viridescens, the transcriptome of which has only recently been produced. There were around 600 protein-coding transcripts that did not show any hits in the NCBI databases (BLAST searches) and around 300 which showed hits to urodeles only.
Just put the 600 "orphan" ORFs up somewhere (figshare/GitHub) and we can have a fun competition to help you out. Seriously.