Hello,
I need to generate a sequence similarity network from proteins with a similar domain architecture (e.g., find all sequences with 4 DNA-binding domains in the N-terminal region). While I can find all the sequences with similar domain architectures on InterPro or Pfam, I am having trouble linking them to their annotations in a systematic, automated way. Since there are thousands of sequences, I know I will need to write some sort of script to do this. But first I need some idea of where to get gene annotations from, how to associate them with each sequence, and what format to use to store the annotations and sequences together (tab-delimited?). I've also read a couple of websites and a paper that say everything needs to be in XGMML format to be loaded into Cytoscape. However, I have found very little documentation on how to generate XGMML. So I was wondering if anyone can give me some general directions (databases to download sequences from, how to organize annotations, etc.). Thank you!
These are two of the references I've looked at so far: http://enzymefunction.org/resources/tutorials/efi-and-cytoscape3 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0004345
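To make the XGMML part of the question concrete, here is a minimal sketch of how such a file might be produced from a tab-delimited annotation table plus a list of pairwise similarity edges. The element names (graph, node, edge, att) and the namespace are those of the XGMML format. Everything else is an assumption for illustration: the function name build_xgmml, the column names, the "score" attribute, and the example IDs are hypothetical, and the edges are assumed to have been computed already (for example from an all-vs-all BLAST).

```python
import csv
import io
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def build_xgmml(annotation_tsv, edges, label="SSN"):
    """Return an XGMML document as a string.

    annotation_tsv: text of a tab-delimited table with a header row;
                    the first column is taken as the sequence ID (assumption).
    edges:          iterable of (source_id, target_id, score) tuples.
    """
    graph = ET.Element("graph", {"label": label, "xmlns": XGMML_NS, "directed": "0"})

    reader = csv.DictReader(io.StringIO(annotation_tsv), delimiter="\t")
    id_col = reader.fieldnames[0]
    known_ids = set()

    # One <node> per sequence; every other column becomes an <att> child.
    for row in reader:
        seq_id = row[id_col]
        known_ids.add(seq_id)
        node = ET.SubElement(graph, "node", {"id": seq_id, "label": seq_id})
        for name, value in row.items():
            if name == id_col:
                continue
            ET.SubElement(node, "att", {"name": name, "type": "string", "value": value or ""})

    # One <edge> per similarity pair; pairs with an unannotated end are skipped.
    for source, target, score in edges:
        if source not in known_ids or target not in known_ids:
            continue
        edge = ET.SubElement(
            graph, "edge", {"source": source, "target": target, "label": f"{source},{target}"}
        )
        ET.SubElement(edge, "att", {"name": "score", "type": "real", "value": str(score)})

    return ET.tostring(graph, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-sequence example.
    table = "id\tannotation\tspecies\nP1\tkinase\tE. coli\nP2\tphosphatase\tB. subtilis\n"
    with open("network.xgmml", "w", encoding="utf-8") as handle:
        handle.write(build_xgmml(table, [("P1", "P2", 42.0)]))
```

Keeping the annotations in a plain tab-delimited table keyed by sequence ID means the same table can be reused regardless of which tool computes the similarity edges.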