I have created a protein database in Cytoscape by following the procedure reported in the paper "Revealing Unexplored Sequence-Function Space Using Sequence Similarity Networks" (doi: 10.1021/acs.biochem.8b00473). After finalising the database and running the all-vs-all BLAST, I can load and visualise the clustered database. I am interested in one specific cluster node that represents several protein sequences; their names are listed in the REPRESENTED ACCs column for that cluster. I would like to retrieve all the sequences represented by that node, together with their amino acid sequences. The aim is to build a sub-network of the protein sequences that are similar to each other within the selected threshold, so that I can study the features they have in common. Is this possible? Can you recommend an alternative way to do it? Cheers
Hi scooter, thanks for your reply. To answer your first question, the initial set from UniProtKB comprises 14150 protein sequences, hence its reduction into metanodes ("seed sequences", as they are called in the supplementary information of that paper). And that is exactly the point. The instructions in the supplementary information end by saying, as you wrote in your reply, to import alongside the sequence network a custom table that maps each node name to the corresponding CD-HIT cluster. When I import that table into my network, the protein sequence is shown only for the representative sequence of each cluster. If I look at the table while a cluster is selected, I can read the sequence of the representative and the names of all the other sequences in the highlighted cluster (see the attached image [1]). I want to retrieve the sequences of all the proteins in that cluster and create a separate network from them. Is that possible?
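To make the goal concrete, here is a rough sketch of what I have in mind outside Cytoscape. It assumes that I still have the CD-HIT `.clstr` output and the original FASTA file of all 14150 sequences, and that the IDs in the two files match (CD-HIT may truncate long names in the `.clstr` file depending on how it was run, so this needs checking). The function names are my own and are not part of CD-HIT or Cytoscape.

```python
import re

# In a .clstr member line the sequence name sits between ">" and "...".
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Map each representative ID to the list of all IDs in its cluster."""
    clusters = {}
    members, rep = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster starts: store the one just finished.
            if rep is not None:
                clusters[rep] = members
            members, rep = [], None
            continue
        match = MEMBER_RE.search(line)
        if match is None:
            continue
        seq_id = match.group(1)
        members.append(seq_id)
        # CD-HIT marks the representative sequence with a trailing "*".
        if line.endswith("*"):
            rep = seq_id
    if rep is not None:
        clusters[rep] = members
    return clusters


def read_fasta(lines):
    """Map each FASTA ID (header text up to the first space) to its sequence."""
    chunks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def cluster_fasta(rep_id, clusters, seqs):
    """Return FASTA text for every member of the cluster represented by rep_id."""
    return "".join(
        f">{member}\n{seqs[member]}\n"
        for member in clusters[rep_id]
        if member in seqs  # skip IDs that could not be matched
    )
```

I would pass the open `.clstr` and FASTA files to `parse_clstr` and `read_fasta`, call `cluster_fasta` with the name of the representative node, and save the result as a new FASTA file. That file could then go through the same all-vs-all BLAST step to build the separate network.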