I have 20 fasta files and I want to perform t-SNE analysis on these files. Is it possible?
This is the workflow that I want to follow in R:
You may already know this, but just for the random reader: t-SNE does not work with FASTA files directly, but rather with numeric matrices. If you create a symmetric distance matrix for all your sequences, those data points can be embedded into a low-dimensional space. This is to say that your general approach will work as far as producing some kind of a result, but I don't have a good feel for how biologically relevant that embedding would be on a large scale.
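To make the distance-matrix step concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not a recommended pipeline: the distance used is a cosine distance between 3-mer count profiles, chosen only because it needs no alignment, and the helper names (`read_fasta`, `kmer_profile`, `cosine_distance`, `distance_matrix`) and the toy sequences are made up for this example. An alignment-based distance may well be more meaningful for real data.

```python
from collections import Counter
from itertools import combinations
import math


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a generated id if the header line is empty.
            name = parts[0] if parts else "seq%d" % (len(records) + 1)
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two k-mer count profiles."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a sequence shorter than k has no profile
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def distance_matrix(records, k=3):
    """Build a symmetric distance matrix with a zero diagonal."""
    names = list(records)
    profiles = [kmer_profile(records[n], k) for n in names]
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = cosine_distance(profiles[i], profiles[j])
        dist[i][j] = dist[j][i] = d
    return names, dist


# Toy input (hypothetical sequences, for illustration only).
example = """>seqA
ACGTACGTAC
>seqB
ACGTACGTAC
>seqC
TTTTGGGGCC
"""

names, dist = distance_matrix(read_fasta(example))
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

For real files you would read each FASTA file's contents and merge the records into one dict before calling `distance_matrix`. The resulting matrix can then be handed to a t-SNE implementation that accepts precomputed distances, for example `Rtsne(d, is_distance = TRUE)` from the Rtsne package in R, or scikit-learn's `TSNE(metric="precomputed", init="random")` in Python. In both cases the perplexity has to be small relative to the number of sequences.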
On a small scale, it appears to work reasonably well. Below is an embedding of 3 groups of proteins (60-70 members each) whose members are related within a group, but not between groups.
Can you explain the experiment in more detail? It's unclear what you want to do.
I want to perform t-SNE analysis on multiple fasta files
"Doing a t-SNE" is not a biological question. A relevant type of question would be "I want to measure parameter X on this data" or "I want to test hypothesis A". For example, if you want to know the evolutionary relatedness of the genes you sequenced, you would usually build a phylogenetic tree from an alignment, while a t-SNE would likely be irrelevant.