I am doing PCA in R on a data frame (df_f), which is pasted below. Rows are samples; columns are genes.
pc_gtex <- prcomp(df_f)
as.fumeric <- function(x,levels=unique(x)) as.numeric(factor(x,levels=levels))
cols=as.fumeric(gtex_pm$tissue)
plot(pc_gtex$x[,1], pc_gtex$x[,2], col=cols, main = "PCA", xlab = "PC1", ylab = "PC2")
legend("topleft", col=1:17, legend = paste(unique(gtex_pm$tissue), 1:17), pch = 20, bty='n', cex=1.5)
head(gtex_pm)
sample tissue
1 SRR1069514 Prostate
2 SRR1071717 Bladder
3 SRR1073069 Prostate
4 SRR1074410 Prostate
Based on the above, the gtex_group object holds the numeric levels:
head(gtex_group)
[1] 1 2 1 1 1
The head of the main table used for the PCA is shown below. The row names are the samples.
SRR1069514 0 0.0009995 5.773065971 1.644998088 0.142367241 0.176471143 0.195566784 0.0009995 0.025667747 3.380994674 1.762502288 0 0.077886539 0 0.002995509 0.01093994 2.110576771 1.38829236 2.26186726 0.431132855 3.108480433 3.96347629 0 0 0.41012092 3.48452699 1.68565794 0 1.425034189 1.87456758 2.590542128 0 0 0 1.941471742 0.961646434 0 1.17711535 0.058268908 0 0.260824618 3.08534443 1.10426296 0.242946179 0.0009995 0 0 0 0.0009995 1.560247668 1.517541898 0.016857117 0.767326579 0.0009995 3.0191069 0 2.607050533 1.446683661 2.288384744 2.62082062 0.19309663 0 0 0.234281296 0 1.415610416 2.328837464 0.008959741 0.911479175 0.375005901 0.660107327 3.184739763 1.16064768 0.001998003 0.138891999 2.219855445 3.1011278 1.81872592 2.98229236 2.4114395 3.24528404 0 1.54734972 0.406131553 0.029558802 0.003992021 0.693647056 2.07581 2.8357982 0.0009995 0.082501222 1.09661029 2.75829962 0.635518068 3.11484775 0.01291623 3.40837159 0
SRR1071717 0 0 0.0009995 4.99519673 1.626491667 0.100749903 0.327863862 0.09531018 0 0.056380333 3.328196489 1.541373182 0 0.091667189 0.044973366 0 0.033434776 1.953311265 1.56444055 1.79142608 0.993622075 3.206236281 3.82609468 0 0 2.565487674 3.2202349 1.1304339 0 1.092258815 1.80203978 2.645394351 0 0 0.0009995 1.681200279 2.047434746 0 0.948176921 0.006975614 0.014888613 0.298622013 2.49667052 1.01884732 0.38662202 0 0 0 0 0.0009995 0.941958479 1.752845376 0.017839918 0.216722984 0.051643233 3.0505518 0 2.034444176 0.988053098 2.235804059 1.89686995 0.090754363 0 0 0.198850859 0 1.585554972 2.274905524 0 0.04305949 0.056380333 0.044016885 0.771496147 1.195436473 0 0.368801124 1.974636427 2.7700856 2.00120969 2.88875935 2.2651947 2.66242502 0 0.429181635 0.04018179 0.034401427 0 0.242161557 1.9907469 2.1384177 0.0009995 0.008959741 0.99916021 2.3892214 0.086177696 3.16821391 0 3.2038434 0
SRR1073069 2.19544522 1.32866525 0.0009995 4.50198508 1.159707388 0.141499562 0.265436464 0.026641931 2.3330173 0.028587457 3.140698044 1.537297235 0.012916225 0.023716527 0 0.002995509 0.049742092 2.071157322 1.02460688 2.11818137 0.359072069 2.419656765 3.5065479 0.137149838 2.121902193 0.305276381 2.95958683 1.49939981 3.14397985 1.001366904 1.450911 1.39475844 1.930071085 1.140074079 0.037295785 1.609437912 0.412109651 0.870456196 0.943516718 0.013902905 0 0.152721087 2.88836976 1.482967248 0.272314595 2.061532121 0.552159487 2.394890764 1.391033116 0.443402947 1.593714952 1.285921387 0.00796817 0.371563556 0.020782539 3.1946651 1.26327891 2.212003715 1.46672161 2.140183804 2.71997877 0.294161039 0.018821754 0.0009995 0.179818427 1.893714192 1.731478538 2.502255288 0.013902905 0.752830183 0.347129531 0.407463111 2.467082065 0.558472277 1.563812734 0.022739487 1.608837732 2.8176816 1.30670988 2.44495233 1.81107178 3.03254625 0.569283193 0.948176921 0.101653654 0.036331929 0 0.786182047 1.9867779 3.5039946 2.463427618 0.008959741 0.76360564 2.20640453 0.514618422 2.87964779 1.11021142 3.18750899 1.22436349
SRR1074410 2.69022562 1.70055751 0.013902905 3.314622273 0.503196597 0.4940863 0.044016885 0.023716527 1.753884517 0.03246719 2.767324893 1.666385193 0.009950331 0.05259245 0 0 0.017839918 1.575260461 0.76779072 2.22202559 0.83377831 2.198113071 3.57953881 0.051643233 2.207284913 0.072320662 3.04414141 1.39177929 2.851746423 0.982452934 1.33210213 1.888583654 1.871340532 1.238664044 0.03246719 1.734659877 0.486737828 0.412109651 1.126551657 0.035367144 0 0.213497174 2.76032635 1.131402111 0.572108852 2.102425378 0.291175962 1.85159947 0.943516718 0.283674051 1.232560261 0.982078472 0 0.223943232 0.035367144 2.9064091 1.583299255 2.376671636 1.185095749 2.07681309 2.20794469 0.877549904 0.151002874 0 0.107059072 3.038312721 1.486365915 2.633829402 0 0.403463105 0.195566784 0.285930539 1.296643139 0.48796633 1.664115474 0.054488185 1.884034745 2.3757426 1.71036863 2.61732284 1.9348492 3.1138708 1.220239777 0.322807874 0.12398598 0.004987542 0.002995509 0.446607051 1.939317 3.8484227 2.78346684 0.025667747 0.78253074 2.03352848 0.181487876 2.7091163 1.00430161 3.1429015 1.24875495
Once I have the plot with 17 levels, the legend displays all 17 levels, but the colors repeat after the first 8, so the 9th label has the same color as the first. Also, is there a better way to add the group labels to the PCA plot? I have 17 unique groups. Either two groups are being assigned the same color because of the "cols" variable, or the repeat comes from the "legend" call. The cols variable has 17 levels.
I am just following this post from a genomics class.
Your question isn't clear, but it seems to be about how to make a scatterplot with a lot of colors? The PCA itself went fine?
Yes, the PCA is fine. I also get all 17 labels, but some labels share a color and are hard to tell apart: there are 8 colors, and then they repeat. I tried the following code too, but I only get 6/7 colors.
Or, if the number of colors cannot be increased, can I just add the unique group labels on the plot itself?
The group labels are the same as in the legend command
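A likely explanation for the repeating colors, assuming default graphics settings: base R's default palette() has only 8 entries, and numeric col values beyond that wrap around, so 9 maps to the same color as 1. Below is a minimal sketch of one workaround. The toy tissue labels and coordinates are made up for illustration and stand in for gtex_pm$tissue and pc_gtex$x, and rainbow() is only one of several palette generators that could be used here.

```r
# Toy stand-in for gtex_pm$tissue: 17 hypothetical tissue labels, 5 samples each
tissue <- rep(paste0("tissue", 1:17), each = 5)
tissue_f <- factor(tissue, levels = unique(tissue))

# Number of colors in the current palette (8 unless it has been changed)
length(palette())

# Build one distinct color per level and index it by the factor codes
pal <- rainbow(nlevels(tissue_f))
cols <- pal[as.integer(tissue_f)]

# Toy coordinates standing in for pc_gtex$x[, 1:2]
set.seed(1)
pc <- matrix(rnorm(2 * length(tissue)), ncol = 2)

plot(pc[, 1], pc[, 2], col = cols, pch = 20,
     main = "PCA", xlab = "PC1", ylab = "PC2")
# Use the same palette vector in the legend so labels and points agree
legend("topleft", legend = levels(tissue_f), col = pal,
       pch = 20, bty = "n", cex = 0.7)
```

Passing color strings instead of integers to both plot() and legend() avoids the wrap-around, and building both from the same pal vector keeps the legend in step with the points.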
Use letters instead of colors to distinguish the groups.
I don't have time to write you a full example, but you can use letters instead of colors, as in this post: http://is-r.tumblr.com/post/35050025650/plotting-letters-as-shapes-in-ggplot2
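For what it's worth, here is a rough base R sketch of the same idea (the linked post uses ggplot2, so this is a swapped-in approach, not that post's code). It assumes at most 26 groups so that LETTERS is enough, and the toy tissue labels and coordinates are made up to stand in for gtex_pm$tissue and pc_gtex$x.

```r
# Toy stand-in for gtex_pm$tissue: 17 hypothetical tissue labels, 5 samples each
tissue <- rep(paste0("tissue", 1:17), each = 5)
tissue_f <- factor(tissue, levels = unique(tissue))

# One capital letter per group (works for up to 26 groups)
group_letters <- LETTERS[seq_len(nlevels(tissue_f))]
sym <- group_letters[as.integer(tissue_f)]

# Toy coordinates standing in for pc_gtex$x[, 1:2]
set.seed(1)
pc <- matrix(rnorm(2 * length(tissue)), ncol = 2)

# pch accepts single characters, so each point is drawn as its group's letter
plot(pc[, 1], pc[, 2], pch = sym, cex = 0.8,
     main = "PCA", xlab = "PC1", ylab = "PC2")
legend("topleft", legend = levels(tissue_f), pch = group_letters,
       bty = "n", cex = 0.7)
```

Letters can also be combined with colors if the groups overlap heavily on the plot.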
If you are trying to distinguish prostate from bladder, use two colors. Or different shades of one color for bladder, and different shades of a second color for prostate. Then you can more easily visualize spatial differences between bladder and prostate tissues, as well as relative "inter-tissue" differences for a tissue type within a PCA blob or cluster.
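A minimal sketch of the shading idea, under some loud assumptions: the subgroup column below is hypothetical (the question only shows a tissue column), the data are toy values, and the blue and red ramps are arbitrary choices.

```r
# Hypothetical metadata: two tissues, each with three made-up subgroups
meta <- data.frame(
  tissue   = rep(c("Prostate", "Bladder"), each = 6),
  subgroup = rep(rep(1:3, each = 2), times = 2)
)

# One light-to-dark ramp per tissue: blues for Prostate, reds for Bladder
ramps <- list(
  Prostate = colorRampPalette(c("lightblue", "darkblue"))(3),
  Bladder  = colorRampPalette(c("pink", "darkred"))(3)
)

# Pick each sample's shade from its tissue's ramp
cols <- mapply(function(t, s) ramps[[t]][s],
               as.character(meta$tissue), meta$subgroup,
               USE.NAMES = FALSE)

# Toy coordinates standing in for the first two principal components
set.seed(1)
pc <- matrix(rnorm(2 * nrow(meta)), ncol = 2)
plot(pc[, 1], pc[, 2], col = cols, pch = 20, cex = 1.5,
     main = "PCA", xlab = "PC1", ylab = "PC2")
```

With this layout the hue separates the tissues at a glance, while the shade shows how subgroups spread within each tissue's cluster.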