6 months ago
Ali
Hello everyone, I am working with GBS SNP data in VCF format on Linux, and I am very new to this kind of analysis. For PCA, I am running a command like this:
--bfile .bed file --out outfile --pca 10
After running this, I get two output files: "eigenvec" and "eigenval".
I then want to visualise this information in R using ggplot2:
# Set the working directory
setwd("E:/gbs_analysis/pca")
# Read eigenvec and eigenval files
eigenvec <- read.table("eigenvec", header = FALSE)
eigenval <- read.table("eigenval", header = FALSE)
# Assign column names to eigenvec
colnames(eigenvec) <- c("SampleID", paste0("PC", 1:(ncol(eigenvec)-1)))
# Check the number of rows
nrow(eigenvec)
# Merge eigenvec and eigenval
pca_data <- merge(eigenvec, eigenval, by = "SampleID")
When I try to combine the two, I get this error:
pcaata <- cbind(eigenvec, eigenval)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 164, 10
Or, with merge:
> # Merge eigenvec and eigenval
> pca_data <- merge(eigenvec, eigenval, by = "SampleID")
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
>
I wonder how I can visualize my PCA in R.
Thanks
SNPRelate is an R package that can read VCF files directly and perform PCA as well as IBD/IBS analysis. See its documentation on Bioconductor for details.
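If you would rather plot the eigenvec/eigenval files you already have, note that the two files do not need to be merged at all. The errors above show that eigenvec has 164 rows (one per sample) while eigenval has 10 (one per PC), so there is no shared "SampleID" column to join on. Below is a minimal sketch of one way to plot them. It assumes the eigenvec file has no header and two ID columns (FID and IID) before the PC columns, so check your own file first. The toy files and the names eigenvec_path and eigenval_path are made up here only so that the example runs on its own.

```r
# Toy stand-ins for the two output files so this sketch runs by itself.
# Replace eigenvec_path / eigenval_path with the paths to your real files.
# ASSUMPTION: eigenvec has no header row and two ID columns (FID, IID)
# in front of the PC columns; adjust the column names if yours differs.
set.seed(1)
n_samples <- 6
n_pcs <- 3
eigenvec_path <- file.path(tempdir(), "toy.eigenvec")
eigenval_path <- file.path(tempdir(), "toy.eigenval")
toy_vec <- data.frame(
  FID = paste0("F", seq_len(n_samples)),
  IID = paste0("S", seq_len(n_samples)),
  matrix(round(rnorm(n_samples * n_pcs), 4), nrow = n_samples)
)
write.table(toy_vec, eigenvec_path, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
writeLines(c("4.0", "2.5", "1.5"), eigenval_path)

# Read the sample coordinates (one row per sample).
eigenvec <- read.table(eigenvec_path, header = FALSE, stringsAsFactors = FALSE)
# Read the eigenvalues as a plain numeric vector (one value per PC).
eigenval <- scan(eigenval_path, quiet = TRUE)

# Name the columns: two ID columns, then PC1..PCn.
colnames(eigenvec) <- c("FID", "IID", paste0("PC", seq_len(ncol(eigenvec) - 2)))

# Share of each PC among the eigenvalues that were written out.
# This is relative to the requested PCs only, not to the total variance.
pct <- round(100 * eigenval / sum(eigenval), 1)

# Plot PC1 against PC2; the eigenvalues are used only for the axis labels.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(eigenvec, ggplot2::aes(x = PC1, y = PC2)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = paste0("PC1 (", pct[1], "%)"),
                  y = paste0("PC2 (", pct[2], "%)")) +
    ggplot2::theme_minimal()
  print(p)
}
```

To colour the points by population or group, you would merge eigenvec with a separate sample information table on the IID column, not with eigenval.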