PCA Visualization error in R - arguments imply differing number of rows
6 months ago
Ali

Hello everyone, I am working on GBS SNP data (VCF format) on Linux, and I am very new to this kind of analysis. For PCA, I am running a PLINK command like this:

--bfile <prefix of the .bed/.bim/.fam files> --out outfile --pca 10

After running this, I get two output files: "eigenvec" and "eigenval".

I then want to visualize the results in R with ggplot2. This is my code so far:

# Set the working directory
setwd("E:/gbs_analysis/pca")

# Read eigenvec and eigenval files
eigenvec <- read.table("eigenvec", header = FALSE)
eigenval <- read.table("eigenval", header = FALSE)

# Assign column names to eigenvec
colnames(eigenvec) <- c("SampleID", paste0("PC", 1:(ncol(eigenvec)-1)))

# Check the number of rows
nrow(eigenvec) 

# Merge eigenvec and eigenval
pca_data <- merge(eigenvec, eigenval, by = "SampleID")
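A likely source of confusion: PLINK 1.9 writes the .eigenvec file without a header and with two ID columns (FID and IID) before the PCs, so naming the columns SampleID, PC1, PC2, ... shifts every PC label by one. The .eigenval file is a separate list with one eigenvalue per PC. Below is a minimal sketch of reading both, assuming PLINK 1.9 output. PLINK 2 writes a header line starting with # (which read.table() skips as a comment by default) and may omit the FID column, so check head() of your file first. The function name read_plink_pca() is hypothetical.

```r
# Read PLINK 1.9 --pca output. Assumed format (PLINK 1.9):
#   .eigenvec: no header; columns FID, IID, then one column per PC
#   .eigenval: one eigenvalue per line, one line per PC
read_plink_pca <- function(eigenvec_path, eigenval_path) {
  vectors <- read.table(eigenvec_path, header = FALSE, stringsAsFactors = FALSE)
  n_pcs <- ncol(vectors) - 2  # the first two columns are IDs, not PCs
  colnames(vectors) <- c("FID", "IID", paste0("PC", seq_len(n_pcs)))
  values <- scan(eigenval_path, quiet = TRUE)
  if (length(values) != n_pcs) {
    warning("found ", n_pcs, " PC columns but ", length(values), " eigenvalues")
  }
  # Kept as two separate objects: one row per sample vs one value per PC
  list(vectors = vectors, values = values)
}

# Usage with the file names from the question:
# pca <- read_plink_pca("eigenvec", "eigenval")
# nrow(pca$vectors)   # number of samples
# length(pca$values)  # number of PCs requested with --pca
```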

When I try to combine the two tables with cbind(), I get this error:

pca_data <- cbind(eigenvec, eigenval)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 164, 10
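cbind() needs both tables to have the same number of rows. Here eigenvec has one row per sample (164) and eigenval has one row per principal component (10, from --pca 10), so the eigenvalues cannot be attached to individual samples at all. They describe the PCs and are typically used to report how much variance each PC captures. A sketch with a hypothetical helper, pc_variance_table(); because only the top 10 eigenvalues are computed, the percentages are relative to those 10 PCs, not to the total genetic variance.

```r
# Summarise PLINK eigenvalues per principal component.
# With --pca 10 only the top 10 eigenvalues exist, so `percent` is each
# PC's share of those computed eigenvalues, not of the total variance.
pc_variance_table <- function(eigenvalues) {
  eigenvalues <- as.numeric(eigenvalues)
  data.frame(
    PC = paste0("PC", seq_along(eigenvalues)),
    eigenvalue = eigenvalues,
    percent = 100 * eigenvalues / sum(eigenvalues)
  )
}

# Usage with the objects from the question (eigenval has a single column, V1):
# pc_variance_table(eigenval$V1)
```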

Or, with merge():

> # Merge eigenvec and eigenval
> pca_data <- merge(eigenvec, eigenval, by = "SampleID")
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
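The merge() error has the same root cause from a different angle: merge() joins on a column present in both tables, and eigenval, read with header = FALSE, only has a column called V1 and no sample IDs. merge() is the right tool for attaching per-sample information (for example population or collection site) to the PCs before plotting. A minimal sketch; the metadata table sample_info and its columns are hypothetical, and the join key is assumed to be PLINK's IID column.

```r
# merge() joins two tables on a key column that must exist in BOTH tables.
# Sketch: attach per-sample metadata (hypothetical `sample_info`) to the PCs.
add_sample_info <- function(pcs, sample_info, id_col = "IID") {
  has_id <- c(pcs = id_col %in% names(pcs),
              sample_info = id_col %in% names(sample_info))
  if (!all(has_id)) {
    stop("column '", id_col, "' not found in: ",
         paste(names(has_id)[!has_id], collapse = ", "))
  }
  # all.x = TRUE keeps samples that have no metadata row (their fields become NA)
  merge(pcs, sample_info, by = id_col, all.x = TRUE)
}
```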

How can I visualize my PCA in R?
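One way to plot this with ggplot2, as a sketch: draw PC1 against PC2 from the per-sample table and use the eigenvalues only for the axis labels. The function plot_pca() and its arguments are hypothetical, and the percentages again refer only to the PCs that were computed.

```r
library(ggplot2)

# Scatter plot of two PCs from a per-sample table with columns PC1, PC2, ...
# Eigenvalues are used only to annotate the axes (share of the computed PCs).
plot_pca <- function(pcs, eigenvalues, x = 1, y = 2, colour_by = NULL) {
  pct <- round(100 * eigenvalues / sum(eigenvalues), 1)
  x_col <- paste0("PC", x)
  y_col <- paste0("PC", y)
  p <- ggplot(pcs, aes(x = .data[[x_col]], y = .data[[y_col]]))
  if (is.null(colour_by)) {
    p <- p + geom_point()
  } else {
    # Colour points by a metadata column, e.g. one added with merge()
    p <- p + geom_point(aes(colour = .data[[colour_by]]))
  }
  p +
    labs(x = paste0(x_col, " (", pct[x], "%)"),
         y = paste0(y_col, " (", pct[y], "%)")) +
    theme_bw()
}

# Usage with the PLINK 1.9 files from the question (assumes FID and IID columns):
# pcs <- read.table("eigenvec", header = FALSE)
# colnames(pcs) <- c("FID", "IID", paste0("PC", 1:(ncol(pcs) - 2)))
# vals <- scan("eigenval")
# print(plot_pca(pcs, vals))
# ggsave("pca_PC1_PC2.png", width = 6, height = 5)
```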

Thanks

SNPs PCA GBS LINUX r

SNPRelate is a Bioconductor package for R that can work from VCF files (after converting them to its GDS format with snpgdsVCF2GDS()) and perform PCA and IBD/IBS analyses. See its documentation on Bioconductor for details.
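For the SNPRelate route, a rough sketch (not tested against this dataset). SNPRelate works on GDS files, so the VCF is converted first with snpgdsVCF2GDS(); snpgdsPCA() then returns sample IDs, eigenvectors and variance proportions. The wrapper run_snprelate_pca() is hypothetical, and autosome.only = FALSE is an assumption for organisms whose chromosome names SNPRelate may not treat as autosomes.

```r
# Sketch of PCA directly from a VCF with SNPRelate (Bioconductor).
# Calls are namespaced (SNPRelate::) so defining the function does not
# require the package; running it does.
run_snprelate_pca <- function(vcf_path, gds_path = tempfile(fileext = ".gds"),
                              n_pcs = 10, threads = 1) {
  # SNPRelate works on GDS files, so the VCF is converted first
  SNPRelate::snpgdsVCF2GDS(vcf_path, gds_path, method = "biallelic.only")
  gds <- SNPRelate::snpgdsOpen(gds_path)
  on.exit(SNPRelate::snpgdsClose(gds))
  # autosome.only = FALSE: assumption for non-human / GBS chromosome names
  pca <- SNPRelate::snpgdsPCA(gds, autosome.only = FALSE,
                              eigen.cnt = n_pcs, num.thread = threads)
  scores <- as.data.frame(pca$eigenvect[, seq_len(n_pcs), drop = FALSE])
  colnames(scores) <- paste0("PC", seq_len(n_pcs))
  # One row per sample, plus the variance proportion of each returned PC
  list(scores = cbind(sample.id = pca$sample.id, scores),
       varprop = pca$varprop[seq_len(n_pcs)])
}
```

The scores table has one row per sample, so it can be plotted with ggplot2 in the same way as the PLINK .eigenvec table.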
