Linux - bcftools to R for fst by distance coding issue
0
0
Entering edit mode
17 months ago
peavy • 0

I'm having an issue with code or perhaps the file. The code is below and the error that I got was

"Error in file(file, "r") : cannot open the connection
In addition: Warning messages:
1: In file2other(input, type, match.arg(type.out), match.arg(allele.sep)) :
  Converter vcf to pcadapt is deprecated. Please use PLINK for conversion to bed (and QC).
2: In file(file, "r") :
  cannot open file 'C:/Users/MP/Documents/R': Permission denied"

I've checked permissions but there doesn't seem to be something stopping access. I opened the file in excel and it appears the bcf cut that I did didn't cut out some of the data. 33,000 rows were supposed to be what was left however there's still 1000000 rows of just contig numbers and lengths still there. Curious if this may interfere with the code in R as I cant get past the part of the code for turning it into a bed file.

# H Outlier Identification
library(vcfR)

#upload VCF
setwd("C:/Users/Mv/Documents/R")

# Read in SNP data and set variables
EAvcf <- read.vcfR("Fall22D.vcf")  # read in vcf file
EAvcf

#Check if SNPs are biallelic
is.biallelic(EAvcf) #Only checks first 1k / 45k

######## PCADAPT OH BOY #####################
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("LEA")
library(LEA) #this helps with some conversions
library(pcadapt)

#insert new file path if changed
file.path <- "C:/Users/MP/Documents/R"
MP_bed <- read.pcadapt(file.path, type = "vcf") #vcf to bed file

x <- pcadapt(MP_bed,K=20) #Principal components = 20 should be fine standard at first

#Need to graph out the principal components to see how many there are
#count how many points above the initial plateau, there's your K number
#So if the inflection occurs at x=5, then K=4
#Some will have an even sharper plateau
G_scree<-plot(x,option="screeplot") #Seems like only 3 principal components if that

#Computing test statistic based on PCA
#Will be using Mahalanobis distance

x <- pcadapt(GP_bed,K=3) #we determined K=3
summary(x) #returns info
#9,106 is the same SNP amount as vcf file, looks good

#Lets visualize
G_Man<-plot(x,option="manhattan") #Manhattan plot

G_QQ<-plot(x,option="qqplot",threshold=0.1) #Q-Q Plot, p-value distribution
#Well its safe to say we definitely have outliers (black line expected w/o outliers)

G_Hist<-hist(x$pvalues,xlab="p-values",main=NULL,breaks=50)
#Excess small p-values confirm outliers, most follow uniform (what we want?)

## Detecting Outliers
#Need the qvalue package to convert p values to q values
#BiocManager::install("qvalue")
library(qvalue)
#Now lets isolate outliers
qval <- qvalue(x$pvalues)$qvalues #$ selects variable or column
alpha <- 0.1 #set the alpha
outliers <- which(qval<alpha) #isolate outliers, these are your outliers!
outliers
#This is your outliers file!
BCFtools linux R • 455 views
ADD COMMENT

Login before adding your answer.

Traffic: 1535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6