VCF input for KaryotypeR
1
1
Entering edit mode
6.1 years ago
Rubal ▴ 350

Hello Everyone,

I am trying to use the R package KaryotypeR to produce rainfall plots in R. I am having trouble reading my VCF data into R and getting it to work in KaryotypeR. I've got their tutorial to work (link here): https://bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html

I have a VCF with example rows like this (excluding the many ##header lines for readability:

##FORMAT=<ID=FAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, forward strand">
##FORMAT=<ID=FCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, forward strand">
##FORMAT=<ID=FGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, forward strand">
##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
##FORMAT=<ID=RAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, reverse strand">
##FORMAT=<ID=RCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, reverse strand">
##FORMAT=<ID=RGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, reverse strand">
##FORMAT=<ID=RTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, reverse strand">
##FORMAT=<ID=PM,Number=1,Type=Float,Description="Proportion of mut allele">
##SAMPLE=<ID=NORMAL,Description="Normal",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001c,Source=.>
##SAMPLE=<ID=TUMOUR,Description="Tumour",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001b_lo0019,Source=.>
##FILTER=<ID=DTH,Description="Less than 1/3 mutant alleles were >= 25 base quality">
##INFO=<ID=CLPM,Number=1,Type=Float,Description="A soft flag median number of soft clipped bases in variant supporting reads">
##INFO=<ID=ASMD,Number=1,Type=Float,Description="A soft flag median alignement score of reads showing the variant allele">
##vcfProcessLog_20180918.1=<InputVCF=<AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz>,InputVCFSource=<FlagCaVEManVCF.pl>,InputVCFVer=<1.7.3>,InputVCFParam=<sp=.,umv=.,h=.,g=.,f=AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz,t=WGS,loud=.,n=AD0001c.bam,ref=genome.fa.fai,m=AD0001b_lo0019.bam,v=flag.to.vcf.convert.ini,s=TIGER,l=2000,ab=.,p=.,c=Tiger_flag.vcf.config.ini,b=.,idx=.,o=D0001b_lo0019_vs_AD0001c_flagged.vcf>>
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR
    contig01000014.1    1128    165247d6-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=89;MP=8.8e-01;GP=5.0e-11;TG=TT/CTTTT;TP=8.8e-01;SG=TT/TTTTT;SP=1.2e-01;CLPM=0.00;ASMD=136.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:11:0:0:1:27:0.0e+00   0|1:0:0:0:19:0:5:0:26:1.0e-01
    contig01000543.1    276 16524d8a-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=67;MP=8.7e-01;GP=1.3e-01;TG=TT/CTTTT;TP=8.7e-01;SG=CT/CTTTT;SP=1.2e-01;CLPM=0.00;ASMD=57.50  GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:3:0:1:0:12:6.2e-02    0|1:0:3:1:21:0:3:0:23:1.2e-01
    contig01000692.1    2433    16524ee8-b4df-11e8-bec4-89a28c540d42    A   T   .   PASS    DP=74;MP=9.8e-01;GP=2.0e-02;TG=AA/AAAAT;TP=8.8e-01;SG=AA/AAATT;SP=9.6e-02;CLPM=9.00;ASMD=101.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:19:0:0:3:25:0:0:0:6.4e-02   0|1:11:0:0:5:11:0:0:0:1.9e-01
    contig01000694.1    830 16525032-b4df-11e8-bec4-89a28c540d42    C   T   .   PASS    DP=114;MP=1.0e+00;GP=5.6e-14;TG=CC/CCCCT;TP=8.6e-01;SG=CC/CCCTT;SP=1.4e-01;CLPM=0.00;ASMD=111.00    GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:26:0:0:0:42:0:2:2.9e-02   0|1:0:8:0:1:0:26:0:9:2.3e-01
    contig01000694.1    872 16525172-b4df-11e8-bec4-89a28c540d42    G   T   .   PASS    DP=89;MP=1.0e+00;GP=2.9e-07;TG=GG/GGGTT;TP=7.4e-01;SG=GG/GGGGT;SP=2.4e-01;CLPM=0.00;ASMD=115.50 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:23:0:0:0:34:3:5.0e-02   0|1:0:0:9:0:0:0:11:9:3.1e-01

I would like to convert this into a GRanges format such that I can subset the 'TUMOUR' data (excluding the 'NORMAL') and plot the variants as a rainfall plot with the KaryotypeR package using these commands from the tutorial (where sm.gr is a GRanges object of the data):

library(karyoploteR)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr)

Could someone please point me towards a good way to read in and convert a VCF to a GRanges format suitable for this task?

Thanks for your help.

R • 3.7k views
ADD COMMENT
0
Entering edit mode

however when I try to convert this to a GRanges object (which I believe is needed for karyotypeR, it does not work.

What is the exact error you're seeing? Can you give an example row from your vcf? That example data you have there doesn't look like a vcf

ADD REPLY
0
Entering edit mode

Thanks for the reply. The data I give is an alternative format encase that is easier to work with for karyotypeR. I've amended the answer with an example vcf row and example error

ADD REPLY
0
Entering edit mode

Tips : When you get an error on a command line, don't go further in your analysis, you will fail

Tips2 : Always put command lines and error messages in your post

You got :

Error in .normargSEW0(start, "start") : 'start' must be a numeric vector (or NULL)

It speaks for itself, your start column is not numeric

Try to :

data$start = as.numeric(as.character(data$start))

And same for the end column I guess.

Could you had the command to generate the variable data please

ADD REPLY
0
Entering edit mode

Thanks for the help but 'start' is a column of integer values in my table. I think I am inappropriately using the toGranges() function. I feel there must be a straightforward way to convert a vcf to a granges object suitable for karyotypeR and was just wondering if someone could point me to that.

ADD REPLY
0
Entering edit mode

Please include the result of :

head -30 your_file.vcf

in your post

ADD REPLY
0
Entering edit mode

substantially modified the post for readability and will do this thanks

ADD REPLY
0
Entering edit mode

Please include the ##header too. A lot of tools to read vcf file use these header marks to check the vcf file. Upload your file somewhere (dropbox....) and add the link in your post is the best way to do it

ADD REPLY
0
Entering edit mode

OK I just added some of it as there are >1000 lines for header due to a large number of contigs

ADD REPLY
0
Entering edit mode

You deleted the interesting stuff with the error message and your command lines. Please add them

ADD REPLY
3
Entering edit mode
6.1 years ago
library(karyoploteR)
tmp.vcf<-readLines("your_file.vcf")
tmp.vcf.data<-read.table("your_file.vcf")
tmp.vcf<-tmp.vcf[-(grep("#CHROM",tmp.vcf)+1):-(length(tmp.vcf))]
vcf.names<-unlist(strsplit(tmp.vcf[length(tmp.vcf)],"\t"))
names(tmp.vcf.data)<-vcf.names
colnames(tmp.vcf.data)[1] <- "Chr"
colnames(tmp.vcf.data)[2] <- "Start"
tmp.vcf.data = cbind(tmp.vcf.data,End=rep(tmp.vcf.data$Start+1))
tmp.vcf.data <- tmp.vcf.data[, c(1,2,12, 3:11)]
toGRanges(tmp.vcf.data)

Edit : Good luck to generate a custum genome based on your contigs in KaryoploteR !

ADD COMMENT
0
Entering edit mode

Thanks the code. When I try with this commandL kpPlotRainfall(kp, data = tmp.vcf.data , col=variant.colors) I get the error: Error in attr(x, "tsp") <- c(1, NROW(x), 1) : attempt to set an attribute on NULL I suspect this is related to needing to create a correct contig base plot for adding this data to, so will explore that.

ADD REPLY

Login before adding your answer.

Traffic: 2992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6