Hello Everyone,
I am trying to use the R package KaryotypeR to produce rainfall plots in R. I am having trouble reading my VCF data into R and getting it to work in KaryotypeR. I've got their tutorial to work (link here): https://bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html
I have a VCF with example rows like this (excluding the many ##header lines for readability:
##FORMAT=<ID=FAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, forward strand">
##FORMAT=<ID=FCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, forward strand">
##FORMAT=<ID=FGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, forward strand">
##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
##FORMAT=<ID=RAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, reverse strand">
##FORMAT=<ID=RCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, reverse strand">
##FORMAT=<ID=RGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, reverse strand">
##FORMAT=<ID=RTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, reverse strand">
##FORMAT=<ID=PM,Number=1,Type=Float,Description="Proportion of mut allele">
##SAMPLE=<ID=NORMAL,Description="Normal",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001c,Source=.>
##SAMPLE=<ID=TUMOUR,Description="Tumour",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001b_lo0019,Source=.>
##FILTER=<ID=DTH,Description="Less than 1/3 mutant alleles were >= 25 base quality">
##INFO=<ID=CLPM,Number=1,Type=Float,Description="A soft flag median number of soft clipped bases in variant supporting reads">
##INFO=<ID=ASMD,Number=1,Type=Float,Description="A soft flag median alignement score of reads showing the variant allele">
##vcfProcessLog_20180918.1=<InputVCF=<AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz>,InputVCFSource=<FlagCaVEManVCF.pl>,InputVCFVer=<1.7.3>,InputVCFParam=<sp=.,umv=.,h=.,g=.,f=AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz,t=WGS,loud=.,n=AD0001c.bam,ref=genome.fa.fai,m=AD0001b_lo0019.bam,v=flag.to.vcf.convert.ini,s=TIGER,l=2000,ab=.,p=.,c=Tiger_flag.vcf.config.ini,b=.,idx=.,o=D0001b_lo0019_vs_AD0001c_flagged.vcf>>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR
contig01000014.1 1128 165247d6-b4df-11e8-bec4-89a28c540d42 T C . PASS DP=89;MP=8.8e-01;GP=5.0e-11;TG=TT/CTTTT;TP=8.8e-01;SG=TT/TTTTT;SP=1.2e-01;CLPM=0.00;ASMD=136.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM 0|0:0:0:0:11:0:0:1:27:0.0e+00 0|1:0:0:0:19:0:5:0:26:1.0e-01
contig01000543.1 276 16524d8a-b4df-11e8-bec4-89a28c540d42 T C . PASS DP=67;MP=8.7e-01;GP=1.3e-01;TG=TT/CTTTT;TP=8.7e-01;SG=CT/CTTTT;SP=1.2e-01;CLPM=0.00;ASMD=57.50 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM 0|0:0:0:0:3:0:1:0:12:6.2e-02 0|1:0:3:1:21:0:3:0:23:1.2e-01
contig01000692.1 2433 16524ee8-b4df-11e8-bec4-89a28c540d42 A T . PASS DP=74;MP=9.8e-01;GP=2.0e-02;TG=AA/AAAAT;TP=8.8e-01;SG=AA/AAATT;SP=9.6e-02;CLPM=9.00;ASMD=101.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM 0|0:19:0:0:3:25:0:0:0:6.4e-02 0|1:11:0:0:5:11:0:0:0:1.9e-01
contig01000694.1 830 16525032-b4df-11e8-bec4-89a28c540d42 C T . PASS DP=114;MP=1.0e+00;GP=5.6e-14;TG=CC/CCCCT;TP=8.6e-01;SG=CC/CCCTT;SP=1.4e-01;CLPM=0.00;ASMD=111.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM 0|0:0:26:0:0:0:42:0:2:2.9e-02 0|1:0:8:0:1:0:26:0:9:2.3e-01
contig01000694.1 872 16525172-b4df-11e8-bec4-89a28c540d42 G T . PASS DP=89;MP=1.0e+00;GP=2.9e-07;TG=GG/GGGTT;TP=7.4e-01;SG=GG/GGGGT;SP=2.4e-01;CLPM=0.00;ASMD=115.50 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM 0|0:0:0:23:0:0:0:34:3:5.0e-02 0|1:0:0:9:0:0:0:11:9:3.1e-01
I would like to convert this into a GRanges format such that I can subset the 'TUMOUR' data (excluding the 'NORMAL') and plot the variants as a rainfall plot with the KaryotypeR package using these commands from the tutorial (where sm.gr is a GRanges object of the data):
library(karyoploteR)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr)
Could someone please point me towards a good way to read in and convert a VCF to a GRanges format suitable for this task?
Thanks for your help.
What is the exact error you're seeing? Can you give an example row from your vcf? That example data you have there doesn't look like a vcf
Thanks for the reply. The data I give is an alternative format encase that is easier to work with for karyotypeR. I've amended the answer with an example vcf row and example error
Tips : When you get an error on a command line, don't go further in your analysis, you will fail
Tips2 : Always put command lines and error messages in your post
You got :
It speaks for itself, your
start
column is not numericTry to :
And same for the
end
column I guess.Could you had the command to generate the variable
data
pleaseThanks for the help but 'start' is a column of integer values in my table. I think I am inappropriately using the toGranges() function. I feel there must be a straightforward way to convert a vcf to a granges object suitable for karyotypeR and was just wondering if someone could point me to that.
Please include the result of :
in your post
substantially modified the post for readability and will do this thanks
Please include the
##header
too. A lot of tools to read vcf file use these header marks to check the vcf file. Upload your file somewhere (dropbox....) and add the link in your post is the best way to do itOK I just added some of it as there are >1000 lines for header due to a large number of contigs
You deleted the interesting stuff with the error message and your command lines. Please add them