Hi,
I have tried using an UpSet plot to compare three VCF files produced by different pipelines. I extracted the variant column (SNPs) into single-column CSV files and imported those into R with this code:
```r
library(UpSetR)

# header = FALSE so the single column is named V1
set1 <- read.csv("set1.vcf", sep = "", header = FALSE)
set2 <- read.csv("set2.vcf", sep = "", header = FALSE)
set3 <- read.csv("set3.vcf", sep = "", header = FALSE)

# Column names are case-sensitive: V1, not v1
set1 <- as.vector(set1$V1)
set2 <- as.vector(set2$V1)
set3 <- as.vector(set3$V1)

read_sets <- list(set1_reads = set1,
                  set2_reads = set2,
                  set3_reads = set3)

upset(fromList(read_sets),
      sets = c("set1_reads", "set2_reads", "set3_reads"),
      number.angles = 20, point.size = 2.5, line.size = 1.5,
      mainbar.y.label = "read intersection", sets.x.label = "read set size",
      text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
      group.by = "freq", keep.order = TRUE)
```
It produces an intersection plot, but the SNP counts in the UpSet plot are much lower than the vcf-compare results on the same VCF files. I am not sure why the two give different numbers.
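One possible explanation, offered as a hedged sketch: `UpSetR::fromList()` treats each set as a collection of distinct elements, so if the imported column is not a unique identifier per SNP (for example, the chromosome column), many SNPs collapse into a single element and the counts drop well below what vcf-compare reports. The toy data and the `variant_key()` helper below are hypothetical, and the sketch assumes the files provide CHROM, POS, REF and ALT.

```r
library(UpSetR)

# Hypothetical toy calls: three distinct SNPs on two chromosomes
calls <- data.frame(
  CHROM = c("chr1", "chr1", "chr2"),
  POS   = c(1000, 2000, 3000),
  REF   = c("A", "G", "T"),
  ALT   = c("G", "T", "C"),
  stringsAsFactors = FALSE
)

# fromList() keeps one row per distinct element, so using the chromosome
# as the "ID" collapses the three SNPs into two elements
by_chrom <- fromList(list(a = calls$CHROM, b = calls$CHROM))
nrow(by_chrom)    # 2

# Hypothetical helper: a composite CHROM:POS:REF:ALT key, one per SNP
variant_key <- function(df) paste(df$CHROM, df$POS, df$REF, df$ALT, sep = ":")

by_variant <- fromList(list(a = variant_key(calls), b = variant_key(calls)))
nrow(by_variant)  # 3
```

If the real files yield far fewer distinct keys than they have data lines, that alone would explain the gap.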
If you don't mind switching to Python, I have some code that does more or less what you are looking for in this repository: https://github.com/wdecoster/surpyvor/
Let me know if you can't figure it out. Sooner or later I'll implement it for SNVs as well.
I have never used Python. I will give it a try and let you know how it works.
Can you post example lines from your VCF files, especially lines that `UpSetR` finds to overlap and lines that it doesn't, so we can see the difference? Since you are seeing some overlap, my instinct is that the formatting differs slightly between the files, but looking at some example lines would help.

I am not sure how to extract the overlapping SNPs/VCF lines.
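For pulling out overlapping variants in R itself, a minimal sketch: once each file is reduced to one key per variant (the `CHROM:POS:REF:ALT` strings below are hypothetical toy values), base R set operations give the shared and pipeline-specific variants directly, and a presence/absence table mirrors what `fromList()` builds for the plot.

```r
# Hypothetical variant keys (CHROM:POS:REF:ALT) from three pipelines
set1 <- c("chr1:1000:A:G", "chr1:2000:G:T", "chr2:3000:T:C")
set2 <- c("chr1:1000:A:G", "chr2:3000:T:C", "chr3:500:C:A")
set3 <- c("chr1:1000:A:G", "chr2:3000:T:C")

sets <- list(set1 = set1, set2 = set2, set3 = set3)

# Variants called by all three pipelines
shared_all <- Reduce(intersect, sets)

# Variants called only by set1 (absent from both other sets)
only_set1 <- setdiff(set1, union(set2, set3))

# Presence/absence table: one row per distinct variant, one column per set
all_variants <- unique(unlist(sets))
membership <- sapply(sets, function(s) all_variants %in% s)
rownames(membership) <- all_variants
```

To recover the full VCF lines, subset the imported table with something like `keys %in% shared_all`, where `keys` holds the per-row keys of that file.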
You can use `grep` to do that. By the way, why are you reading VCF files using `read.csv`? Even if the defaults of `read.csv` (as opposed to `read.table`, which is slightly better suited here) are overridden because you set `sep`, note that `sep = ""` means "split on any run of whitespace", so the columns you get depend on how your file is actually delimited.

If I use `read.table` to import, I get this error:
Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (`` `text` `` becomes `text`), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Are your VCF files in standard VCF format? It would be nice if you could share an example `set1.vcf`.
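If the files are standard VCF, here is a minimal sketch of importing them with `read.table` instead of `read.csv`. Header lines start with `#`, which `comment.char` skips, and `header = FALSE` yields columns `V1` to `V8` in VCF order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The inline lines are a hypothetical stand-in for `set1.vcf`; `quote = ""` is a defensive assumption, since INFO fields can contain quote characters.

```r
# Hypothetical stand-in for set1.vcf (tab-separated; header lines start with "#")
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
  "chr2\t3000\t.\tT\tC\t60\tPASS\t."
)

# comment.char = "#" drops every header line; header = FALSE names the
# columns V1, V2, ... in file order; all columns are read as character
vcf <- read.table(text = vcf_lines, sep = "\t", header = FALSE,
                  comment.char = "#", quote = "",
                  stringsAsFactors = FALSE, colClasses = "character")

# One key per variant from CHROM (V1), POS (V2), REF (V4) and ALT (V5)
keys <- paste(vcf$V1, vcf$V2, vcf$V4, vcf$V5, sep = ":")
```

For a real file, pass the path (e.g. `"set1.vcf"`) instead of `text = vcf_lines`. A file that keeps only CHROM, REF and ALT lacks POS, so any key built from it merges distinct SNPs that share a chromosome and alleles.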
I extracted the `Chrom REF ALT` columns and removed the headers for R import, like this (some lines from `set1.vcf`):

It is hard to guess without actual data. Could you post the output of:
versus in R: