I have 21 .vcf files from tumour vs normal samples. I annotated them individually with the Ensembl Variant Effect Predictor (VEP) and snpEff. Now I have 21 .txt result files for each of these tools. For example, this is the result for one .vcf in VEP:
Did you check the help for waterfall or lolliplot? I haven't used those packages, but it seems to me you will need the genomic coordinates of both the gene as well as the mutations. You will likely have to do some data wrangling to parse the location of the mutation of the file you linked into three distinct columns that R can work with (note how the file from the tutorial has separate columns for chromosome, start and stop) -- all the details should hopefully be explained in the help of both functions, and if not, please be more specific in terms of which details you don't understand.
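To make the wrangling step a bit more concrete, here is a minimal sketch in Python (the same idea carries over to R). It assumes the location field in your VEP output looks like "chrom:start" or "chrom:start-end"; please check that against your own files. The function name split_location is just something I made up for illustration.

```python
import re

# Assumed format of the location field: "chrom:start" or "chrom:start-end".
_LOCATION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def split_location(location):
    """Split a merged location string into (chrom, start, end).

    Hypothetical helper: when no end is given (a single-base position),
    the end is set equal to the start.
    """
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    start = int(match.group("start"))
    end = int(match.group("end")) if match.group("end") else start
    return match.group("chrom"), start, end


# Example usage with made-up coordinates.
print(split_location("7:140453136"))
print(split_location("17:7579470-7579472"))
```

Once the three values are in separate columns, you can rename them to whatever the plotting function's help page asks for.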
Like Friederike says, these are called lollipop plots indeed. You can search online for various tools that plot them, such as pbnjay's mutsneedle, cBioPortal's MutationMapper, etc, but each has its own limitations.
Sorry, but neither the VEP nor the snpEff results have a reference allele column.
I see that most of the visualization tools you kindly suggested need these columns.
Even the start and end positions have been merged in the VEP results.
I am not sure how to deal with this incompatibility in the input files.
The VCF file that you used for VEP should have that information.
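As a rough sketch of how to pull that information out, the Python snippet below reads the fixed VCF columns (CHROM, POS, ID, REF, ALT) and returns one record per alternate allele. The function name read_vcf_alleles and the example records are made up for illustration, and the sketch assumes a plain-text, tab-separated VCF (not gzipped).

```python
def read_vcf_alleles(lines):
    """Collect chrom, pos, ref and alt from VCF data lines.

    Hypothetical helper: header lines (starting with '#') are skipped,
    and multi-allelic records (comma-separated ALT) are split into one
    entry per alternate allele.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):
            records.append(
                {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": allele}
            )
    return records


# Made-up example records; with a real file you would pass an open file
# handle instead, e.g. read_vcf_alleles(open("sample.vcf")).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\n",
    "2\t2000\t.\tCT\tC,CTT\t60\tPASS\t.\n",
]
for record in read_vcf_alleles(example):
    print(record)
```

Keep in mind that annotation tools may report indel positions differently from the VCF, so matching these records back to the annotated rows can need some care.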
Sorry for this silly question;
Imagine I have called somatic indels from cancer vs normal samples and have the resulting .vcf files. What is the next step? I have googled a lot, but I am getting more confused. If my question is which mutations underlie this type of cancer, what would these VCF files tell me? I have seen people use MutSigCV, but I don't know why. For example, VEP and snpEff do a good job of feature selection, so why do people use MutSigCV?
VEP and snpEff annotate variants. MutSigCV, AFAIK, picks significantly mutated genes, a totally different task. Also, if you're using GDC's MAF format, they refer to Alt alleles differently, Tumor_Seq_Allele1 and Tumor_Seq_Allele2. Ref allele is still called Reference_Allele though, so that should not be the problem.
VCF files WILL have the ref allele column, there can be no VCF file that does not have REF.
EDIT: This offshoot is not related to the original post (which deals with reproducing a lollipop plot), please search the forum for discussions related to VCF annotation/MutSigCV.
Thank you. I was asking myself why I am trying to produce a lollipop plot and what my eventual goal is, which is why I got concerned.
Sorry, could I use dNdScv instead of MutSigCV for finding significant variants?
I'm sorry, I don't know. MutSigCV is as far as my exposure goes.