Dear Friends,
I am trying to plot the "number of variants in each cancer type" from VCF files. Could you please let me know how to do this using R, Python, or bash? I have a text file listing the samples and the cancer type associated with each one, like this:
Samples Cancer Type
TCGA-XXX.barcode ACC
.
.
I am new to this and still learning. Thanks much!
DK
It is unclear which data you have - be specific, e.g. number of samples - and which type of plot you aim to obtain. Please elaborate and show an example.
Thanks! The data are TCGA cancer VCF files. These are merged VCF files of about 10000 samples, one for each chromosome. I do not know the name of the plot, but what I am looking for is a plot of the "number of variants in each cancer type in the VCF files". Please let me know if I am being clear and what you think could be done to obtain this. Thanks much!
Can you perhaps draw the plot you have in mind on a piece of paper, take a picture and show us? How should the 10k samples be summarized?
Thanks for your reply! This is the type of plot I am looking to generate from VCF files: https://www.dropbox.com/s/rfvyw8b8v62lhuz/example.jpg?dl=0
Please let me know how to generate this plot from vcf files. Thanks
See also How to add images to a Biostars post
How do you link the sample identifiers in the vcf to the cancer types?
Please update your initial question when adding information. We are losing valuable time here because I have to ask for clarification every time. I assume people who can help you don't want to go through all these comments.
Thanks! I will remember that next time.
To link the identifiers to the cancer type I have a text file like this:
Now I am trying to figure out how to use the file above, together with the merged VCF file for 10000 samples, to plot the "number of variants for each cancer type". Please let me know if I am missing any information here and I will provide it. Thanks.
Also that file is important information which should have been part of your first post. We have now wasted 11 hours until we found all required information.
Please update your initial question when adding information.
I can solve this in Python, but not in R.
Thanks, I have updated the question. Could you please share how to produce such a plot with the information available? I would really appreciate it. Thanks.
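Since a Python route was mentioned above, here is a minimal sketch of one way this could be done. It is not a definitive solution and rests on several assumptions: the sample IDs in the VCF header match the barcodes in the first column of the mapping file exactly; GT is the first FORMAT sub-field (as the VCF specification requires when GT is present); "number of variants" means the number of non-reference genotype calls, summed over all samples of a cancer type; and a bar chart is an acceptable form for the plot (I am guessing at what the linked image shows). All function names and the output file name are made up for illustration, and the plotting step needs matplotlib.

```python
import gzip
from collections import Counter


def read_cancer_types(lines):
    """Map sample barcode -> cancer type from whitespace-separated lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        # Skip blank or placeholder lines and the "Samples Cancer Type" header.
        if len(fields) < 2 or fields[0] == "Samples":
            continue
        mapping[fields[0]] = fields[1]
    return mapping


def is_variant_call(sample_field):
    """True if the genotype carries at least one non-reference allele."""
    genotype = sample_field.split(":", 1)[0]  # assumes GT comes first
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".", "") for allele in alleles)


def count_variants_by_cancer_type(vcf_lines, sample_to_type):
    """Count non-reference calls per cancer type in one multi-sample VCF."""
    counts = Counter()
    samples = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        for sample, call in zip(samples, fields[9:]):
            cancer_type = sample_to_type.get(sample)
            if cancer_type is not None and is_variant_call(call):
                counts[cancer_type] += 1
    return counts


def count_files(vcf_paths, sample_to_type):
    """Sum the counts over several per-chromosome VCFs (plain or gzipped)."""
    total = Counter()
    for path in vcf_paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            total.update(count_variants_by_cancer_type(handle, sample_to_type))
    return total


def plot_counts(counts, out_png="variants_per_cancer_type.png"):
    """Save a bar chart with one bar per cancer type (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    types = sorted(counts, key=counts.get, reverse=True)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(types, [counts[t] for t in types])
    ax.set_xlabel("Cancer type")
    ax.set_ylabel("Number of variant calls")
    ax.tick_params(axis="x", rotation=90)
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
```

To use it, you would open the mapping file and pass the handle to `read_cancer_types`, pass the list of per-chromosome VCF paths plus that mapping to `count_files`, and hand the result to `plot_counts`. If you would rather count each site once per cancer type (at least one carrier) instead of once per sample, only the inner loop of `count_variants_by_cancer_type` needs to change. Expect pure Python to be slow on merged files with 10000 samples.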