Copy number profile using VarScan
5.0 years ago
tanbiswas6

Hi, I'm using VarScan to call CNVs. I've generated the raw, called, GC-adjusted, and segmented copy-number files, but I don't know how to plot the CNVs in R. Can anybody share some code? And which of the files I've generated with VarScan should I use?

CNVs VarScan R CNV_Graph
5.0 years ago

CNVkit has some nice visualizations that are pretty easy to use. You may have to do some finagling to get your files to match their format, but it shouldn't be too difficult.
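As an illustration of that finagling, here is a minimal Python sketch that rewrites a VarScan .copynumber.called table into a CNVkit-style .cnr bin table. The column order, the 0-based start, the "-" gene placeholder and the constant weight of 1.0 are all assumptions; compare the output with a .cnr file produced by your own CNVkit version before relying on it. The helper name is hypothetical.

```python
import csv
import io

# Column order of a CNVkit .cnr bin table (assumed; compare with a .cnr
# file written by your CNVkit version before relying on it).
CNR_COLUMNS = ["chromosome", "start", "end", "gene", "depth", "log2", "weight"]


def varscan_called_to_cnr(varscan_text: str) -> str:
    """Convert a tab-separated VarScan .copynumber.called table into a
    CNVkit-style .cnr table (hypothetical helper, not part of either tool)."""
    reader = csv.DictReader(io.StringIO(varscan_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(CNR_COLUMNS)
    for row in reader:
        writer.writerow([
            row["chrom"],
            # VarScan regions look 1-based and inclusive (13031 - 12932 + 1 == 100
            # positions), so subtract 1 to get a BED-style 0-based start.
            int(row["chr_start"]) - 1,
            int(row["chr_stop"]),
            "-",                        # no gene annotation available
            row["tumor_depth"],         # tumor read depth
            row["adjusted_log_ratio"],  # GC-adjusted log2 ratio
            1.0,                        # placeholder weight (assumption)
        ])
    return out.getvalue()


EXAMPLE = (
    "chrom\tchr_start\tchr_stop\tnum_positions\tnormal_depth\ttumor_depth\t"
    "adjusted_log_ratio\tgc_content\tregion_call\traw_ratio\n"
    "chr1\t13329\t13428\t100\t155.3\t39.1\t-2.176\t53.0\tdel\t-1.99\n"
)

if __name__ == "__main__":
    print(varscan_called_to_cnr(EXAMPLE), end="")
```

The result could be saved as, say, sample.cnr and passed to `cnvkit.py scatter`; whether CNVkit accepts it unchanged is untested here.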

If you're dead set on an R option, PureCN and DNAcopy both have some plotting functions, though again, you may have to tweak your files slightly to match their formats.

Without showing us your files or what plots you want to actually make, it's difficult to give a more specific answer.

5.0 years ago
tanbiswas6

Thank you for your reply. The following are the files I've generated with VarScan:

RAW CNV file (.copynumber):

    chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content
    chr1 12932 13031 100 19.9 0.0 -2.000 60.0
    chr1 13032 13131 100 27.5 0.1 -7.839 61.0

Called file (.copynumber.called):

    chrom chr_start chr_stop num_positions normal_depth tumor_depth adjusted_log_ratio gc_content region_call raw_ratio
    chr1 13329 13428 100 155.3 39.1 -2.176 53.0 del -1.99
    chr1 13429 13528 100 422.2 89.1 -2.462 59.0 del -2.245

GC-adjustment file (.copynumber.called.gc):

    gc regions avg_log2 mean_sd_log2
    0 8 -0.35275 -0.2595448
    3 4 0.2735 0.3667052
    4 3 -0.33766666 -0.24446145

Segmented file, generated from the RAW CNV file in R (DNAcopy library):

    "chrom" "loc.start" "loc.end" "num.mark" "seg.mean"
    "1" "chr1" 12932 664293 78 17.2449
    "2" "chr1" 664393 664693 4 99.35
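In the segmented output above, seg.mean values of 17.2449 and 99.35 are far too large to be log2 ratios (99.35 would mean a 2^99-fold change), which may indicate that DNAcopy was run on a depth column rather than on log2_ratio. A minimal Python sketch, assuming the file was written with R's write.table() defaults, that parses it and flags such values; the helper names and the cutoff of 10 are hypothetical.

```python
import shlex
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    num_mark: int
    seg_mean: float


def read_dnacopy_segments(text: str) -> list[Segment]:
    """Parse segments written by R's write.table() defaults: space-separated,
    quoted strings, and a row-name column that the header line lacks."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = shlex.split(lines[0])
    segments = []
    for line in lines[1:]:
        fields = shlex.split(line)
        if len(fields) == len(header) + 1:
            fields = fields[1:]  # drop the R row name ("1", "2", ...)
        rec = dict(zip(header, fields))
        segments.append(Segment(
            chrom=rec["chrom"],
            start=int(rec["loc.start"]),
            end=int(rec["loc.end"]),
            num_mark=int(rec["num.mark"]),
            seg_mean=float(rec["seg.mean"]),
        ))
    return segments


def implausible_log2(segments: list[Segment], limit: float = 10.0) -> list[Segment]:
    """Segments whose mean is too extreme for a log2 ratio
    (|log2| > 10 means a >1000-fold change; the cutoff is arbitrary)."""
    return [s for s in segments if abs(s.seg_mean) > limit]


EXAMPLE = '''"chrom" "loc.start" "loc.end" "num.mark" "seg.mean"
"1" "chr1" 12932 664293 78 17.2449
"2" "chr1" 664393 664693 4 99.35
'''
```

If both segments are flagged, re-running the segmentation on the log2_ratio (or adjusted_log_ratio) column would be the first thing to check.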

Now, could you please suggest how I can generate plots like these? [Image 1] [Image 2]


Please try to post comments as comments rather than answers, as it makes conversation easier to follow. You can also always add more information to your original question by editing it. Additionally, please use the code formatting button (the one with 1's and 0's) to format code so that it's readable.

The last figure is analogous to CNVkit's scatter command. The other two will require a bit of thought and manual manipulation on your part. The first requires averaging the log2 ratio of each bin across all your samples; after that it's basically a bar plot split by chromosome. The second requires a gain/loss call for each bin in each sample; you then calculate, for each bin, the percentage of samples with a gain and the percentage with a loss. Both of those plots, maybe minus the annotation bar at the bottom, could easily be done in ggplot2. The karyotype addition at the bottom is something you'd have to figure out on your own.
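The two per-bin summaries described above can be sketched in Python as follows, assuming every sample uses identical bins keyed by (chromosome, start, end). The ±0.3 gain/loss thresholds are hypothetical placeholders; VarScan's own region_call column (e.g. "del") could be used instead of thresholding the log2 ratio.

```python
from collections import defaultdict
from statistics import mean

# (chromosome, start, end); assumes identical bins in every sample
Bin = tuple[str, int, int]


def mean_log2_per_bin(samples: dict[str, dict[Bin, float]]) -> dict[Bin, float]:
    """Average each bin's log2 ratio over all samples (first plot)."""
    values: dict[Bin, list[float]] = defaultdict(list)
    for bins in samples.values():
        for b, log2 in bins.items():
            values[b].append(log2)
    return {b: mean(v) for b, v in values.items()}


def gain_loss_percent(samples: dict[str, dict[Bin, float]],
                      gain: float = 0.3,
                      loss: float = -0.3) -> dict[Bin, tuple[float, float]]:
    """Percentage of samples with a gain and with a loss in each bin (second plot).
    The +/-0.3 thresholds are hypothetical placeholders."""
    # per bin: [number of gains, number of losses, number of samples]
    counts: dict[Bin, list[int]] = defaultdict(lambda: [0, 0, 0])
    for bins in samples.values():
        for b, log2 in bins.items():
            c = counts[b]
            if log2 >= gain:
                c[0] += 1
            elif log2 <= loss:
                c[1] += 1
            c[2] += 1
    return {b: (100 * gains / n, 100 * losses / n)
            for b, (gains, losses, n) in counts.items()}
```

Each returned value maps directly to one bar per bin; plotting the gain percentage upward and the loss percentage downward gives the usual frequency plot.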


The first plot is better done with GISTIC 2.0 (using the results from VarScan). The second plot does not look informative to me. The third plot can be made sample by sample with R's plot function; however, you'll need to build the chromosome layout yourself. Personally, I like the visualizations from FACETS, since they give much more insight into the tumor structure. I like the visualizations from my own tool even more, but if you want something quick, use FACETS.
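Building that chromosome layout mostly means placing chromosomes end to end so every bin or segment gets a genome-wide x coordinate. A minimal Python sketch, assuming you have chromosome lengths (for example, the first two columns of your reference's .fai index); the helper names are hypothetical.

```python
import re


def chrom_sort_key(chrom: str):
    """Natural order: chr1..chr22, then chrX, chrY, chrM, then anything else."""
    name = re.sub(r"^chr", "", chrom)
    if name.isdigit():
        return (0, int(name), "")
    order = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (1, order.get(name, 99), name)


def genome_offsets(chrom_lengths: dict[str, int]) -> dict[str, int]:
    """Offset added to each chromosome's positions so chromosomes sit end to end."""
    offsets, total = {}, 0
    for chrom in sorted(chrom_lengths, key=chrom_sort_key):
        offsets[chrom] = total
        total += chrom_lengths[chrom]
    return offsets


def to_genome_x(chrom: str, pos: int, offsets: dict[str, int]) -> int:
    """Genome-wide x coordinate of a position on a chromosome."""
    return offsets[chrom] + pos


def chrom_label_positions(chrom_lengths: dict[str, int],
                          offsets: dict[str, int]) -> dict[str, float]:
    """Midpoint of each chromosome on the genome-wide axis (for tick labels)."""
    return {c: offsets[c] + chrom_lengths[c] / 2 for c in offsets}
```

The offsets also mark the chromosome boundaries, which is where vertical separator lines would be drawn.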


The first plot can't be replicated with GISTIC2, though you're right that GISTIC creates something analogous to it. Definitely worth a shot. I hadn't heard of FACETS before; the documentation looks somewhat sparse, but I'll have to give it a try some time.


Honestly, I can't tell whether the first plot is a multi-sample or a single-sample plot. If it is multi-sample, GISTIC in my opinion gives a more meaningful plot of significant peaks. If it is single-sample, FACETS-like plots would be better. Personally, I prefer this visualization (it is easier for me to understand than even the FACETS one), but I have to admit that FACETS is easier and faster to run, and it works quite well.


The first one differs slightly from GISTIC in that GISTIC really is focused on minimal common regions (MCRs), which are the spikes in the GISTIC plot. It's really meant for identifying recurrent focal CNAs and their potential driver genes. The first plot here summarizes not only focal changes, but broad changes as well by averaging the log2 ratios across multiple samples.

If just highlighting recurrently amplified/deleted genes in their genomic context is what you're after, either works, but the plot here also serves to show a general idea of the CNA changes across all samples.

Mind explaining your top and bottom plots here? It's not totally clear what they're showing.


Oh, thanks. I'm not good at reading manuals when there are deadlines, and I was wondering where the recurrent aneuploidies in my cancer samples had disappeared to in GISTIC; now I know. For me it is still difficult to infer anything from the first plot (it is a sum over all samples, but there are almost always sub-groups), but I get the logic.

Yep, the top plot shows the B-allele frequency of SNVs that are heterozygous in the normal sample; any deviation from 0.5 (the y-axis runs from 0 to 1) indicates the presence of a CNA. The bottom plot shows the allelic decomposition. Each segment has two bold horizontal lines giving the integer copy numbers of its two alleles. When one bold line drops to 0, the minor allele has been deleted. The color indicates the cancer cell fraction (a lighter color means a smaller subclone). I've tried my best to explain it here: https://github.com/imgag/ClinCNV/blob/master/doc/somatic_CNA_analysis.md#cnas-visualization-plots. I'd welcome criticism on whether it is clearly explained.
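To make the link between the two panels concrete: the standard tumor/normal mixture model used by allele-specific callers such as ASCAT predicts where a segment's BAF bands should sit, given its integer allele copy numbers and the tumor purity. A minimal Python sketch; ClinCNV's own model may differ, and the helper names are hypothetical.

```python
def expected_baf(major: int, minor: int, purity: float) -> float:
    """Expected fraction of reads carrying the minor allele at a SNP that is
    heterozygous in the normal, for a tumor segment with `major` and `minor`
    allele copies mixed with diploid normal cells (one copy of each allele)."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    minor_copies = purity * minor + (1.0 - purity) * 1
    total_copies = purity * (major + minor) + (1.0 - purity) * 2
    if total_copies == 0:
        raise ValueError("no allele copies left: BAF is undefined")
    return minor_copies / total_copies


def baf_pair(major: int, minor: int, purity: float) -> tuple[float, float]:
    """The two symmetric BAF bands seen in a plot (b and 1 - b)."""
    b = expected_baf(major, minor, purity)
    return b, 1.0 - b
```

A balanced segment (1+1 or 2+2) stays at 0.5 whatever the purity, while a deleted minor allele (1+0) pulls the bands towards 0 and 1 as purity rises. For a subclone, one rough approximation (an assumption, not ClinCNV's method) is to use purity × CCF as the effective purity.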

