Question

CNVkit output: is "log2" the same as "Seg_mean"? If not, how can I get "Seg_mean" from "log2"?

6.3 years ago

Laven9 • 0

I have just generated my CNV files with CNVkit. I am wondering whether the "log2" column in the CNVkit output (after call) is the same as "Seg_mean". If not, how can I get "Seg_mean" from "log2"? Please give me some advice, thanks!

CNV CNVkit Seg_mean • 3.7k views

ADD COMMENT • link updated 6.1 years ago by Eric T. ★ 2.8k • written 6.3 years ago by Laven9 • 0

Please read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 and add more relevant details to your question. What have you tried? Have you read the CNVkit paper?

ADD REPLY • link 6.3 years ago by Ram 45k

Here are two lines of what I get.

chromosome  start   end log2    probes
chr1    826717  2410579 -0.00659771 487
chr1    2410780 2787772 -0.372291   70

Yes, I have read the CNVkit paper; here is the link.
The answer I found was this: Segment_Mean is the arithmetic mean of those probes' log2 copy ratio values.
But I am still confused about how to get "Segment_Mean". I need it as input to ABSOLUTE.

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 6.3 years ago by Laven9 • 0

I have also generated a CNV file with VarScan, but its "Segment_mean" values are much too large.

ADD REPLY • link 6.3 years ago by Laven9 • 0

I've moved this to a comment - please do not add an answer unless you're answering the top-level question. Plus, edit your question and add this information in there. Please read posts under /t/how-to for more information.

ADD REPLY • link 6.3 years ago by Ram 45k

score 0 · Answer 1 · 2019-03-27

6.1 years ago

Eric T. ★ 2.8k

In the .cns files, yes, log2 is the segment mean in log2 scale. Details here: https://cnvkit.readthedocs.io/en/stable/fileformats.html
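
Since the log2 column already holds the segment mean, building a table for a downstream tool is mostly a matter of renaming columns. Here is a minimal Python sketch, assuming a tab-separated .cns file with the columns shown earlier in this thread. The output column names (Sample, Chromosome, Start, End, Num_Probes, Segment_Mean) and the sample ID are assumptions for illustration, so check them against what ABSOLUTE actually expects:

```python
import csv
import io


def cns_to_segments(cns_text, sample_id):
    """Turn CNVkit .cns rows into SEG-style rows.

    The log2 value is copied unchanged into Segment_Mean; no rescaling is done.
    Output column names are assumptions, not taken from the ABSOLUTE docs.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "Sample": sample_id,  # hypothetical sample label
            "Chromosome": rec["chromosome"],
            "Start": int(rec["start"]),
            "End": int(rec["end"]),
            "Num_Probes": int(rec["probes"]),
            "Segment_Mean": float(rec["log2"]),
        })
    return rows


# The two segments posted in the question, as tab-separated text.
example = (
    "chromosome\tstart\tend\tlog2\tprobes\n"
    "chr1\t826717\t2410579\t-0.00659771\t487\n"
    "chr1\t2410780\t2787772\t-0.372291\t70\n"
)

segments = cns_to_segments(example, "sample1")
for seg in segments:
    print("\t".join(str(value) for value in seg.values()))
```

CNVkit also has its own `cnvkit.py export seg` command that writes segments in SEG format, which may be enough without any custom code.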

ADD COMMENT • link 6.1 years ago by Eric T. ★ 2.8k

Thanks for your help!

I am now facing another problem with CNVkit; could you please give me some advice? Details are as follows: I am running CNVkit on my whole-exome sequencing data. I use a command like cnvkit.py batch -m amplicon -t targets.bed *.bam, but I do not have a targets.bed file. I also checked Astra-Zeneca’s reference data repository but could not find one there either.

My questions are: 1) Is it right to use -m amplicon? 2) Is there a file containing all human exons that I can use with the guess_baits.py script? I am really confused about where to get a complete BED file to use for guessing!

I would appreciate any advice you can give!

ADD REPLY • link 6.1 years ago by Laven9 • 0

For exome, -m hybrid is better than -m amplicon. You can verify that there are off-target reads by loading the BAM file in a viewer like IGV.

For guess_baits.py, try UCSC's RefSeq exons (refFlat.txt here), or another BED file of known genes from UCSC Genome Browser. Make sure the reference genome matches.
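
If a converter isn't handy, refFlat.txt can be turned into a BED file of exons with a few lines of Python. This is a rough sketch, assuming the standard UCSC refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The gene and transcript names in the example are made up. It also drops exons that repeat across transcripts of the same gene:

```python
def refflat_to_exon_bed(lines):
    """Convert refFlat rows to sorted, de-duplicated (chrom, start, end, gene) tuples.

    refFlat exon coordinates are 0-based, half-open, so they can be used
    as BED coordinates without adjustment.
    """
    seen = set()
    exons = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip blank or malformed rows
        gene, chrom = fields[0], fields[2]
        # exonStarts/exonEnds are comma-separated lists with a trailing comma
        starts = [int(s) for s in fields[9].split(",") if s]
        ends = [int(e) for e in fields[10].split(",") if e]
        for start, end in zip(starts, ends):
            key = (chrom, start, end)
            if key not in seen:  # keep each exon interval only once
                seen.add(key)
                exons.append((chrom, start, end, gene))
    exons.sort(key=lambda exon: (exon[0], exon[1], exon[2]))
    return exons


# Two hypothetical transcripts of one gene that share two exons.
example_lines = [
    "GENEA\tNM_0001\tchr1\t+\t100\t500\t120\t480\t2\t100,300,\t200,500,\n",
    "GENEA\tNM_0002\tchr1\t+\t100\t700\t120\t680\t3\t100,300,600,\t200,500,700,\n",
]

exon_rows = refflat_to_exon_bed(example_lines)
for chrom, start, end, gene in exon_rows:
    print(f"{chrom}\t{start}\t{end}\t{gene}")
```

To run it on a real file, pass an open file handle instead of the example list and write the printed rows to a .bed file.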

ADD REPLY • link 6.1 years ago by Eric T. ★ 2.8k

Thanks a lot! I got it, but I also want to make sure I am doing the right thing. Here is what I did:

skg_convert.py refFlat.txt -t bed -o refFlat.bed
guess_baits.py bam1 bam2 -t refFlat.bed -o guess_baits.bed

But I get an error like this:

Loaded 80816 candidate regions from refFlat.bed
Evaluating targets in bam1
Processing reads in bam1
Time: 1281.040 seconds (205575 reads/sec, 61 bins/sec)
Summary: #bins=78477, #reads=263349347, mean=3355.7520, min=0.0, max=197074.45
Percent reads in regions: 279.509 (of 94218509 mapped)
Traceback (most recent call last):
  File "miniconda2/bin/guess_baits.py", line 246, in <module>
    baits = filter_targets(args.targets, args.sample_bams, args.processes)
  File "miniconda2/bin/guess_baits.py", line 54, in filter_targets
    "%d != %d" % (len(sample), len(baits))
AssertionError: 78477 != 80816

What does it mean?

ADD REPLY • link 6.1 years ago by Laven9 • 0

Hmm, not sure, I'll take a look to see if there's a bug in guess_baits.py.

If you're building a pooled reference (multiple control samples), you can also just use the refFlat.bed file as-is and CNVkit will drop most of the uncaptured exons automatically.
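
One thing that may be worth checking in the meantime (a guess, not a confirmed diagnosis): the log reports 263349347 reads in regions against 94218509 mapped reads, which is where the 279.509% comes from. A percentage above 100 would be consistent with duplicate or overlapping regions in refFlat.bed being counted more than once. Here is a small Python sketch to count those. The function name and example rows are hypothetical:

```python
def summarize_bed(lines):
    """Count total rows, unique intervals, and overlaps between neighbours.

    Only intervals that are adjacent after sorting are compared, so
    'overlapping_pairs' is a lower bound on the true number of overlaps.
    """
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    unique = sorted(set(regions))
    overlapping_pairs = 0
    for prev, cur in zip(unique, unique[1:]):
        # same chromosome and the next interval starts before the previous ends
        if prev[0] == cur[0] and cur[1] < prev[2]:
            overlapping_pairs += 1
    return {
        "total": len(regions),
        "unique": len(unique),
        "overlapping_pairs": overlapping_pairs,
    }


# Hypothetical BED rows: one exact duplicate and one partial overlap.
example_bed = [
    "chr1\t100\t200\tA\n",
    "chr1\t100\t200\tB\n",
    "chr1\t150\t250\tC\n",
    "chr1\t300\t400\tD\n",
]
summary = summarize_bed(example_bed)
print(summary)

# Reproduce the percentage from the log: reads in regions / mapped reads.
percent_in_regions = 100 * 263349347 / 94218509
print(round(percent_in_regions, 3))
```

If the real refFlat.bed shows many duplicates or overlaps, de-duplicating or merging it before running guess_baits.py would be a cheap experiment, though it may not be what triggers the assertion.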

ADD REPLY • link 6.1 years ago by Eric T. ★ 2.8k