CNVkit output: is "log2" the same as "Seg_mean"? If not, how can I get "Seg_mean" from "log2"?
5.9 years ago
Laven9 • 0

I have just generated my CNV files with CNVkit. I am wondering whether the "log2" column in the CNVkit output (after call) is the same as "Seg_mean". If not, how can I derive "Seg_mean" from "log2"? Any advice would be appreciated, thanks!

CNV CNVkit Seg_mean • 3.4k views

Please read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 and add more relevant details to your question. What have you tried? Have you read the CNVkit paper?


Here are two lines of what I get.

chromosome  start   end log2    probes
chr1    826717  2410579 -0.00659771 487
chr1    2410780 2787772 -0.372291   70

Yes, I have read the CNVkit paper; here is the link.
The answer I found was this: Segment_Mean is the arithmetic mean of those probes' log2 copy ratio values.
But I am still confused about how to get "Segment_Mean". I need it as input to ABSOLUTE.


I have also generated a CNV file with VarScan, but its "Segment_mean" values are far too large.


I've moved this to a comment - please do not add an answer unless you're answering the top-level question. Plus, edit your question and add this information in there. Please read posts under /t/how-to for more information.

5.7 years ago
Eric T. ★ 2.8k

In the .cns files, yes, log2 is the segment mean in log2 scale. Details here: https://cnvkit.readthedocs.io/en/stable/fileformats.html
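
For anyone who needs these values as a segment table for a downstream tool, here is a minimal sketch. It assumes a tab-separated .cns file with at least the columns shown earlier in this thread (chromosome, start, end, log2, probes). The helper name `cns_to_seg_rows` and the output column names (Sample, Chromosome, Start, End, Num_Probes, Segment_Mean) are illustrative assumptions, not something CNVkit or ABSOLUTE defines here, so check them against what your downstream tool expects. CNVkit's own `export seg` subcommand may be the simpler route.

```python
import csv
import io


def cns_to_seg_rows(cns_text, sample_id):
    """Convert tab-separated .cns text into SEG-style rows.

    The log2 column is copied unchanged into Segment_Mean, since it is
    already the segment mean in log2 scale. Extra .cns columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "Sample": sample_id,              # hypothetical sample label
            "Chromosome": rec["chromosome"],
            "Start": int(rec["start"]),
            "End": int(rec["end"]),
            "Num_Probes": int(rec["probes"]),
            "Segment_Mean": float(rec["log2"]),
        })
    return rows


# The two segments posted in the question, re-typed with tab separators.
example = (
    "chromosome\tstart\tend\tlog2\tprobes\n"
    "chr1\t826717\t2410579\t-0.00659771\t487\n"
    "chr1\t2410780\t2787772\t-0.372291\t70\n"
)

for row in cns_to_seg_rows(example, "sample1"):
    print(row)
```

No arithmetic is applied to log2 in this sketch; it only renames and retypes columns.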


Thanks for your help!

I am now facing another problem with CNVkit; could you please give me some advice? I am running CNVkit to call CNVs from my whole-exome sequencing data. I use a command like cnvkit.py batch -m amplicon -t targets.bed *.bam, but I do not have a targets.bed file. I also checked Astra-Zeneca's reference data repository but could not find one there either.

My questions are: 1) Is it correct to use -m amplicon? 2) Is there a file containing all human exons that I can use with the guess_baits.py script? I am not sure where to get a complete BED file to use for guessing.

I would appreciate any advice!


For exome, -m hybrid is better than -m amplicon. You can verify that there are off-target reads by loading the BAM file in a viewer like IGV.

For guess_baits.py, try UCSC's RefSeq exons (refFlat.txt here), or another BED file of known genes from UCSC Genome Browser. Make sure the reference genome matches.


Thanks a lot! I got it, but I also want to make sure I am doing the right thing. Here is what I did:

skg_convert.py refFlat.txt -t bed -o refFlat.bed
guess_baits.py bam1 bam2 -t refFlat.bed -o guess_baits.bed

But I get an error like this:

Loaded 80816 candidate regions from refFlat.bed
Evaluating targets in bam1
Processing reads in bam1
Time: 1281.040 seconds (205575 reads/sec, 61 bins/sec)
Summary: #bins=78477, #reads=263349347, mean=3355.7520, min=0.0, max=197074.45
Percent reads in regions: 279.509 (of 94218509 mapped)
Traceback (most recent call last):
  File "miniconda2/bin/guess_baits.py", line 246, in <module>
    baits = filter_targets(args.targets, args.sample_bams, args.processes)
  File "miniconda2/bin/guess_baits.py", line 54, in filter_targets
    "%d != %d" % (len(sample), len(baits))
AssertionError: 78477 != 80816

What does it mean?


Hmm, not sure, I'll take a look to see if there's a bug in guess_baits.py.

If you're building a pooled reference (multiple control samples), you can also just use the refFlat.bed file as-is and CNVkit will drop most of the uncaptured exons automatically.
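
One observation on the log above: it reports more reads in regions than mapped reads (263349347 versus 94218509, hence 279.509%). That suggests some reads are counted in more than one region. refFlat lists each transcript separately, so overlapping exons are common. Whether this is related to the AssertionError is not established in this thread. As a sketch under that assumption, overlapping BED intervals could be merged before use. `merge_intervals` is a hypothetical helper for illustration, not part of CNVkit.

```python
def merge_intervals(regions):
    """Merge overlapping or adjacent (chrom, start, end) intervals.

    Intervals are sorted by chromosome name and start, then each one is
    either folded into the previous interval or appended as a new one.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            # Extend the previous interval to cover this one.
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up coordinates, purely for illustration.
regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),  # overlaps the first interval
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_intervals(regions))
```

Sorting here is by plain string order of the chromosome name, which is enough for merging but does not give the usual chr1, chr2, ..., chr10 ordering.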
