Question

Cnvnator Questions Help Please !

12.6 years ago

madkitty ▴ 690

1. I did a test with only one BAM file and chrX on step A) Extract read mapping from BAM/SAM files. When I type -genome mm9 it outputs a 17 Mb root file. When I don't specify any genome name it outputs a 30 Mb root file. Which one is the right one?

2. We have 4 BAM files per sample, so for every step A, B, C, D, E, can I type the paths of the 4 BAM files on the command line in one shot? Like this: [whatever command step] /mybamfile/file1.bam /mybamfile/file2.bam /mybamfile/file3.bam /mybamfile/file4.bam

3. GENERATING HISTOGRAM: In the README file it says "Files with chromosome sequences are required and should reside in running directory or directory specified by -d option. Files should be named as: chr1.fa, chr2.fa, etc."

3.1. Are those files from the reference genome mm9? (Because that's all we have.)

3.2. Is file.root my previous out.root from step A?

3.3. If I do generate the histogram, what does that tell me? Am I supposed to get a number out of it to use later?

3.4. After -his we have to write the bin_size. I have no clue where I am supposed to find that number or what it represents.

4. Step C) CALCULATING STATISTICS: is file.root the same file as the out.root from step A? So do we re-use the same out.root file all the time? I tried it with out.root and about 20 times it says:

Zero value of GC average.
Bin 1083251 with center 1.08325e+08 is not corrected.   (says that about 20 times)

Then it says:
Making statistics for chrX after GC correction ...
Warning in <Fit>: Fit data is empty
Warning in <Fit>: Fit data is empty
Average RD per bin (1-22) is 0 +- 0 (after GC correction)
Average RD per bin (X,Y)  is 3.42284 +- 3.19117 (after GC correction)

What are all those numbers?

updated 12.6 years ago by Leonor Palmeira • written 12.6 years ago by madkitty

score 1 · Answer 1 · 2012-05-07

Well, those are a lot of questions in one go! Let's see if we can help you there:

1- Have you checked their content? This usually helps... As explained in the README:

Chromosome names and lengths are parsed from sam/bam file header. Using -genome option one can overwrite this default.

So it depends on how you want this information to be parsed.

2- No, you will probably have to make some kind of loop. In bash, for instance:

for i in 1 2 3 4 ; do
    # keep the & outside the quotes so the shell backgrounds the job
    [whatever command step] "/mybamfile/file${i}.bam" &
done
wait

3.1- This is quite explicit: the files should be from the reference genome you are working on, so if you are working on mm9, the answer is yes.
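
Before running the histogram step it can be worth checking that the FASTA files are actually where the README expects them. Here is a minimal sketch, assuming a POSIX shell; `missing_fasta` is a made-up helper name, and the directory argument stands for whatever you would pass to -d:

```shell
# Print the name of every expected chromosome FASTA file that is absent.
# Usage: missing_fasta DIR CHROM [CHROM ...]
# Assumption: files follow the README naming, i.e. DIR/<chrom>.fa
missing_fasta() {
    ref_dir="$1"
    shift
    for chrom in "$@"; do
        # report the file only when it does not exist
        [ -f "$ref_dir/$chrom.fa" ] || echo "$chrom.fa"
    done
}
```

For the chrX-only test, `missing_fasta /path/to/mm9 chrX` prints nothing when chrX.fa is in place.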

3.2- Sounds logical, doesn't it?

3.3- Have you read the initial Nature paper and the CNVnator paper? These should give you the answer you are looking for. Anyway, this step seems optional.

3.4- This is the size of the bins you want for your histogram. Reading the two papers mentioned above should give you an idea of what this number should be.
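
To make the bin size less abstract: the warning quoted in the question ("Bin 1083251 with center 1.08325e+08") relates a bin number to a genomic position. A small sketch of that arithmetic follows, assuming bins are numbered from 1 and the center is the midpoint of the bin. The bin size of 100 is inferred from that log line; the poster never states it.

```shell
# Midpoint (in bp) of a 1-based bin, given the bin width in bp.
# Assumption: bin i spans (i-1)*size .. i*size, so its center is (i - 0.5)*size.
bin_center() {
    bin="$1"
    size="$2"
    # integer form of (bin - 0.5) * size
    echo $(( (2 * bin - 1) * size / 2 ))
}

bin_center 1083251 100   # prints 108325050, i.e. about 1.08325e+08
```

If that assumption holds, each bin in the run from the question covers 100 bp of chrX.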

4- For now, you have run step A only on chrX, so no, you should not be using this .root for all other steps, but only for the steps concerning chrX. You should therefore run this on each chromosome (generating as many .root files as you have chromosomes). There also seems to be a problem with your .root file, so you should check your step A carefully.
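
To show how one .root file is carried through the steps for a single chromosome, here is a dry-run sketch that only prints the commands. The flags are written as I recall them from the CNVnator README, so check them against your version; `cnvnator_steps` is a made-up helper, and the file names, the bin size of 100 and the reference directory are placeholders.

```shell
# Print (do not run) the README steps for one chromosome and one BAM file.
# Usage: cnvnator_steps ROOT_FILE CHROM BIN_SIZE REF_DIR BAM_FILE
# Assumption: flags as recalled from the CNVnator README; verify before use.
cnvnator_steps() {
    root="$1"; chrom="$2"; bin="$3"; ref_dir="$4"; bam="$5"
    echo "cnvnator -root $root -chrom $chrom -tree $bam"
    echo "cnvnator -root $root -chrom $chrom -his $bin -d $ref_dir"
    echo "cnvnator -root $root -chrom $chrom -stat $bin"
    echo "cnvnator -root $root -chrom $chrom -partition $bin"
    echo "cnvnator -root $root -chrom $chrom -call $bin"
}

cnvnator_steps chrX.root chrX 100 /path/to/mm9 /mybamfile/file1.bam
```

Every step after the first reads the same chrX.root with the same bin size, which is the point of questions 3.2 and 4.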