I have 3 cancer BAM files and 3 normal BAM files.
I can think of two ways to run CNVkit. Could anyone tell me which one gives a reasonable result?
1) Merge all the cancer BAM files into cancer_all.bam and all the normal BAM files into normal_all.bam, then use cancer_all.bam and normal_all.bam as the input.
2) Use cancer_1.bam, cancer_2.bam and cancer_3.bam as the input (and likewise normal_{1,2,3}.bam).
My goal is to use the batch command to get a cns file, because I need to pick out the regions that have CN > 4.9 and where at least one sample has an overlapping amplification. I know how to filter on CN because the cns file has a column named "cn", but if you know how to require that at least one sample has an overlapping amplification, could you please teach me that as well?
Thank you very much!!
The real background is that I am trying to make a file almost the same as one in the data repository, and the description for that file is
- mm10_conserved_gain5.bed Panel of 40 normal mouse genomes. CNVkit cnr file CN>4.9, with >=20% (8) samples having amplification overlapping
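For the overlap part, here is a minimal Python sketch of one way it could be done. It is not taken from CNVkit or from the repository that produced the file above. It assumes each sample's segments are already available as (chromosome, start, end, cn) tuples, that segments within one sample do not overlap each other, and that coordinates are half-open. The names recurrent_gains, read_cns, cn_threshold and min_samples are made up for illustration, and read_cns assumes a tab-separated file with columns named chromosome, start, end and cn.

```python
import csv
from collections import defaultdict


def read_cns(path):
    """Read (chromosome, start, end, cn) tuples from a tab-separated file.

    Assumes columns named 'chromosome', 'start', 'end' and 'cn' exist.
    """
    with open(path, newline="") as handle:
        return [
            (row["chromosome"], int(row["start"]), int(row["end"]), int(row["cn"]))
            for row in csv.DictReader(handle, delimiter="\t")
        ]


def recurrent_gains(samples, cn_threshold=4.9, min_samples=2):
    """Return regions covered by amplified segments in >= min_samples samples.

    samples maps a sample name to its list of (chrom, start, end, cn) tuples.
    A segment counts as amplified when cn > cn_threshold.
    """
    # Turn each amplified segment into a +1 event at its start
    # and a -1 event at its end, grouped by chromosome.
    events = defaultdict(list)
    for segments in samples.values():
        for chrom, start, end, cn in segments:
            if cn > cn_threshold:
                events[chrom].append((start, 1))
                events[chrom].append((end, -1))

    regions = []
    for chrom in sorted(events):
        depth = 0            # number of samples amplified at the current position
        region_start = None  # start of the region currently being built, if any
        # Sorting tuples puts -1 before +1 at the same position,
        # so touching segments are not counted as overlapping.
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth >= min_samples and region_start is None:
                region_start = pos
            elif depth < min_samples and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions


# Toy data, invented purely to show the behaviour.
example = {
    "cancer_1": [("chr1", 100, 500, 6), ("chr2", 0, 1000, 2)],
    "cancer_2": [("chr1", 300, 800, 5), ("chr2", 0, 1000, 5)],
    "cancer_3": [("chr1", 400, 600, 2)],
}

print(recurrent_gains(example, min_samples=2))  # [('chr1', 300, 500)]
```

With min_samples=1 the same call returns every amplified region in any sample, and for a panel like the one described above (at least 8 of 40 samples) min_samples would be 8. Which of these matches "at least one sample having amplification overlapping" depends on whether the sample itself is counted, so treat the default as a guess.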
Oh my gosh, yes. Silly me!
I hadn't noticed that there is a line saying "assume the BAM files are a collection of 'tumor' and 'normal' samples". So I think I am supposed to pass multiple separate BAM files rather than one merged BAM file. Thank you very much!!