Hello Everyone,
I am new to CNV analysis and a beginner in R. I am trying to call germline CNVs from exome data using ExomeDepth. I have tried the example given in this, but it became confusing when I tried to apply it to my own data. Can someone please help me by explaining or showing the steps I need to follow to call CNVs? Also, the official vignette uses hg19 and my data is aligned to hg38. How do I go about that?
What data do I have?
I downloaded exome data from the 1000 Genomes Project to use as controls, then cleaned it, marked duplicates and ran BQSR following the GATK4 best practices. I processed the sample BAM files the same way. In total I have 10 control and 20 sample BAM files.
What am I trying to achieve? I am trying to call good-quality CNVs using a read-depth method. After reading and searching, I settled on ExomeDepth.
Since I am a beginner in R, I am finding it difficult. Can someone guide me through the steps/commands?
Thank you so much for your time.
Hi, the biggest problem will be switching to hg38. I don't think it is a trivial replacement here: too many of the annotations in the package are based on hg19. ExomeDepth is a good tool, but for a beginner who wants to work with hg38 I'd suggest trying some other tool.
@German.M.Demidov, thank you for your input. By "replacement", are you talking about the coordinates? Can I use tools like UCSC liftOver? Or is it because of the 0-based versus 1-based problem? I can handle those with Python/shell. Can you please elaborate on the biggest problem you mentioned, in case I try to pursue this? If coordinates are the issue, I will do that part with Python/shell and then use R only for the package.
Thank you so much for your time
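On the 0-based versus 1-based question raised above: BED files use a 0-based start and an exclusive end, so converting to 1-based inclusive coordinates only means adding 1 to the start. Below is a minimal Python sketch of that step. It assumes the target regions are already in hg38 coordinates in a tab-separated BED file, and that the exon table given to ExomeDepth should be 1-based with the columns chromosome, start, end and name (mirroring the package's hg19 exon table, which is worth checking against the vignette). The function names and the example coordinates are made up for illustration.

```python
import csv
import io


def bed_to_exon_rows(bed_lines):
    """Convert BED records (0-based start, exclusive end) to 1-based inclusive rows."""
    rows = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the BED name column when present, else build a placeholder name.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start + 1}-{end}"
        # 0-based half-open [start, end) becomes 1-based closed [start + 1, end].
        rows.append((chrom, start + 1, end, name))
    return rows


def rows_to_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chromosome", "start", "end", "name"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical records, not real exon coordinates.
example_bed = [
    "track name=targets",
    "chr1\t100\t200\texon_A",
    "chr1\t300\t450",
]
example_rows = bed_to_exon_rows(example_bed)
print(rows_to_tsv(example_rows), end="")
```

The resulting tab-separated text could then be read into R as a data frame. Chromosome names are left untouched here, so they should match whatever naming the BAM files use (with or without the "chr" prefix).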
I used the instructions under the title "10 How to loop over the multiple samples" in the vignette. That section starts with "data(Conrad.hg19)" (which means: load this dataset, which is based on hg19) and keeps using that data afterwards. If you are able to modify the files and run the first two commands of that section (number 10), then you are almost done and the package can be used.