I was wondering whether the combination of HTSeq / DESeq2 (normally used for differential expression analysis in RNA-seq) could be used to detect copy number variation from DNA sequencing data. I can't see why not, but then again I haven't looked at the math behind the DESeq package.
If you don't see copy number variation analysis mentioned in the DESeq / DESeq2 manual, then don't use it for that purpose. The distribution of your CNV data will not match the one that DESeq expects (a negative binomial distribution). CNV data is measured in discrete steps, so something like a Hidden Markov Model (HMM) is more commonly employed (although it can be measured on a continuous scale too).
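To make the HMM idea a bit more concrete, here is a minimal sketch in Python of decoding copy-number states from per-bin log2 ratios with the Viterbi algorithm. Everything in it is an assumption made for illustration: the three states, their expected log2 ratios (for a diploid genome), the noise level `SIGMA`, the `STAY_PROB` transition probability and the function names. It is not how any particular CNV caller works; real tools estimate these parameters from the data.

```python
import math

# Hypothetical copy-number states and their expected log2(sample/reference)
# ratios for a diploid genome: 1 copy -> -1, 2 copies -> 0, 3 copies -> log2(3/2).
STATES = ["loss", "neutral", "gain"]
MEANS = [-1.0, 0.0, math.log2(3 / 2)]
SIGMA = 0.25       # assumed standard deviation of the log2 ratios
STAY_PROB = 0.98   # assumed probability of keeping the same state in the next bin


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution, used here as the emission model."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def viterbi_copy_states(log2_ratios):
    """Return the most likely state label for each bin (Viterbi decoding)."""
    if not log2_ratios:
        return []
    n = len(STATES)
    stay = math.log(STAY_PROB)
    switch = math.log((1 - STAY_PROB) / (n - 1))

    # Uniform prior over the states for the first bin.
    score = [math.log(1 / n) + log_gauss(log2_ratios[0], MEANS[s], SIGMA)
             for s in range(n)]
    back = []  # back[t - 1][s] = best previous state if bin t is in state s

    for x in log2_ratios[1:]:
        new_score, pointers = [], []
        for s in range(n):
            candidates = [score[p] + (stay if p == s else switch) for p in range(n)]
            best = max(range(n), key=lambda p: candidates[p])
            pointers.append(best)
            new_score.append(candidates[best] + log_gauss(x, MEANS[s], SIGMA))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to the first bin.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    ratios = [0.0] * 5 + [0.58] * 5 + [-1.0] * 5
    print(viterbi_copy_states(ratios))
```

The point of the high `STAY_PROB` is that neighbouring bins are expected to share a copy-number state, so an isolated noisy bin gets absorbed into its neighbours rather than being called as its own event.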
Also, take a look at this other Biostars question: Copy Number Variation from paired end RNA-Seq data
Note, in particular, Devon's reply, where he alludes to the "fundamental limitation" of trying to detect CNV from RNA-seq. This limitation relates to the fact that a copy number event does not necessarily alter gene expression levels. A gene could easily be duplicated, for example, but without the promoter sequence and/or transcription start site (TSS), the extra copy will not be expressed (or will be expressed only at negligible levels).
If you can't afford whole-genome sequencing, then the Affymetrix SNP 6.0 array can determine genome-wide CNV profiles, along with genotyping SNPs. I used this in my PhD years ago.
Just to add, a DE tool that models counts with a negative binomial distribution is in no way suited to CNV calling, which works on discrete copy-number states. One needs to find the right tool and the right distribution for finding CNVs, and there are plenty of technologies to produce the data and tools to generate copy-number profiles from those data. One important thing is to properly account for allelic frequencies while scanning through the genome and then to use segmentation to find copy ratios. This cannot be done with DESeq2. Try to read first about which technology generates which kind of data, to better understand the power and utility of each.
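As a rough illustration of what "copy ratios plus segmentation" means on read-depth data, here is a small Python sketch. It assumes you already have read counts per genomic bin for a sample and for a matched reference; the function names, the pseudocount and the cutoffs are all made up for this example. The "segmentation" here is only a naive threshold-and-merge stand-in for a proper segmentation method, and it ignores allelic frequencies entirely.

```python
import math


def log2_copy_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Depth-normalised log2(sample / reference) ratio for each bin."""
    if len(sample_counts) != len(reference_counts):
        raise ValueError("count vectors must have the same number of bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("both libraries need at least one read")
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        # Scale by library size so sequencing depth does not look like a CNV;
        # the pseudocount (an assumption) avoids log2(0) in empty bins.
        s_frac = (s + pseudocount) / sample_total
        r_frac = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_frac / r_frac))
    return ratios


def merge_into_segments(ratios, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Label bins by fixed cutoffs and merge runs of identical labels.

    Returns tuples of (first_bin, last_bin, label, mean_log2_ratio).
    """
    def label(x):
        if x >= gain_cutoff:
            return "gain"
        if x <= loss_cutoff:
            return "loss"
        return "neutral"

    segments = []
    start = 0
    for i in range(1, len(ratios) + 1):
        # Close the current segment at the end of the data or on a label change.
        if i == len(ratios) or label(ratios[i]) != label(ratios[start]):
            chunk = ratios[start:i]
            segments.append((start, i - 1, label(ratios[start]), sum(chunk) / len(chunk)))
            start = i
    return segments


if __name__ == "__main__":
    sample = [100] * 10 + [150] * 3 + [100] * 10
    reference = [100] * 23
    for segment in merge_into_segments(log2_copy_ratios(sample, reference)):
        print(segment)
```

Note that scaling by total library size is itself a simplification: if a large fraction of the genome is gained or lost, the totals shift and the "neutral" bins drift away from zero, which is one of the reasons dedicated CNV tools exist.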
I totally agree with all of you. I was simply curious, but Kevin made a good point about the different expected distributions in DE vs CNV. Thank you all.
edgeR and DESeq2 can be used for ChIP-seq, mostly for differential peak calling, which is different from CNV detection. There, again, the data are counts and their distribution is in line with that of RNA-seq.
Hi ThePresident,
Make sure to choose a more descriptive title for future threads. In this case, "Copy number variation using HTSeq/DESeq2" would have been more informative. I've adapted the title of this thread now, but please keep this in mind.
Cheers,
Wouter
I will, thank you for the advice.