9.3 years ago
JacobS
I realize that calling copy number variation from RNA-Seq data is a poor idea, since expression differences between samples will confound the copy number signal, but what about general inferences of ploidy?
I have large RNA-Seq sets for >50 samples and want to determine which samples are aneuploid and which are not. Does anyone know of a way to do this?
Not sure, but if you have a bam file and samtools you can find the depth of coverage using the
It will print out a per-base depth of coverage, which can be normalized to the average coverage of the sample. In a diploid genome, a region with a single-copy gain then has a normalized value of about 1.5 (three copies instead of two), and a heterozygous deletion has a value of about 0.5.
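As a rough illustration of that normalization, here is a minimal Python sketch. It assumes the per-base output has already been saved as tab-separated text with three columns (chromosome, position, depth); that layout and the function name `normalized_depth_by_chrom` are my assumptions, not something stated above. It summarizes per chromosome rather than per base, which is only one of several reasonable choices.

```python
from collections import defaultdict


def normalized_depth_by_chrom(lines):
    """Return {chrom: mean depth on chrom / mean depth over all positions}.

    `lines` is any iterable of strings shaped like "chrom<TAB>pos<TAB>depth"
    (an assumed layout for the per-base depth output).
    """
    totals = defaultdict(int)   # summed depth per chromosome
    counts = defaultdict(int)   # number of positions per chromosome
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, _pos, depth = line.split("\t")
        totals[chrom] += int(depth)
        counts[chrom] += 1

    n_positions = sum(counts.values())
    if n_positions == 0:
        return {}
    sample_mean = sum(totals.values()) / n_positions
    if sample_mean == 0:
        raise ValueError("average coverage is zero; nothing to normalize against")

    # Values near 1.0 match the sample average; ~1.5 or ~0.5 would suggest
    # a single-copy gain or loss in a diploid genome (for DNA data).
    return {c: (totals[c] / counts[c]) / sample_mean for c in totals}
```

To use it on a file, pass an open file handle, for example `normalized_depth_by_chrom(open("sample.depth.txt"))`, where the file name is hypothetical.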
You can perform the same task with
and normalize by (read count / region size) * average read length / average coverage
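The formula above is easy to get wrong by a factor, so here it is written out as a small Python function. The function and parameter names are mine, chosen for illustration; the arithmetic is just the expression as given: reads per base, times average read length, gives the approximate depth in the region, which is then divided by the sample's average coverage.

```python
def normalized_region_coverage(read_count, region_size, avg_read_length, avg_coverage):
    """(read_count / region_size) * avg_read_length / avg_coverage.

    read_count      -- number of reads counted in the region
    region_size     -- region length in base pairs
    avg_read_length -- average read length in base pairs
    avg_coverage    -- average depth of coverage for the whole sample
    """
    if region_size <= 0 or avg_coverage <= 0:
        raise ValueError("region_size and avg_coverage must be positive")
    # Approximate depth inside the region.
    region_depth = (read_count / region_size) * avg_read_length
    # Depth relative to the sample-wide average.
    return region_depth / avg_coverage


# Made-up numbers: 300 reads of 100 bp in a 1,000 bp region is about 30x;
# against a 20x sample average that is 1.5.
print(normalized_region_coverage(300, 1000, 100, 20))
```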
I'm not sure how to do this systematically for RNA-seq, but you can probe your BAM files and see whether there is an amplification.
RNA-Seq read coverage varies with gene expression, so the coverage of every gene differs considerably between samples. Your coverage method won't work for ploidy; you must be thinking of genome sequencing.
I think we'd have to use SNP frequency and look for non-binary SNP sites.
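One way to read the SNP idea is to look at allele fractions at heterozygous sites: in a diploid region they should cluster around 0.5, while an extra chromosome copy would push them toward roughly 1/3 and 2/3. The Python sketch below follows that interpretation, which is my assumption about what "non-binary" means here. The input format (chromosome, reference read count, alternate read count per site), the thresholds, and the function name are all hypothetical. Allele-specific expression can also shift these fractions in RNA-Seq, so a chromosome-wide shift across many genes is more convincing than any single site.

```python
from statistics import median


def allelic_imbalance_by_chrom(sites, min_depth=20, het_range=(0.1, 0.9)):
    """Return {chrom: median |alt fraction - 0.5|} over likely heterozygous sites.

    `sites` is an iterable of (chrom, ref_count, alt_count) tuples.
    Sites with low depth, or with an alt fraction outside `het_range`
    (treated as homozygous), are skipped. Thresholds are arbitrary defaults.
    """
    low, high = het_range
    deviations = {}
    for chrom, ref_count, alt_count in sites:
        depth = ref_count + alt_count
        if depth < min_depth:
            continue
        alt_fraction = alt_count / depth
        if not (low <= alt_fraction <= high):
            continue
        deviations.setdefault(chrom, []).append(abs(alt_fraction - 0.5))
    # Near 0 is what balanced diploid sites look like; a value near 0.17
    # would be consistent with 1/3 vs 2/3 fractions.
    return {chrom: median(values) for chrom, values in deviations.items()}
```

Comparing the per-chromosome values within one sample avoids having to compare expression levels between samples.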