Hi All,
I would like to analyze allele-specific expression in one sample at a specific het site (using RNA-seq data). I am not sure which of these two equations is correct. Could you please help me?
Equation one: alt reads/total reads
Equation two: | 0.5 - alt reads/total reads |
I found equation two in several papers, but I don't know what the 0.5 means.
The problem now is that my BAM file shows the ref and alt reads, but this site is completely absent from the VCF :/ Any ideas how I can resolve that? Or can I just determine the number of ref and alt reads from the BAM file?
If you have 100 reads overlapping a het SNP, with 25 coming from ref and 75 coming from alt, the allelic ratio (AR) w.r.t. the ref allele is ref/total:
25/100 = 0.25. So the deviation is 0.5 - 0.25 = 0.25.
If there is no allelic imbalance, the ref reads would be 50 and the alt reads would be 50, so AR is 0.5 and the deviation would be 0.5 - 0.5 = 0, i.e., no allelic imbalance. The 0.5 is simply the AR expected when both alleles are expressed equally.
I would just do ref/total to get AR. No need to do (0.5 - ref/total). You need to do a binomial test (against an expected ratio of 0.5) to see whether the deviation is significant.
PS: It doesn't matter whether the numerator is the ref or alt count. When you interpret the ratio, it should be w.r.t. whatever you use as the numerator, just as a gene expression fold change depends on whether you compare condition1 vs condition2 or condition2 vs condition1.
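To make the arithmetic above concrete, here is a minimal Python sketch using the 25 ref / 75 alt example. The function names (`allelic_ratio`, `binomial_two_sided_p`) are made up for illustration, and the test assumes an expected ratio of exactly 0.5 under no imbalance, which ignores things like reference mapping bias. It is a sketch of the idea, not a full ASE pipeline.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads at the site that carry the ref allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes out of n trials.

    Sums the probabilities of every outcome that is no more likely
    than the observed one. Assumes 0 < p < 1.
    """
    def pmf(i):
        # Work in log space so large read depths do not overflow.
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_choose + i * log(p) + (n - i) * log(1 - p))

    probs = [pmf(i) for i in range(n + 1)]
    # Small relative tolerance so ties are not lost to float rounding.
    threshold = probs[k] * (1 + 1e-7)
    return min(1.0, sum(x for x in probs if x <= threshold))


ref_reads, alt_reads = 25, 75
total_reads = ref_reads + alt_reads

ar = allelic_ratio(ref_reads, alt_reads)   # ref/total
deviation = abs(0.5 - ar)                  # distance from the balanced ratio
p_value = binomial_two_sided_p(ref_reads, total_reads)

print(f"AR = {ar}, deviation = {deviation}, p-value = {p_value:.3g}")
```

Passing the alt count instead of the ref count gives the same p-value when the expected ratio is 0.5, which matches the PS above. If SciPy is available, `scipy.stats.binomtest(ref_reads, total_reads, p=0.5)` should do the same job.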
Thank you very much for your response @geek_y :)
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment should go under @geek_y's answer. SUBMIT ANSWER is for new answers to the original question.
How did you get that VCF file? Genotyping array?
No, targeted sequencing.