Hi everyone,
I am trying to detect allele-specific expression from RNA-seq data. I have the results from GATK ASEReadCounter,
and the output looks like this:
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
chr2 38021 rs113895774 C T 0 1 1 0 0 1 0 0
chr2 217334 rs6709534 G C 0 3 3 0 0 3 0 0
chr2 218386 rs9213 G A 84 101 185 0 0 185 0 0
chr2 220889 rs3828165 G A 9 7 16 0 0 16 0 0
chr2 221560 rs60484953 G A 11 11 22 0 0 22 0 0
chr2 221981 rs3791224 C T 3 4 7 0 0 7 0 0
chr2 222336 rs3791223 T C 3 8 11 0 0 11 0 0
chr2 224086 rs1474053 T A 3 2 5 0 0 5 0 0
chr2 224919 rs2290911 A G 55 16 71 0 0 71 0 0
I was under the impression that this tool implements some test(s) to find statistically significant sites. But if I am not mistaken, it only reports the counts of ref and alt alleles from the RNA-seq reads, more or less like bam-readcount or samtools mpileup followed by counting bases.
The ASEReadCounter manual says that its output can be used as input to mamba, but I am not sure about that: the mamba input format requires an additional field, EXON_INFO (a variant annotation label). Alternatively, I am thinking of taking refCount, altCount and totalCount and performing a chi-square test to determine whether there is allelic imbalance.
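To make the chi-square idea concrete, here is a minimal sketch for a single site. It assumes the null hypothesis is a 1:1 ref:alt ratio (which ignores reference mapping bias) and uses only the Python standard library. The function name `chisq_allelic_imbalance` is made up for illustration; the two example sites are taken from the table above. With very low counts the chi-square approximation is unreliable, so treat this as a starting point rather than a finished method.

```python
import math

def chisq_allelic_imbalance(ref_count, alt_count):
    """Chi-square goodness-of-fit test (1 df) against an assumed 1:1 ref:alt ratio."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    expected = total / 2.0  # assumption: equal expression of both alleles
    chi2 = ((ref_count - expected) ** 2 + (alt_count - expected) ** 2) / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Two sites from the ASEReadCounter output shown above.
for variant_id, ref, alt in [("rs9213", 84, 101), ("rs2290911", 55, 16)]:
    chi2, p = chisq_allelic_imbalance(ref, alt)
    print(f"{variant_id}\tchi2={chi2:.3f}\tp={p:.3g}")
```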
I would like to get suggestions on how to analyze this output or what downstream methods/statistical tests to use. Any help would be much appreciated.
Thanks!
@komal.rathi How did you end up analyzing this? I can't download mamba.
I ended up doing a chisq test using the output of ASEReadCounter.
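For anyone wanting to try the same thing, a rough sketch of running the test over the whole table might look like the following. It is not the exact script used here: the `scan_sites` helper and the `MIN_DEPTH` cutoff of 10 reads are assumptions for illustration, the null is an assumed 1:1 ratio, and the rows are split on whitespace so it works on both the tab-delimited file and the pasted version above. With many sites, the p-values would still need a multiple-testing correction.

```python
import math

MIN_DEPTH = 10  # hypothetical cutoff; sites with fewer ref+alt reads are skipped

def scan_sites(lines, min_depth=MIN_DEPTH):
    """Apply a 1-df chi-square test (1:1 null) to each ASEReadCounter row."""
    header = lines[0].split()
    results = []
    for line in lines[1:]:
        if not line.strip():
            continue
        row = dict(zip(header, line.split()))
        ref, alt = int(row["refCount"]), int(row["altCount"])
        total = ref + alt
        if total < min_depth:
            continue  # too few reads for the chi-square approximation
        chi2 = (ref - alt) ** 2 / total  # same as sum((obs - exp)^2 / exp) with exp = total / 2
        p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
        results.append((row["variantID"], ref, alt, chi2, p_value))
    return results

# A few rows copied from the output in the question.
example = """contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
chr2 38021 rs113895774 C T 0 1 1 0 0 1 0 0
chr2 218386 rs9213 G A 84 101 185 0 0 185 0 0
chr2 224919 rs2290911 A G 55 16 71 0 0 71 0 0""".splitlines()

for variant_id, ref, alt, chi2, p in scan_sites(example):
    print(f"{variant_id}\t{ref}\t{alt}\tchi2={chi2:.3f}\tp={p:.3g}")
```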