Hello Everyone,
I am doing allele-specific expression analysis. To control the mapping bias, I am mapping my 150 bp reads to both the genomes (reference and non-reference). I have 3 biological replicates for my sample. I mapped individual replicate to both the genomes and merged the bam files for 3 replicates (separately for each genome). Now from two merged bam files (one for each genome), I assigned the reads to each genome based on the SNP present bw the two genome (used pysam for this step).
From this data what is the best way to fetch expressed genes? If I use FPKM only from the reference genome then I will lose the gene that are uniquely expressed from the non-reference genome.
Thanks,RT
Based on my interpretation of your question, I would be very surprised if the gene-level FPKM values were very different between the two genomes. The two genomes should only differ at a number of single nucleotide variants, right? If each aligned read can only map to a single genome, then you should be able to add the counts of reads/fragments falling into gene A in the reference and non-reference genome to give you an estimate of total expression levels. I think you are performing a very interesting analysis and I would like to discuss in more detail if you would like to.
I would suggest to use something like allim (expressly dedicated to allele specific expression) to perform your analysis. My preliminary results with that are encouraging.