Question

How to calculate FPKM from the reads mapped to two genomes

0

Entering edit mode

9.7 years ago

GR ▴ 400

Hello Everyone,

I am doing allele-specific expression analysis. To control the mapping bias, I am mapping my 150 bp reads to both the genomes (reference and non-reference). I have 3 biological replicates for my sample. I mapped individual replicate to both the genomes and merged the bam files for 3 replicates (separately for each genome). Now from two merged bam files (one for each genome), I assigned the reads to each genome based on the SNP present bw the two genome (used pysam for this step).

From this data what is the best way to fetch expressed genes? If I use FPKM only from the reference genome then I will lose the gene that are uniquely expressed from the non-reference genome.

Thanks,RT

FPKM RNA-seq allele-specific-expression • 2.7k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.7 years ago by GR ▴ 400

2

Entering edit mode

Based on my interpretation of your question, I would be very surprised if the gene-level FPKM values were very different between the two genomes. The two genomes should only differ at a number of single nucleotide variants, right? If each aligned read can only map to a single genome, then you should be able to add the counts of reads/fragments falling into gene A in the reference and non-reference genome to give you an estimate of total expression levels. I think you are performing a very interesting analysis and I would like to discuss in more detail if you would like to.

ADD REPLY • link 9.7 years ago by lkmklsmn ▴ 980

0

Entering edit mode

I would suggest to use something like allim (expressly dedicated to allele specific expression) to perform your analysis. My preliminary results with that are encouraging.

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.7 years ago by Fabio Marroni ★ 3.0k