I am working with paired-end reads and now need to calculate FPKM. My question is how to calculate it. After dividing the wc -l output by 4, mcf7_1.fastq contains 26723524 reads and mcf7_2.fastq contains 26723524 reads as well. For FPKM, should I count 26723524 fragments or 53447048?
You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:
The Total Count and RPKM [FPKM] normalization methods, both of which are
still widely in use, are ineffective and should be definitively
abandoned in the context of differential analysis.
The first thing one should remember is that without between sample
normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.
It is not possible to calculate FPKM from raw fastq files. You need to align the data to a reference genome. Then you'll get a BAM file. There are many tools out there to calculate FPKM using BAM files.
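For reference, the arithmetic that those tools perform can be sketched as below. This is only a minimal illustration of the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments), not the implementation of any particular tool. The function name and the example numbers are hypothetical, and the per-gene fragment count and total mapped fragments are assumed to come from the aligned BAM file.

```python
def fpkm(fragments_in_gene, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments_in_gene: fragments (read pairs) assigned to the gene
    gene_length_bp: gene or transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments_in_gene * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical numbers, chosen only to illustrate the formula.
print(fpkm(fragments_in_gene=500, gene_length_bp=2000,
           total_mapped_fragments=20_000_000))
```

Note that the denominator is the number of mapped fragments, which is why the value cannot be derived from the FASTQ files alone.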
Thanks Venu, but that part I know. I need to know what value to take to calculate the number of fragments. I have also seen the video in the link provided, but it only explains RPKM and TPM.
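On the fragment count itself: in paired-end data both mates are sequenced from the same fragment, so one read pair counts as one fragment, not two. A small sketch of that bookkeeping follows, using the counts from the question. The function name is hypothetical, and the result is only the number of sequenced fragments; the FPKM denominator normally uses the fragments that actually mapped, which will be at most this number.

```python
def count_sequenced_fragments(reads_mate1, reads_mate2):
    """Return the number of fragments in a paired-end run.

    Each fragment yields one read in the _1 file and one in the _2 file,
    so the two files should hold the same number of reads and each pair
    is counted once.
    """
    if reads_mate1 != reads_mate2:
        raise ValueError("mate files differ in read count; pairs are incomplete")
    return reads_mate1


# Read counts reported in the question (wc -l output divided by 4).
n_fragments = count_sequenced_fragments(26723524, 26723524)
print(n_fragments)  # 26723524, not 53447048
```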
An update (6th October 2018):
Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units