Filtering by FPKM, opinions and thoughts
8.7 years ago
Biogeek ▴ 470

Hi,

Why do people carry out FPKM filtering? It may be an obvious question, but what does it ultimately achieve?

- Is it to remove lowly expressed, possibly contaminating reads from other organisms that may live in the same environment?
- Is it to remove 'background noise'?

If someone were to carry out FPKM filtering, how would they decide on a threshold? Should it be based on a density plot of the FPKM values of each sample used in the assembly?

Lastly, I have seen values of 0, 0.3, 1 and 1.5 FPKM used as thresholds. Is this arbitrary, or do people select a value based on some parameter or feature of the data?

Would be keen to see what people think, and also the information people can provide on the matter.

Thanks.

fpkm filtering • 4.3k views
8.7 years ago

There are many reasons people do this, but most commonly they're trying to keep only the "expressed" genes. The actual thresholds are essentially arbitrary and will vary with every experiment (so don't blindly use a reported threshold). To derive one, either compute zFPKMs or plot the FPKM distribution and visually choose a reasonable value. Alternatively, don't use FPKMs and don't bother with this whole process unless you truly need to.
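As a rough illustration of the "plot the distribution and choose visually" route, here is a minimal sketch that bins log2(FPKM) values and prints a text histogram. The function names and the example values are made up for illustration; with real data you would more likely use a proper plotting library and look for the valley between the low-expression shoulder and the main peak.

```python
import math
from collections import Counter

def log2_fpkm_histogram(fpkms, bin_width=1.0):
    """Count genes per log2(FPKM) bin; genes with FPKM = 0 are counted separately."""
    zeros = sum(1 for x in fpkms if x <= 0)
    bins = Counter(
        math.floor(math.log2(x) / bin_width) * bin_width
        for x in fpkms if x > 0
    )
    # Keys are the left edges of the bins, sorted from low to high expression.
    return zeros, dict(sorted(bins.items()))

def print_histogram(zeros, bins, width=40):
    """Print one bar per bin, scaled so the tallest bin spans `width` characters."""
    peak = max(bins.values())
    print(f"FPKM = 0: {zeros} genes")
    for left_edge, n in bins.items():
        bar = "#" * max(1, round(width * n / peak))
        print(f"{left_edge:6.1f} {bar} {n}")

# Made-up FPKM values for one sample (hypothetical, not real data).
example_fpkms = [0, 0, 0.05, 0.1, 0.2, 0.3, 2, 4, 5, 8, 9, 16, 20, 30, 64]
zeros, bins = log2_fpkm_histogram(example_fpkms)
print_histogram(zeros, bins)
```

In this toy example the gap between the bins below log2(FPKM) = -1 and those at 1 and above is where one might place a cutoff; real distributions are rarely this clean.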

Hi Ryan, thanks for the suggestion. That was interesting, as I have the same question. Good to know about the zFPKM method. I used the script available online at "https://github.com/severinEvo/gene_expression/blob/master/zFPKM.R". After computing zFPKM for every transcript, the output contains values ranging from -3 to 8. How do I find the threshold from these zFPKM values? Please let me know if I have misunderstood anything. Thanks in advance.

The zFPKM paper recommended a threshold of -3 (see Table 1 in the paper). Perhaps the script does the filtering for you; I've never used it.
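For orientation, here is a minimal standard-library sketch of the zFPKM idea as I understand it: take log2(FPKM), locate the main density peak, estimate the standard deviation from the half of the data to the right of that peak, and express each gene as a z-score. This is my reading of the general approach, not the code of the linked script; the bandwidth rule, the grid size, the function name `zfpkm` and the synthetic data are all assumptions made for illustration.

```python
import math
import statistics

def zfpkm(fpkms, grid_points=512):
    """Return (z-scores, mu, sd) for one sample; FPKM = 0 maps to -inf."""
    logs = [math.log2(x) for x in fpkms if x > 0]
    # Normal-reference bandwidth for a Gaussian kernel density (an assumption).
    bw = 1.06 * statistics.stdev(logs) * len(logs) ** (-0.2)
    lo, hi = min(logs), max(logs)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]

    def density(g):
        # Unnormalised kernel density; only the location of the maximum matters.
        return sum(math.exp(-0.5 * ((g - v) / bw) ** 2) for v in logs)

    mu = max(grid, key=density)            # mode of the main peak
    right = [v for v in logs if v > mu]    # right half, assumed free of noise
    # For a half-normal, mean distance from the mode is sd * sqrt(2 / pi).
    sd = (statistics.fmean(right) - mu) * math.sqrt(math.pi / 2)
    z = [(math.log2(x) - mu) / sd if x > 0 else float("-inf") for x in fpkms]
    return z, mu, sd

# Synthetic sample: a low "noise" shoulder plus a main expressed peak.
noise = [statistics.NormalDist(-3, 1).inv_cdf((i + 0.5) / 60) for i in range(60)]
main = [statistics.NormalDist(5, 2).inv_cdf((i + 0.5) / 300) for i in range(300)]
fpkms = [2 ** v for v in noise + main] + [0.0]

z, mu, sd = zfpkm(fpkms)
# Keep genes at or above the -3 cutoff mentioned above.
expressed = [i for i, score in enumerate(z) if score >= -3]
print(f"mu={mu:.2f} sd={sd:.2f} kept {len(expressed)} of {len(fpkms)} genes")
```

The density evaluation here is quadratic-ish (grid points times genes), which is fine for a demonstration but slow for a full transcriptome; a maintained implementation would be the better choice for real analyses.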

8.7 years ago

For me, there are two main reasons to discard lowly expressed or unexpressed genes:

  • First, it reduces the memory requirement of subsequent analyses and increases their speed.
  • Second, if one carries out differential expression analysis, there is a good chance of lacking the power to find significant differences for lowly expressed genes. If one removes them before testing, the multiple testing correction (FDR) will be less stringent on "truly expressed" genes and detection power increases.
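The second point can be illustrated with a small Benjamini-Hochberg calculation. The sketch below uses made-up p-values: five for well-expressed genes and fifteen uninformative ones for lowly expressed genes. It assumes the filter is based on overall expression level rather than on the test result itself; the function name `bh_adjust` is my own.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only.
expressed_p = [0.001, 0.004, 0.012, 0.03, 0.2]
low_p = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65,
         0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]

# Without filtering: all 20 genes enter the correction.
unfiltered = bh_adjust(expressed_p + low_p)
hits_unfiltered = sum(q < 0.05 for q in unfiltered[:len(expressed_p)])

# With filtering: only the 5 expressed genes are tested.
filtered = bh_adjust(expressed_p)
hits_filtered = sum(q < 0.05 for q in filtered)

print(hits_unfiltered, hits_filtered)
```

With these numbers, two of the expressed genes pass an FDR of 0.05 when all twenty genes are corrected together, and four pass once the lowly expressed genes are removed first.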

Is it to remove lowly expressed possible contaminating reads from other organisms which may live within the same environment?

I don't really think that this is the first motivation but why not.

If someone were to carry out FPKM filtering, how does one decide a threshold?

I'm not sure about FPKM. However, DESeq2 does a similar filtering of low-count genes (counts are another metric of gene expression). The method is explained in detail in sections 3.8 and 4.7 of the DESeq2 manual. Look at figure 12 to see how the threshold is decided.
