Question

How does Cuffdiff calculate the p-values from Jensen-Shannon divergence - sqrt(JS) ?

1

Entering edit mode

8.7 years ago

clausndh ▴ 60

Dear all Biostars users,

I updated the question

we try to understand the way how Cuffdiff calculate the p-values from Jensen–Shannon divergence between isoforms. Therefore I attached the 'critical' part of the cuffdiff code.

As far as I understand the the code, they sample FPKM values from all FPKM values of the first condition ('PREV')

For example, we have 2 isoforms with 3 replicates each

      Prev1 Prev2 Prev3  Curr1 Curr2 Curr3
Iso1   p11  p12    p13     c11  c12   c13
Iso2   p21  p22    p23     c21  c22  c23

and if I got the comment

// Draw k values, from prev's fpkm_samples to make the first half of the null, where k is the number of replicates in curr, and compute their average

correct. Then they sample k=3 times one of the columns Prev1, Prev2 or Prev3 . Is this correct ?

Further, I'm not sure that they are doing in this row :

curr_set_sample += prev_abundance.member_fpkm_samples()[next_sample_idx] / (double)curr_abundance.rg_props().size();

Afterwards, they just calculate the mean of sampled data. And out of mean they calculate the relative amount of the isoform at the total gene expression.

Thanks a lot for your help.

Github Code of JSD method of Cuffdiff [ https://github.com/cole-trapnell-lab/cufflinks/blob/753c109e31818dcf7aa8a2c8ecdac4fa43d2ab9b/src/differential.cpp#L558 ] Line 619 to 638

vector<double> null_js_samples;

static const size_t num_null_js_samples = 10000;

boost::random::mt19937 rng(random_seed);

boost::random::uniform_int_distribution<> prev_sampler(0, prev_abundance.member_fpkm_samples().size()-1);

boost::random::uniform_int_distribution<> curr_sampler(0, curr_abundance.member_fpkm_samples().size()-1);

for (size_t i = 0; i < num_null_js_samples; ++i){

    // Draw k values, from prev's fpkm_samples to make the first half of the null, where k is the number of replicates in *curr*, and compute their average

    Eigen::VectorXd curr_set_sample = Eigen::VectorXd::Zero(curr_abundance.abundances().size());

    for (size_t k = 0; k < curr_abundance.rg_props().size(); ++k){

        int next_sample_idx = prev_sampler(rng);

        if (next_sample_idx >= 0 && next_sample_idx < prev_abundance.member_fpkm_samples().size())

            curr_set_sample += prev_abundance.member_fpkm_samples()[next_sample_idx] / (double)curr_abundance.rg_props().size();

    }

    double total_curr_set_sample_fpkm = curr_set_sample.sum();

    if (total_curr_set_sample_fpkm > 0.0)
        curr_set_sample /= total_curr_set_sample_fpkm;

RNA-Seq Cuffdiff Cufflinks sqrt(JS) • 2.6k views

ADD COMMENT • link 8.7 years ago by clausndh ▴ 60