Question

kmers in RNA-seq

4

10.3 years ago

sam ▴ 130

I have a set of kmer counts from 2 groups of 25 RNA-seq samples each. I'm interested in identifying kmers whose counts differ between the 2 groups. For example, I have the counts of the 3mer AAT for each sample in both groups, and I want to test whether the number of occurrences of this 3mer is significantly different between the 2 groups. Note that I normalize my data to account for the different library sizes of the samples. Would it be correct to treat this as testing whether two distributions are significantly different (e.g., whether the distribution of AAT counts in the first group differs significantly from the distribution of AAT counts in the second group)? In that case I could use a statistical test such as the Kolmogorov–Smirnov test, or is there a better approach to this problem?
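To make the setup concrete, here is a minimal sketch of how per-sample kmer counts and a library-size normalization could be computed. The function names are hypothetical, and scaling to counts per million of total kmers is only one assumed way to normalize; the post does not say which method is actually used.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of read sequences."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def per_million(counts):
    """Scale raw k-mer counts to counts per million k-mers in the sample.

    Assumption: "library size" is taken to be the total number of k-mers
    counted in the sample; other definitions (e.g. total reads) are possible.
    """
    total = sum(counts.values())
    return {kmer: n * 1e6 / total for kmer, n in counts.items()}


# Toy example: one sample consisting of a single short read.
sample_counts = count_kmers(["AATAAT"], k=3)
sample_cpm = per_million(sample_counts)
```

Repeating this for each of the 50 samples gives one value per kmer per sample, which is the table the question is about.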

thanks

RNA-Seq kmer • 3.3k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by sam ▴ 130

0

Are you expecting a different answer than when you posed a similar question (k-mer analysis in RNA-seq) yesterday?

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Yes, because I don't think we can use DESeq for this problem, given that we are not trying to detect differentially expressed genes here...

ADD REPLY • link 10.3 years ago by sam ▴ 130

1

In essence it is the same, though. It doesn't matter what the feature names are (gene names or kmer names). You should go with one of the prominent tools, since you will most likely get a distribution that can be modelled by a negative binomial (NB), so using DESeq2, edgeR, etc. is the best choice...

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Phil S. ▴ 700

1

The question boils down to asking whether counts that are likely well described by a negative binomial distribution are changed by a treatment. DESeq2, edgeR, etc. are just implementations of such a GLM-based testing procedure, so they can still be used.
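Written out for the two-group case, the model being described looks roughly like this. The notation follows the DESeq2-style parameterization and is for illustration only: $K_{ij}$ is the raw count of kmer $i$ in sample $j$, $s_j$ is a per-sample size factor, $\alpha_i$ is a per-kmer dispersion, and $x_j \in \{0, 1\}$ indicates the group of sample $j$.

```latex
\begin{align*}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), &
\mathrm{Var}(K_{ij}) &= \mu_{ij} + \alpha_i \mu_{ij}^2, \\
\mu_{ij} &= s_j \, q_{ij}, &
\log_2 q_{ij} &= \beta_{i0} + \beta_{i1} x_j, \\
H_0 &: \beta_{i1} = 0.
\end{align*}
```

Nothing in the model refers to what a row represents, so a kmer can stand in for a gene. The size factor $s_j$ absorbs library size, which is why these tools are normally given raw rather than pre-normalized counts.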

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

:) almost simultaneously

ADD REPLY • link 10.3 years ago by Phil S. ▴ 700

1

I guess the internet latency to Bonn is a bit longer than to Stuttgart :P

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

It depends on the test, but it may be a good idea to get rid of infrequent kmers: kmers with a frequency of 1 may account for a large portion of your kmer set and are usually a product of sequencing errors (as opposed to true biological signal).
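A minimal sketch of such a filter, assuming the counts are held as a dict mapping each kmer to its list of raw per-sample counts. The function name and both thresholds are hypothetical and would need tuning for real data.

```python
def filter_rare_kmers(counts, min_total=10, min_samples=2):
    """Drop k-mers that are too rare to be informative.

    counts: dict mapping k-mer -> list of raw counts, one per sample.
    A k-mer is kept only if its summed count reaches `min_total` AND it
    is observed (count > 0) in at least `min_samples` samples.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    kept = {}
    for kmer, per_sample in counts.items():
        n_present = sum(1 for c in per_sample if c > 0)
        if sum(per_sample) >= min_total and n_present >= min_samples:
            kept[kmer] = per_sample
    return kept


# Toy example with four samples.
raw = {
    "AAT": [5, 7, 3, 9],   # well supported in every sample: kept
    "CGT": [1, 0, 0, 0],   # singleton, plausibly a sequencing error: dropped
    "GGA": [0, 12, 0, 0],  # high total but seen in only one sample: dropped
}
filtered = filter_rare_kmers(raw)
```

Filtering before testing also reduces the number of hypotheses, which helps when correcting for multiple testing across a large kmer set.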

ADD REPLY • link 10.2 years ago by Lynxoid ▴ 230