kmers in RNA-seq
10.4 years ago
sam ▴ 130

I have a set of kmer counts from two groups of 25 RNA-seq samples each. I want to identify kmers whose counts differ between the two groups. For example, I have the count of the 3-mer AAT for each sample in both groups, and I want to test whether the number of occurrences of this 3-mer is significantly different between the groups. Note that I normalize my data to account for the different library sizes of the samples.

Would it be correct to treat this as testing whether two distributions are significantly different (e.g., whether the distribution of AAT counts in the first group differs from the distribution of AAT counts in the second group)? In that case I could use a statistical test such as the Kolmogorov–Smirnov test. Or is there a better approach to this problem?

thanks
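
To make the setup concrete, here is a minimal sketch of counting kmers in one sample and scaling the counts by library size. The post does not say which normalization is used, so counts-per-million is only an assumption here, and the function names and toy reads are made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts so that one sample's k-mer counts sum to one million.

    Assumption: CPM stands in for whatever normalization the post refers to.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c * 1_000_000 / total for kmer, c in counts.items()}


# Toy reads for a single hypothetical sample.
sample_reads = ["AATGC", "GAATT"]
raw = count_kmers(sample_reads, 3)
cpm = counts_per_million(raw)
```

Note that count-based tools such as DESeq2 and edgeR expect the raw counts as input and estimate library-size factors themselves, so a scaling step like this would only be needed for tests that do not model library size.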

RNA-Seq kmer • 3.3k views

Are you expecting a different answer than when you posed a similar question (k-mer analysis in RNA-seq) yesterday?


Yes, because I don't think we can use DESeq for this problem, given that we are not trying to detect differentially expressed genes here...


In essence it is the same, though. It doesn't matter what the row names are (gene names or kmer names). You should go with one of the prominent tools, since you will most likely get a count distribution that can be modelled by a negative binomial (NB), so DESeq2, edgeR, etc. are the best choice.


The question boils down to asking whether counts that are likely well described by a negative binomial distribution are changed by a treatment. DESeq2, edgeR, etc. are just implementations of such a GLM-based testing procedure, so they can still be used.
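
For reference, the GLM being described can be sketched as follows, using the DESeq2-style parameterization. Here $K_{ij}$ is the raw count of kmer $i$ in sample $j$, $s_j$ is a library-size factor, $\alpha_i$ is a per-kmer dispersion, and $g_j \in \{0, 1\}$ is a group indicator; the symbol names are chosen for illustration.

```latex
\begin{align}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), &
\operatorname{Var}(K_{ij}) &= \mu_{ij} + \alpha_i \mu_{ij}^2, \\
\mu_{ij} &= s_j \, q_{ij}, &
\log_2 q_{ij} &= \beta_{i0} + \beta_{i1} \, g_j .
\end{align}
% Test per kmer i:  H_0 : \beta_{i1} = 0  (no difference between the two groups)
```

Nothing in this model depends on the rows being genes: the test for $\beta_{i1} = 0$ is applied per row, whether the row is a gene or a kmer.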


:) almost simultaneously


I guess the internet latency to Bonn is a bit longer than to Stuttgart :P


It depends on the test, but it may be a good idea to get rid of infrequent kmers. Kmers with a frequency of 1 may account for a large portion of your kmer set and are usually a product of sequencing errors (as opposed to true biological signal).
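
A minimal sketch of such a filter, assuming the counts are held as a dict mapping each kmer to a list of per-sample counts. The function name, the toy table, and the `min_total` threshold are all illustrative assumptions; an appropriate cutoff depends on the data.

```python
def filter_rare_kmers(count_table, min_total=2):
    """Keep k-mers whose count summed over all samples is at least min_total.

    With the default min_total=2, k-mers seen only once overall are dropped.
    """
    return {
        kmer: per_sample
        for kmer, per_sample in count_table.items()
        if sum(per_sample) >= min_total
    }


# Hypothetical counts for three k-mers across four samples.
table = {
    "AAT": [12, 9, 15, 11],
    "CGT": [0, 1, 0, 0],   # seen once in total, likely an error k-mer
    "GGA": [3, 0, 4, 2],
}
kept = filter_rare_kmers(table)
```

Filtering before testing also reduces the number of hypotheses, which helps with multiple-testing correction.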
