Hello,
I have an RNA-seq time-series experiment (4 time points) in which the sample is infected with a virus at t=0. I'm interested in checking whether there is any correlation between the k-mers and the increase in viral load across the 4 time points. I computed the 2-mer through 7-mer counts for each FASTQ file (each RNA-seq sample). Is there a way to go about this analysis, i.e. to test whether a relationship exists between k-mers and viral load? Any help would be greatly appreciated.
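A minimal sketch of one way to set this up, in Python with only the standard library: convert each sample's k-mer counts to relative frequencies (so differences in sequencing depth do not drive the result), then compute a Spearman rank correlation between each k-mer's frequency and viral load across the time points. The function names (`count_kmers`, `kmer_frequencies`, `spearman`, `correlate_kmers`), the input layout (one list of read sequences per time point) and the choice of Spearman correlation are assumptions for illustration, not an established pipeline; parsing the FASTQ files is left out.

```python
import math
from collections import Counter
from itertools import product

def count_kmers(seqs, k):
    """Count overlapping k-mers (A/C/G/T only) across a list of read sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts

def kmer_frequencies(counts, k):
    """Relative frequency of every one of the 4**k possible k-mers (0 if unseen),
    so samples sequenced to different depths are comparable."""
    total = sum(counts.values()) or 1
    return {"".join(p): counts.get("".join(p), 0) / total
            for p in product("ACGT", repeat=k)}

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            r[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks; NaN if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else float("nan")

def correlate_kmers(samples, viral_load, k):
    """samples: one list of reads per time point; viral_load: one value per time point.
    Returns {kmer: Spearman rho} for every possible k-mer of length k."""
    freqs = [kmer_frequencies(count_kmers(reads, k), k) for reads in samples]
    return {kmer: spearman([f[kmer] for f in freqs], viral_load)
            for kmer in freqs[0]}
```

With only 4 time points, treat any result with great caution: assuming no ties, each k-mer has a 1-in-12 chance under the null of reaching |rho| = 1 (2 of the 24 possible orderings), so among the 16,384 possible 7-mers many perfect correlations would appear by chance alone. Some form of multiple-testing correction would be needed before reading anything into individual k-mers.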
One thing to be careful about is that statistical tools for count data rely on the counts having certain statistical properties that account for random chance and even sampling.
When counting other quantities, such as k-mers, it is not clear that the k-mer space is evenly covered, or, given how tiny viral genomes are, that it is even remotely well covered. Even though the outcome is still counts, the way those counts were produced matters.
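To make the coverage concern concrete, a small sketch (Python, standard library): a single sequence of length L has only L - k + 1 k-mer positions, so it can contain at most that many of the 4^k possible k-mers. The helper names and the 10,000-nt genome length below are hypothetical, chosen only to illustrate the bound.

```python
def max_kmer_space_fraction(genome_length, k):
    """Upper bound on the fraction of all 4**k k-mers that one sequence of this
    length can contain (it has only genome_length - k + 1 k-mer positions)."""
    positions = max(genome_length - k + 1, 0)
    return min(positions, 4 ** k) / 4 ** k

def observed_kmer_space_fraction(seq, k):
    """Fraction of the 4**k possible k-mers actually present in seq (A/C/G/T only)."""
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")}
    return len(seen) / 4 ** k

# Bound for a hypothetical 10,000-nt genome, k = 2..7
for k in range(2, 8):
    print(k, round(max_kmer_space_fraction(10_000, k), 3))
```

For a 10,000-nt sequence the bound is 1.0 for k = 2 to 6 but only about 0.61 for k = 7, and real genomes repeat some k-mers, so the observed fraction is lower still.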
I don't know the answer to the question above; I am just thinking out loud.