Hello,
I am studying gene expression in a species that has undergone a duplication event. I have a synteny table of gene duplicates for multiple tissue types, which was derived using the genome of a related ancestral species (one that existed prior to the duplication event).
I want to identify loci where the two duplicates have significantly different expression levels, and I was wondering whether I could use DESeq2 to do this. In particular, I was going to set up a count table whose samples are all tissue x duplicate combinations, laid out as follows:
Locus_id | t1_d1_r1 | t1_d1_r2 | t1_d1_r3 | t1_d2_r1 | t1_d2_r2 | t1_d2_r3 | t2_d1_r1 | t2_d1_r2 | t2_d1_r3 | ....
Here t denotes the tissue type, d denotes the duplicate (corresponding to subgenomes 1 and 2), and r indicates one of three replicates. I was then considering constructing a design matrix that can identify differentially expressed loci, for example by conducting a likelihood ratio test to see whether the duplicate factor is significant in the design.
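To make the layout concrete, here is a rough sketch of how the sample annotation (tissue, duplicate, replicate) could be derived from column names of this form. It is written in Python purely for illustration; the actual DESeq2 analysis would be done in R. The names `Sample`, `parse_sample` and `build_sample_table` are made up for this example, and the column list only covers the columns shown above.

```python
import re
from typing import List, NamedTuple


class Sample(NamedTuple):
    """One column of the count table, split into its design factors."""
    name: str
    tissue: str
    duplicate: str
    replicate: str


# Assumed naming scheme: t<tissue>_d<duplicate>_r<replicate>, e.g. "t1_d2_r3".
_PATTERN = re.compile(r"^(t\d+)_(d\d+)_(r\d+)$")


def parse_sample(name: str) -> Sample:
    """Split a column name such as 't1_d2_r3' into tissue, duplicate, replicate."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name!r}")
    tissue, duplicate, replicate = match.groups()
    return Sample(name, tissue, duplicate, replicate)


def build_sample_table(columns: List[str]) -> List[Sample]:
    """Build one annotation row per count column, preserving column order."""
    return [parse_sample(column) for column in columns]


# Only the columns written out in the table header above.
columns = [
    "t1_d1_r1", "t1_d1_r2", "t1_d1_r3",
    "t1_d2_r1", "t1_d2_r2", "t1_d2_r3",
    "t2_d1_r1", "t2_d1_r2", "t2_d1_r3",
]
sample_table = build_sample_table(columns)

for sample in sample_table:
    print(sample.name, sample.tissue, sample.duplicate, sample.replicate, sep="\t")
```

The resulting tissue and duplicate columns are the factors I would expect the design to refer to, for instance a full model containing both tissue and duplicate compared against a reduced model with tissue only. That particular pair of models is my assumption and not something I have tested.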
My question is whether this violates the assumptions of the DESeq2 framework. I assumed that, because the gene pairs are duplicates, it is okay to estimate the means and dispersions for each gene pair.
Any feedback on this is much appreciated.
Thanks!!