I have assembled a new rodent genome and found several genes that are expanded (have more gene copies) compared to human/mouse. I want to check whether the expanded genes show higher expression levels in the rodent species than in human/mouse (I am also interested in finding other genes that are differentially expressed between species). I have several samples from the new rodent species and several from human/mouse, taken from a few tissues. I have obtained TPM values for the genes in all samples using kallisto. However, I don't think I can compare TPM directly across species, especially because I predicted 26k protein-coding genes in the new rodent species, while human has around 20k protein-coding genes.
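To make the concern concrete, here is a minimal sketch with made-up numbers (the gene names, rates, and the `tpm` and `rescale_to_shared` helpers are all hypothetical, not kallisto output). Because TPM sums to one million within each sample, a gene with the same underlying abundance gets a lower TPM in the species that has extra expressed genes. Rescaling over the shared genes removes only this one effect; it does not address other between-species differences.

```python
def tpm(rates):
    """Convert length-normalized read rates to TPM (values sum to 1e6)."""
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def rescale_to_shared(tpm_values, shared_genes):
    """Renormalize TPM so the shared genes alone sum to 1e6."""
    total = sum(tpm_values[g] for g in shared_genes)
    return {g: 1e6 * tpm_values[g] / total for g in shared_genes}


# Hypothetical rates: A, B, C are identical in both species,
# D exists only in the rodent annotation.
human_rates = {"A": 10.0, "B": 30.0, "C": 60.0}
rodent_rates = {"A": 10.0, "B": 30.0, "C": 60.0, "D": 100.0}

human_tpm = tpm(human_rates)
rodent_tpm = tpm(rodent_rates)

# Gene A looks two-fold lower in the rodent purely because of gene D.
print(human_tpm["A"], rodent_tpm["A"])

shared = sorted(set(human_rates) & set(rodent_rates))
human_shared = rescale_to_shared(human_tpm, shared)
rodent_shared = rescale_to_shared(rodent_tpm, shared)

# After rescaling over the shared genes, A is equal in both species.
print(human_shared["A"], rodent_shared["A"])
```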
Does anyone have experience with this, or any suggestions? Below are my current ideas.
(1) Get the TPM by mapping the RNA-seq data of each species to its own reference genome.
(2) Find gene families (i.e., orthologs between species and paralogs within species) shared between the new rodent species and human/mouse. I identified about 18,000 gene families with at least one gene copy in both the new rodent species and human.
(3) Ignore the genes that are not included in those gene families.
(4) If a gene family has several gene copies in a species, use the sum of the TPM of those copies.
(5) Do quartile normalization on the gene family by TPM matrix between the new rodent species and human/mouse.
(6) Do differential expression analysis using sleuth, DESeq2, or limma.
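Steps (3) to (5) could be sketched roughly as below. This is only an illustration under assumptions: the gene IDs, family IDs, and TPM values are invented, the helper names are hypothetical, and step (5) is read here as quantile normalization (upper-quartile scaling would be a different procedure). Ties are broken by family name rather than averaged.

```python
from collections import defaultdict


def sum_by_family(gene_tpm, gene_to_family):
    """Steps (3)-(4): drop genes without a family, sum TPM over gene copies."""
    family_tpm = defaultdict(float)
    for gene, value in gene_tpm.items():
        family = gene_to_family.get(gene)
        if family is not None:
            family_tpm[family] += value
    return dict(family_tpm)


def quantile_normalize(samples):
    """Step (5): give every sample the same value distribution.

    samples maps sample name -> {family: value}; all samples must
    contain the same set of families.
    """
    names = list(samples)
    families = sorted(samples[names[0]])
    # Sort each sample's values, then average across samples rank by rank.
    sorted_cols = [sorted(samples[s][f] for f in families) for s in names]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(names)
        for i in range(len(families))
    ]
    normalized = {}
    for s in names:
        # Families ordered by their value in this sample (stable sort).
        order = sorted(families, key=lambda f: samples[s][f])
        normalized[s] = {f: rank_means[rank] for rank, f in enumerate(order)}
    return normalized


# Invented per-gene TPM; FAM1 has two copies in the rodent.
rodent_tpm = {"r1a": 40.0, "r1b": 60.0, "r2": 300.0, "r3": 20.0, "r_orphan": 580.0}
rodent_map = {"r1a": "FAM1", "r1b": "FAM1", "r2": "FAM2", "r3": "FAM3"}
human_tpm = {"h1": 50.0, "h2": 500.0, "h3": 80.0}
human_map = {"h1": "FAM1", "h2": "FAM2", "h3": "FAM3"}

family_matrix = {
    "rodent": sum_by_family(rodent_tpm, rodent_map),
    "human": sum_by_family(human_tpm, human_map),
}
normalized = quantile_normalize(family_matrix)
print(family_matrix)
print(normalized)
```

In practice there would be one column per sample rather than one per species, and the normalized matrix would then go into step (6).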
Thanks