Dear colleagues, I am working with RNA-seq data from two closely related non-model species. For each species, I have three transcriptomes from different life-history phases, each with two biological replicates. I previously identified orthologous sequences between the two species, and now I want to determine which of the orthologues are differentially expressed. In addition, a number of programs require the data to be presented as FPKM. I have a few questions:
1) If I understand correctly, programs such as DESeq2 or edgeR "assume" that most genes are not differentially expressed between the compared samples, and they treat one of the samples as the "control". Which strategy works best when there is no "control" sample, i.e. when comparing the expression levels of orthologous sequences between two species? Is it worthwhile to use each species as the "control" in turn?
2) I am also interested in detecting differentially expressed genes between two different phases of the same life history. What is the best way to conduct the analysis if I expect more than 50% of the genes to be expressed differently between the samples?
3) If I'm not mistaken, FPKM values from different samples cannot be compared directly. One reason, in my view, is that in the final step of the calculation we divide the read count by the length of the assembled sequence, and the lengths of orthologues can differ. So my next question is: if orthologues differ in length, can we use the length of the sequence alignment as the “per kilobase” factor when calculating FPKM?

I apologize if similar questions have been asked before, but I did not find answers when searching. Note that no assembled genomes are available for either of the two species. I would be grateful for any advice, opinions or links to articles.
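To make the questions above concrete, here is a minimal Python sketch (assuming NumPy; all counts, lengths and function names are hypothetical toy values chosen for illustration) of two calculations they hinge on. The first is DESeq-style median-of-ratios size factors, whose reference is a pseudo-sample built from per-gene geometric means across all samples rather than any single "control" sample; the toy data show how the estimate behaves when few genes versus most genes change. The second is the basic FPKM formula, comparing assembled lengths with an alignment length for one orthologue pair. This is not the DESeq2 or edgeR code itself (edgeR's default TMM normalization works differently), only an illustration of the idea.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """DESeq-style median-of-ratios size factors for a genes x samples count matrix.

    The reference is a pseudo-sample (per-gene geometric mean across all samples),
    not one chosen sample. Genes with a zero count in any sample are skipped,
    because their geometric mean is zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Median (taken in log space) of each sample's ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def fpkm(counts, lengths_bp, total_mapped):
    """Basic FPKM = count * 1e9 / (length in bp * total mapped fragments)."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    return counts * 1e9 / (lengths_bp * total_mapped)


# Hypothetical toy data: sample B is sequenced twice as deeply as sample A.
a = np.array([100, 200, 50, 400, 80], dtype=float)

# Case 1: only one gene (index 2) changes, 10-fold up in B.
b1 = 2 * a
b1[2] *= 10
sf1 = median_of_ratios_size_factors(np.column_stack([a, b1]))
print(sf1[1] / sf1[0])  # ~2.0: the depth difference is recovered

# Case 2: 3 of 5 genes are 4-fold up in B, i.e. most genes change.
b2 = 2 * a
b2[[0, 2, 4]] *= 4
sf2 = median_of_ratios_size_factors(np.column_stack([a, b2]))
print(sf2[1] / sf2[0])  # ~8.0: the shared shift is absorbed into the size factor

# Hypothetical orthologue pair, 10 million mapped fragments in each library.
counts = [300, 240]           # species 1, species 2 (whole-transcript counts)
assembled_len = [1500, 1200]  # assembled transcript lengths differ
aligned_len = [1000, 1000]    # length of the orthologue alignment
print(fpkm(counts, assembled_len, 10_000_000))  # [20. 20.]
# Caveat: these counts cover the whole transcript. If the alignment length is
# used as the length factor, the counts would presumably also need to be
# restricted to reads inside the aligned region; otherwise the numerator and
# the denominator describe different regions.
print(fpkm(counts, aligned_len, 10_000_000))    # [30. 24.]
```

In the second case, the median ratio is dominated by the changed genes, which is exactly why the "most genes are not differentially expressed" assumption matters for the situation described in question 2.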
Thank you