Hi friends, how are you? I need your help with my study project. I have RNA-seq data for 3 species (6 replicates per species); the species form a genus within bedbugs. I would like to analyze differential expression among these species, but I am unsure whether the results I get are real or reflect methodological bias. At the moment I am using all assemblies (18 = 6 per species) as the mapping reference and I have interesting results, but I don't know whether they could be due to technical bias.
All samples were sequenced in the same batch and under the same conditions. My concern is that, because I don't have a genome or transcriptome as a reference, I may be obtaining "non-real" expression data. I chose to combine all assemblies (n = 18, 6 per species) into a single reference and to quantify every sample against this "super reference". My collaborators are unsure about the results, although the methodology is consistent with "good practices".
I would like to be certain that I can continue with this study.
# build the reference index from all assemblies (n = 18), combined in reference.fasta
kallisto index -i reference.idx reference.fasta
# quantify each sample (paired-end)
kallisto quant -i reference.idx -o output --rf-stranded -b 100 r1.fasta r2.fasta
# abundance estimates
Trinity/util/abundance_estimates_to_matrix.pl \
--est_method kallisto --gene_trans_map reference.fasta.gene_trans_map \
--name_sample_by_basedir --cross_sample_norm TMM --out_prefix outdir \
sample1, sample2 ...sample18
Trinity/Analysis/DifferentialExpression/run_DE_analysis.pl --matrix gene.counts.matrix --method edgeR --output out --dispersion 0.1
Trinity/Analysis/DifferentialExpression/analyze_diff_expr.pl --matrix gene.TMM.EXPR.matrix --max_genes_clust 1000000 -P 1e-3 -C 4
Trinity/Analysis/DifferentialExpression/define_clusters_by_cutting_tree.pl -R diffExpr.P1e-3_C4.matrix.RData --Ptree 60
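A side note on the edgeR step above: as far as I know, run_DE_analysis.pl only treats samples as biological replicates when it is given a --samples_file, and a fixed --dispersion is meant for designs without replicates. A minimal sketch of building such a file follows. The labels speciesA/B/C and the replicate names are hypothetical, and the replicate names would have to match the column names actually present in gene.counts.matrix.

```shell
# Hypothetical labels: replace speciesA/B/C and the replicate names with the
# column names that actually appear in gene.counts.matrix.
: > samples.txt
for sp in speciesA speciesB speciesC; do
  for rep in 1 2 3 4 5 6; do
    # Column 1: condition (here, the species); column 2: replicate name.
    printf '%s\t%s_rep%s\n' "$sp" "$sp" "$rep" >> samples.txt
  done
done

# Assumed usage (not run here):
# Trinity/Analysis/DifferentialExpression/run_DE_analysis.pl \
#   --matrix gene.counts.matrix --method edgeR \
#   --samples_file samples.txt --output out
```

With the replicates declared this way, edgeR can estimate dispersion from the data, so a fixed value of 0.1 should not be needed.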
I'm not sure I understand what you are trying to do. In a differential gene expression analysis, one compares the expression of the same genes between conditions. The issue here is that you are studying different species, which presumably do not have the same set of genes, so it does not make sense to compare gene expression.
In this case I am comparing different species under the same conditions (developmental stage, environment, etc.).
Yes, but since they do not have the same genes, you are basically trying to compare apples and bananas.
Inter-species is really not my field but intuitively I would have the following ideas / see these obstacles:
There are probably additional factors to consider here. Be sure to read the literature on this and on specific tools for it. Others have probably done it before, and there may be dedicated approaches. Be sure not to reinvent the wheel. I doubt that a wrapper script such as the one in Trinity will do here, for the reasons mentioned above.
Very good. As we don't have a reference to date, it would be wrong for me to just determine the orthologs and evaluate expression from the orthologs. An important point: these are very closely related species that form a genus of only 3 species and have probably evolved recently through hybridization.
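For readers wondering what "evaluating expression from the orthologs" would look like in practice, here is a rough sketch under stated assumptions, not a recommendation. It assumes each species was quantified against its own assembly and that a table of one-to-one orthologs already exists (how to obtain it is not shown). All file names, gene IDs, and counts below are invented toy data.

```shell
# Toy inputs, invented for illustration only.
# orthologs.tsv: orthogroup ID, then the matching gene ID in each species.
printf 'OG1\tA_g1\tB_g7\tC_g3\nOG2\tA_g2\tB_g9\tC_g4\n' > orthologs.tsv
# One two-column count table per species: gene ID, estimated count.
printf 'A_g1\t120\nA_g2\t15\nA_g5\t300\n' > speciesA.counts.tsv
printf 'B_g7\t98\nB_g9\t22\n' > speciesB.counts.tsv
printf 'C_g3\t143\nC_g4\t9\nC_g8\t51\n' > speciesC.counts.tsv

# Load every per-species count into a lookup table, then print one row per
# orthogroup. Genes without a one-to-one ortholog (A_g5, C_g8) drop out.
awk -F'\t' -v OFS='\t' '
  FILENAME != "orthologs.tsv" { count[$1] = $2; next }
  { print $1, count[$2], count[$3], count[$4] }
' speciesA.counts.tsv speciesB.counts.tsv speciesC.counts.tsv orthologs.tsv \
  > ortholog.counts.matrix
```

The resulting matrix has one row per orthogroup and one column per species, so every row refers to a comparable gene. Note that ortholog lengths can differ between species, which raw counts do not account for.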
Hi, I basically want to do a similar thing - comparing gene expression levels between different species and finding significance.
Here's something about normalization: https://www.researchgate.net/publication/352770739_A_scaling-free_minimum_enclosing_ball_method_to_detect_differentially_expressed_genes_for_RNA-seq_data
Generally, I am interested to hear how it went for you, since I am a beginner in bioinformatics and still on the steep part of the learning curve, so at the moment I am trying to find as much information relevant to this topic as possible :)