Hi,
I work on microbial genomics, and there are a couple of datasets in SRA/ENA that I could use for my work. I want to combine these datasets in a single study, but the problem is that they were all generated from different subspecies of S. aureus.
I tried creating a common reference genome annotation following the methodology of LoVerso and Cui (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668955/), but it mapped only around 200 genes in common between the two subspecies.
Is there any other method by which I can map the homologs from a second genome onto the primary reference annotation and carry out an integrated analysis in a single go? (Pooling the datasets would give more replicates per condition and so make the results more reliable.)
Or am I supposed to do separate differential expression for each dataset and then compare the obtained genes separately?
Hi, thank you for your reply. My concern is this: if I create a new consensus reference genome and carry out the analysis against it, will all the corresponding genes be mapped correctly?
My aim is to do an integrative analysis of certain public RNA-seq data available for a particular bacterial species, but each experiment was done in a different strain/subspecies.
What I plan to do is align the reads from each dataset to its own reference genome and then, for further analysis, create a single annotation file (GFF/GTF) based on one selected subspecies (the "target" for the liftover), combined with the annotations mapped over from the other subspecies (the "sources" for the liftover).
Is this procedure right? Or are there any alternatives? I do not wish to run each RNA-seq analysis separately and then simply compare the resulting lists of differentially expressed genes.
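To make the plan above concrete, here is a minimal Python sketch of the step where per-strain count tables are brought into one gene ID space. It is only an illustration under assumptions: the gene IDs (A_001, B_001, ...) and sample names are made up, the ortholog pairs are assumed to come from some external step (for example reciprocal best hits or an annotation liftover), and the function names `one_to_one` and `merge_counts` are hypothetical helpers, not part of any existing package.

```python
from collections import Counter


def one_to_one(ortholog_pairs):
    """Keep only (source, target) pairs where both IDs occur exactly once.

    Ambiguous (one-to-many or many-to-one) pairs are dropped, since their
    counts cannot be assigned to a single target gene unambiguously.
    """
    source_seen = Counter(source for source, _ in ortholog_pairs)
    target_seen = Counter(target for _, target in ortholog_pairs)
    return {
        source: target
        for source, target in ortholog_pairs
        if source_seen[source] == 1 and target_seen[target] == 1
    }


def merge_counts(target_counts, source_counts, source_to_target):
    """Re-key source-strain counts to target gene IDs and keep shared genes."""
    rekeyed = {
        source_to_target[gene]: samples
        for gene, samples in source_counts.items()
        if gene in source_to_target
    }
    shared = sorted(set(target_counts) & set(rekeyed))
    # One row per shared gene, with columns from both datasets side by side.
    return {gene: {**target_counts[gene], **rekeyed[gene]} for gene in shared}


# Hypothetical ortholog pairs (source strain B -> target strain A).
# B_002 and B_003 both hit A_002, so that target gene is treated as ambiguous.
pairs = [("B_001", "A_001"), ("B_002", "A_002"),
         ("B_003", "A_002"), ("B_004", "A_004")]

# Hypothetical raw counts, each obtained against the strain's own reference.
target_counts = {
    "A_001": {"A_rep1": 10, "A_rep2": 12},
    "A_002": {"A_rep1": 5, "A_rep2": 7},
    "A_004": {"A_rep1": 30, "A_rep2": 28},
}
source_counts = {
    "B_001": {"B_rep1": 8},
    "B_002": {"B_rep1": 3},
    "B_003": {"B_rep1": 4},
    "B_004": {"B_rep1": 25},
    "B_005": {"B_rep1": 40},  # no ortholog in the target, so it is dropped
}

mapping = one_to_one(pairs)
merged = merge_counts(target_counts, source_counts, mapping)
for gene, samples in merged.items():
    print(gene, samples)
```

In this sketch only genes with an unambiguous one-to-one ortholog survive, so the merged table is restricted to the shared gene set. The dataset or strain of origin would presumably still have to be included as a batch term in the differential expression model.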