Hi all,
I am very new to allele-specific expression and I hope someone can help me.
I've been reading a lot in this forum about analyzing allele-specific expression, but instead of giving me answers it has raised a lot of questions. Before getting to my questions, let me give you an overview of my experiment.
So I have two grass species, Species A and Species B. Both species are in the same genus and both produce seeds through selfing. We crossed Species A (maternal) with Species B (paternal). Unfortunately, due to low hybridization success, we didn't get viable seeds from the reciprocal cross. We did RNA sequencing on the resulting hybrid, i.e. AB, along with its parental species A and B. Due to limited budget, we do not have DNA-Seq for any of them.
So I mapped the RNA-seq reads of species A to the species A reference genome and those of species B to the species B reference genome using STAR, and quantified them using featureCounts. It is important to note that the accessions of species A and species B that I sequenced are the exact accessions that were used to create the reference genomes of both species. For the AB hybrid, I mapped the reads to the concatenated reference genome of species A and species B. With this, I know exactly which parent each gene came from. For this purpose, I want to use the single-copy orthologs between the species for allele-specific expression instead of heterozygous SNV sites.
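To make the concatenation step concrete, here is a minimal Python sketch of the idea, not necessarily the exact commands used. The `A_`/`B_` prefixes, the function names, and the toy sequences are all illustrative assumptions; the point is only that tagging every sequence name with its species of origin keeps the parent recoverable after mapping.

```python
def tag_fasta(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        elif line:  # skip blank lines, keep sequence lines unchanged
            yield line


def concatenate_references(fasta_a, fasta_b, prefix_a="A_", prefix_b="B_"):
    """Return one combined FASTA (as a list of lines) for species A and B."""
    return list(tag_fasta(fasta_a, prefix_a)) + list(tag_fasta(fasta_b, prefix_b))


def parent_of(seq_name, prefix_a="A_", prefix_b="B_"):
    """Recover the parent of origin from a tagged sequence name."""
    if seq_name.startswith(prefix_a):
        return "A"
    if seq_name.startswith(prefix_b):
        return "B"
    raise ValueError(f"untagged sequence name: {seq_name}")


# Toy in-memory FASTA records (made up, for illustration only).
species_a = [">chr1", "ACGT"]
species_b = [">chr1", "ACGA"]
combined = concatenate_references(species_a, species_b)
```

The annotation file would need the same prefixes on its sequence names so that it still matches the combined FASTA.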
So my questions are:
- Is this a fair approach? I am assuming that the single-copy orthologs between the species pair up during meiosis, which in a way makes them alleles of a particular gene or ortholog group (?).
- When identifying which ortholog group exhibits allelic imbalance using a binomial test, should I do it using RAW counts or NORMALIZED counts? If the latter, what normalization method is appropriate?
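To make the second question concrete, here is a minimal sketch of the kind of per-ortholog-group binomial test I have in mind, written in plain Python. It is an illustration under stated assumptions, not a recommended pipeline: the counts are made-up integer read counts, the ortholog group IDs are hypothetical, and the 50:50 null (`p_null=0.5`) is itself an assumption that may not hold if the two orthologs differ in length or mappability.

```python
from math import exp, lgamma, log


def binom_log_pmf(k, n, p):
    """Log-probability of k successes in n trials (log space avoids overflow)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided_p(k, n, p_null=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one."""
    if n == 0:
        return 1.0
    pmf = [exp(binom_log_pmf(i, n, p_null)) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical integer read counts per ortholog group in the hybrid:
# (reads assigned to the species A copy, reads assigned to the species B copy)
counts = {
    "OG0001": (60, 40),
    "OG0002": (90, 10),
    "OG0003": (50, 50),
}

p_values = {
    og: binom_two_sided_p(a, a + b)
    for og, (a, b) in counts.items()
}
```

The test is defined on integer counts, so this sketch takes the counts as they come; whether and how they should be adjusted first is exactly what I am asking. With many ortholog groups, the p-values would also need multiple-testing correction, which is left out here.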
Thank you very much for your expert opinions about this topic.
Best,
Sandy
I agree. Essentially, what we would like to know is whether the genome of species A or B is preferentially expressed over the other. I will look into the link you suggested.
Possibly the correct way to do this, though certainly not easy, is:
Pangenomics is a difficult and immature topic in itself though.
This could be another approach. I will also consult my colleague here who is working on pangenomes for this genus. But do you think the concatenated genome approach is lackluster?