Allele-specific expression in interspecific hybrids
9 months ago
Sandy ▴ 20

Hi all,

I am very new to allele specific expression and I hope someone can help me.

I've been reading a lot on this forum about analyzing allele-specific expression, but instead of giving me answers it has raised a lot of questions. Before getting to my questions, let me give you an overview of my experiment.

So I have two grass species, Species A and Species B. Both species are in the same genus and both produce seeds through selfing. We crossed Species A (maternal) with Species B (paternal). Unfortunately, due to low hybridization success, we didn't get viable seeds from the reciprocal cross. We did RNA sequencing on the resulting hybrid (AB) along with its parental species A and B. Due to a limited budget, we do not have DNA-seq for any of them.

So I mapped the RNA-seq reads of species A to the species A reference genome and those of species B to the species B reference genome using STAR, and quantified them using featureCounts. It is important to note that the accessions of species A and species B that I sequenced are the same accessions that were used to create the two reference genomes. For the AB hybrid, I mapped the reads to the concatenated reference genome of species A and species B. With this, I know exactly which parent each gene came from. For this reason, I want to use the single-copy orthologs between the species for allele-specific expression instead of heterozygous SNV sites.
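To make the setup concrete, here is a minimal sketch of how the hybrid counts from the concatenated reference could be paired up per ortholog group. The gene IDs, the `A_`/`B_` prefixes, the counts, and the function name `pair_ortholog_counts` are all hypothetical, and the ortholog table is assumed to come from whatever orthology tool is used.

```python
def pair_ortholog_counts(counts, orthologs):
    """Return {group_id: (count_a, count_b)} for single-copy ortholog pairs.

    counts    : dict mapping gene ID -> raw read count in the hybrid sample
    orthologs : iterable of (group_id, gene_id_in_A, gene_id_in_B) tuples
    Genes missing from the count table are treated as having zero reads.
    """
    paired = {}
    for group_id, gene_a, gene_b in orthologs:
        paired[group_id] = (counts.get(gene_a, 0), counts.get(gene_b, 0))
    return paired


# Hypothetical hybrid counts; the A_/B_ prefixes mark the subgenome of origin.
hybrid_counts = {"A_g001": 120, "B_g001": 80, "A_g002": 15, "B_g002": 45, "A_g003": 7}

# Hypothetical single-copy ortholog table (one gene per species per group).
orthologs = [("OG1", "A_g001", "B_g001"), ("OG2", "A_g002", "B_g002")]

paired = pair_ortholog_counts(hybrid_counts, orthologs)
for group_id, (count_a, count_b) in paired.items():
    print(group_id, count_a, count_b)
```

Genes without a single-copy partner (here `A_g003`) simply drop out of the comparison.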

So my questions are:

  1. Is this a fair approach? I am assuming that the single-copy orthologs of the two species pair up during meiosis, which in a way makes them alleles of a particular gene or ortholog group (?).
  2. When identifying which ortholog groups exhibit allelic imbalance using a binomial test, should I do it using RAW counts or NORMALIZED counts? If the latter, what normalization method is appropriate?
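For reference, this is a rough sketch of the per-group binomial test I have in mind. It uses raw integer counts only because an exact binomial test is defined on integer counts; it does not settle whether something like a length difference between the two orthologs needs correcting first. The null ratio of 0.5, the example counts, and the function names are assumptions, and `scipy.stats.binomtest` could replace the hand-written test.

```python
from math import comb


def binom_two_sided(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, p = 0.5."""
    if n == 0:
        return 1.0
    lo = min(k, n - k)
    if 2 * lo == n:  # perfectly balanced counts
        return 1.0
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    tail = sum(comb(n, i) for i in range(lo + 1))
    return min(1.0, 2 * tail / 2 ** n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw hybrid counts per ortholog group: (reads on A copy, reads on B copy).
paired = {"OG1": (120, 80), "OG2": (15, 45), "OG3": (50, 50)}

groups = list(paired)
pvals = [binom_two_sided(a, a + b) for a, b in (paired[g] for g in groups)]
qvals = benjamini_hochberg(pvals)
results = {g: (p, q) for g, p, q in zip(groups, pvals, qvals)}
for g, (p, q) in results.items():
    print(g, paired[g], f"p={p:.3g}", f"q={q:.3g}")
```

The exact sum gets slow for genes with very deep coverage, so a library implementation is probably the better choice on real data.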

Thank you very much for your expert opinions about this topic.

Best,
Sandy

RNA-seq allele-specific-expression interspecific-hybrids • 709 views
9 months ago

This is a very tricky topic. I'm not convinced that most analysts do this correctly.

I can't say much about cross-species allele-specific expression. To my knowledge, most ASE work has revolved around a single species.

The ideal approach would be to have a haplotype-resolved assembly, ideally complete, and then map short RNA reads to both (perfectly annotated) haplotypes. A more typical approach has been to add the variation from both samples (SNPs) into the common reference genome, since SNPs cause allelic bias in read mapping, and then map the reads to the resulting pseudoreference(s).

Tricky - yes. Doable - maybe.

I liked the approach of this tool very much when I did some work on this some time ago.

https://github.com/secastel/phaser

For your example I think you need to read the literature deeply to find comparable problems and follow their methods.


I agree. Essentially, what we would like to know is whether the genome of species A or B is preferentially expressed over the other. I will look into the link you suggested.


Possibly the correct way to do this, though certainly not easy, is:

  • create a pangenome out of the two species using minigraph-cactus
  • use vg giraffe or the vg RNA mapper (mpmap)? to align the RNA reads to the pangenome
  • write out the read counts and continue from there
  • or project the reads back to one/both of the references so you have access to the gene annotation again
  • continue as usual

Pangenomics is a difficult and immature topic in itself though.


This could be another approach. I will also consult my colleague here who is working on pangenomes for this genus. But do you think the concatenated-genome approach is a lackluster approach?
