How Can I Identify Orthologous Contigs Between Two De Novo Transcriptome Assemblies?
13.1 years ago
Ryan Thompson ★ 3.6k

I am doing de novo transcriptome assembly of RNA-Seq data from two closely-related diploid species (mammals) for the purpose of identifying genetic variations between the two species. In order to do this, I suppose I need to identify pairs of ortholog transcripts between the two assemblies, so that I can compare them. What is the best way to do this? Should I simply do all pairwise alignments and pick out the pairs that are best matches to each other? Are there tools available for this already?

Additionally, how does the presence of heterozygous SNPs affect the strategy? I am using Trinity for the transcriptome assembly, and my understanding is that when a transcript has a heterozygous SNP, Trinity will end up reporting two complete contigs that are identical except for the SNP. For example, if the transcript is "TTTTTTTTTT" and there is a heterozygous A/T at position 6, then Trinity would report "TTTTTTTTTT" and "TTTTTATTTT". This could complicate the identification of ortholog pairs by the "mutual best match" strategy described above.
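To make the concern concrete, here is a minimal sketch of how such allele-like contig pairs could be flagged before any ortholog search. It only illustrates the toy example above: the function names and the `max_mismatches` parameter are hypothetical, it handles only equal-length contigs that differ by substitutions, and it compares all pairs, so it would not scale to a real assembly without a clustering or alignment tool.

```python
from itertools import combinations

def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def candidate_allele_pairs(contigs, max_mismatches=1):
    """Return ID pairs of equal-length contigs that differ by
    1..max_mismatches substitutions (hypothetical helper)."""
    pairs = []
    for (id1, s1), (id2, s2) in combinations(sorted(contigs.items()), 2):
        if len(s1) == len(s2) and 0 < hamming(s1, s2) <= max_mismatches:
            pairs.append((id1, id2))
    return pairs

# Toy contigs: c1 and c2 are the two sequences from the example above.
contigs = {"c1": "TTTTTTTTTT", "c2": "TTTTTATTTT", "c3": "GGGGGCCCCC"}
print(candidate_allele_pairs(contigs))  # [('c1', 'c2')]
```

One member of each flagged pair could then be set aside so that only a single representative enters the mutual-best-match step.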

transcriptome assembly denovo trinity orthologues • 5.8k views
13.1 years ago

OrthoMCL is a popular package used to find orthologous groups.

13.1 years ago

I am not familiar with Trinity, but would suggest looking into relaxing its stringency for separating the A and T transcripts in your example so that these are reported as alleles of one transcript. That's one goal of the output, right?

If that cannot be done, then you need to be able to identify one member of the heterozygous transcript pair as such and remove it from the orthologous gene-finding step. You can do this with BLAST via mutual best hit, or with other tools such as OrthoMCL. If both organisms have (reference) genome sequence, then these assignments likely have been calculated already. Thus, when there is a heterozygous genotype, one transcript is set aside but labeled as an alternate allele/genotype/haplotype for a given gene/transcript and the other allele is used as query in the ortholog search.
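As a rough illustration of the mutual-best-hit idea, the sketch below assumes you have already run BLAST in both directions (assembly A against B, and B against A) and saved standard 12-column tabular output, in which the bit score is the last column. The function names are hypothetical, "best" is taken to mean highest bit score, and ties are resolved by whichever hit appears first; a real analysis would also want e-value and coverage filters.

```python
def best_hits(blast_lines):
    """Map each query ID to its highest-bit-score subject ID.

    Expects BLAST tabular lines with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Made-up toy hits, for illustration only.
a_vs_b = [
    "A1\tB1\t98.0\t500\t10\t0\t1\t500\t1\t500\t0.0\t900",
    "A1\tB2\t90.0\t500\t50\t0\t1\t500\t1\t500\t1e-150\t600",
    "A2\tB2\t97.0\t400\t12\t0\t1\t400\t1\t400\t0.0\t700",
]
b_vs_a = [
    "B1\tA1\t98.0\t500\t10\t0\t1\t500\t1\t500\t0.0\t900",
    "B2\tA1\t90.0\t500\t50\t0\t1\t500\t1\t500\t1e-150\t600",
    "B2\tA2\t97.0\t400\t12\t0\t1\t400\t1\t400\t0.0\t700",
    "B3\tA2\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t300",
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1'), ('A2', 'B2')]
```

Contigs set aside as alternate alleles would simply be left out of both BLAST inputs, so each gene is represented once on each side.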

You will also need to develop a strategy to deal with the transcripts that have no orthologous match. Are these unique to one species? Is there really an ortholog that is simply not expressed or not detected in your data? Will you have situations where gene A in species A is orthologous to genes A1 and A2 in species B? In many of these cases, you may still be able to detect variants in the transcripts that do have a 1:1 ortholog relationship.

13.1 years ago
Vitis ★ 2.6k

As for heterozygosity, I think it is a much more complicated problem than orthologous gene identification. With genomic sequencing, you can gauge the level of heterozygosity from the k-mer frequency distribution; this is well implemented in Quake. But transcriptome coverage is inherently uneven and can span a very wide range, so this method does not really work for transcriptomes. In this situation, the best solution I can think of is to use contig assembly software such as Phrap or CAP3 to assemble the de novo contigs, in the hope that the mismatches allowed there capture the heterozygosity. And if your two species are close enough evolutionarily, you could do de novo assembly for just one of them, followed by contig assembly or even reference-proteome-based improvement, then map the reads from the other organism onto the first.
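For readers unfamiliar with the k-mer frequency distribution mentioned above, here is a minimal sketch of how such a spectrum is computed. This is not Quake's implementation; the function names are hypothetical, k-mers are not canonicalized (reverse complements are counted separately), and the whole table is held in memory, so it is only suitable for small inputs.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts

def kmer_spectrum(counts):
    """Histogram: multiplicity -> number of distinct k-mers seen that often."""
    return dict(sorted(Counter(counts.values()).items()))

# Toy reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
print(kmer_spectrum(kmer_counts(reads, 3)))  # {2: 4}
```

In evenly covered genomic data the spectrum tends to show a main peak near the sequencing depth, with heterozygous sites contributing k-mers at roughly half that depth. With the uneven coverage of RNA-Seq, each transcript has its own depth, so the peaks smear together, which is the limitation described above.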


Phrap and CAP3 are good choices for assembling contigs. In my view, though, you would not want to mix reads from different species for mRNA assembly and ortholog identification.


I didn't make myself clear. What I meant was to use de novo assembly plus Phrap/CAP3 on just ONE of the species, to build a solid reference transcriptome. Then, if the other species is close enough, it may be feasible to map its reads using the first as the reference.


Yes, a much clearer approach. I could agree to give that a try.
