RAST annotation of RNA-seq transcriptome results in some genes appearing more than once
14 months ago
langziv ▴ 70

Hi

I'm guessing that this is due to gene homology. I'm trying to create heat maps in R, which requires each entry (one row per gene, holding that gene's counts across all replicates) to be unique, so the duplicate gene names cause a problem.

I'd be happy to get suggestions on what should be done in such a case.

R RNA-seq • 736 views

I presume what you've encountered are either paralogs or gene isoforms.

There are two approaches you can try to get rid of this "redundancy" for the purpose of getting a single gene-equivalent.

1) Cluster the transcriptome at some sequence identity and coverage threshold (e.g., at least 90% of the shorter sequence covered by the longer one) using a tool like MMseqs2 or CD-HIT.

2) If the transcriptome is de novo assembled using an assembler like Trinity, you can take advantage of the gene-isoform relationships indicated in the sequence headers to retain one isoform per gene "cluster".

Technically, you could also apply both options together (option 2 first, then option 1 in this case).
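To make option 2 concrete, here is a minimal sketch in Python (standard library only). It assumes Trinity-style IDs of the form `TRINITY_DN..._c..._g..._i...`, where stripping the trailing `_i<N>` gives the gene ID. Keeping the longest isoform is just one possible rule and is my assumption, not something Trinity prescribes; the function names and example lengths are hypothetical.

```python
import re

# Assumption: the isoform is encoded as a trailing "_i<N>" on the transcript ID,
# and everything before it identifies the Trinity "gene".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Return the gene-level ID for a Trinity-style transcript ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def longest_isoform_per_gene(lengths):
    """Map each gene ID to its longest isoform (ties keep the first one seen).

    lengths: dict of transcript ID -> sequence length.
    """
    best = {}
    for transcript_id, length in lengths.items():
        gene = trinity_gene_id(transcript_id)
        if gene not in best or length > lengths[best[gene]]:
            best[gene] = transcript_id
    return best


# Hypothetical transcript lengths, for illustration only.
example_lengths = {
    "TRINITY_DN1000_c115_g5_i1": 1200,
    "TRINITY_DN1000_c115_g5_i2": 1850,
    "TRINITY_DN2001_c0_g1_i1": 640,
}
representatives = longest_isoform_per_gene(example_lengths)
```

The resulting mapping (gene ID to chosen isoform) can then serve as the grouping key when counts are combined.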

In any case, you can sum up the counts for each set of sequences now represented by your chosen sequence and simply assign those counts to it. These can then be supplied to your heatmap plotting function.
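The summing step can be sketched as follows. This is a plain-Python illustration with made-up gene names and counts, assuming the table is available as (name, per-replicate counts) pairs; in R itself, base functions such as `rowsum()` or `aggregate()` do the same kind of grouping.

```python
def sum_counts_by_gene(rows):
    """Collapse rows sharing a gene name by summing counts per replicate.

    rows: iterable of (gene_name, [count for each replicate]).
    Returns a dict of gene_name -> summed counts, in first-seen order.
    """
    totals = {}
    for gene, counts in rows:
        if gene not in totals:
            totals[gene] = list(counts)
            continue
        if len(counts) != len(totals[gene]):
            raise ValueError(f"replicate count mismatch for {gene!r}")
        # Element-wise sum so each replicate column stays separate.
        totals[gene] = [a + b for a, b in zip(totals[gene], counts)]
    return totals


# Hypothetical count table with one duplicated gene name.
example_rows = [
    ("dnaK", [10, 12, 9]),
    ("recA", [7, 8, 6]),
    ("dnaK", [3, 4, 2]),
]
collapsed = sum_counts_by_gene(example_rows)
```

After collapsing, every name occurs once, which is what the heatmap input needs.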

14 months ago
Roberto ▴ 20

As always, it depends on your research question. If you are interested in "functionality", or in doing the analysis at the gene level, then feel free to sum up the reads from the transcripts. If you are interested in keeping the transcripts separate, just rename them? Depending on the organism and the identity between the paralogs, the latter might make more sense anyway, given that different transcripts might still function differently (e.g., due to mutations or differences in promoters, splicing and so on).
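If you go the renaming route, one simple scheme is to leave the first occurrence alone and append a numeric suffix to later ones. The sketch below is a plain-Python illustration of that idea with hypothetical gene names; it is similar in spirit to R's `make.unique()`, though I'm not claiming the output matches it in every edge case.

```python
def make_unique(names, sep="."):
    """Return names with repeats disambiguated by a numeric suffix.

    The first occurrence of a name is kept as-is; later occurrences become
    name.1, name.2, ... skipping any suffixed form already in use.
    """
    taken = set(names)   # original names must never be reused as new labels
    next_suffix = {}     # per-name counter for the next suffix to try
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        n = next_suffix.get(name, 1)
        candidate = f"{name}{sep}{n}"
        while candidate in taken:
            n += 1
            candidate = f"{name}{sep}{n}"
        taken.add(candidate)
        next_suffix[name] = n + 1
        result.append(candidate)
    return result


# Hypothetical gene names with duplicates.
unique_names = make_unique(["dnaK", "dnaK", "recA", "dnaK"])
```

The suffixed names keep each copy as its own row, so paralogs with different expression patterns stay visible in the heatmap.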
