Question

Determining transcripts for repetitive elements in RNA-seq data

2.7 years ago

rodd ▴ 250

Dear all, I'm not sure if this is a vague question, but here we go. I have an annotation of non-protein-coding gene transcripts made up of repetitive elements. I'd like to check whether any of the predicted transcripts can be identified as "real" transcripts; in other words, I would expect enough RNA-seq reads to align to these annotated features that a bioinformatic tool calls the putative transcripts as genuine transcripts. I have access to 500 RNA-seq samples that I can use for this. What would be the best approach, considering I'd like these "new sequences" to be detected alongside regular protein-coding genes?

Something I started piecing together is as follows: concatenate the forward and reverse reads from all 500 samples, align the reads with HISAT2, then run StringTie to identify transcripts, using my annotation (protein-coding genes + new genetic features) as the "reference" annotation.
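To make the plan concrete, here is a minimal Python sketch of those three steps. It only concatenates files and builds the command lines; it does not run them. The file names, the `all_samples` prefix, the thread count, and the helper names `concatenate` and `build_commands` are hypothetical, and the `samtools sort` step is my assumption (StringTie needs a coordinate-sorted BAM).

```python
import shutil

def concatenate(inputs, output):
    """Append the input files to `output` in order.

    Gzipped FASTQ files can be joined this way because a gzip stream
    may consist of several concatenated members.
    """
    with open(output, "wb") as dst:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)

def build_commands(r1, r2, index, annotation, prefix="all_samples", threads=8):
    """Return the alignment and assembly commands as argument lists.

    r1/r2: concatenated forward/reverse FASTQ files (hypothetical names).
    index: basename of a HISAT2 index; annotation: the combined GTF.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    gtf = f"{prefix}.gtf"
    return [
        # --dta reports alignments tailored for transcript assemblers
        ["hisat2", "--dta", "-p", str(threads), "-x", index,
         "-1", r1, "-2", r2, "-S", sam],
        # assumption: sort by coordinate before assembly
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # -G passes the combined annotation as the reference guide
        ["stringtie", bam, "-G", annotation, "-p", str(threads), "-o", gtf],
    ]
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.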

What would you do?

transcript rna-seq stringtie • 973 views

updated 2.7 years ago by rpolicastro 13k • written 2.7 years ago by rodd ▴ 250

How long are these repetitive elements, and does each repeat contain high-confidence SNPs?

2.7 years ago by rpolicastro 13k

The putative genes in this annotation vary in length, from 100 to 10,000+ bp. They do contain high-confidence SNPs. A program called Telescope quantifies the expression of these features in RNA-seq data based on these genetic differences; it is also the source of this annotation of putative transcripts.

2.7 years ago by rodd ▴ 250

Just to clarify, you want to check whether any of these repetitive elements have appreciable expression in your cohort of 500 RNA-seq samples? I've not heard of Telescope before, but it looks like it takes a similar approach to resolving multimappers as the more popular alternatives such as Salmon. If Salmon or Telescope can resolve the multimappers based on the SNPs, and there is an appreciable estimated abundance, I would probably be confident enough to say they are expressed. Perhaps Rob can add to this.
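As a rough illustration of what "appreciable estimated abundance" could mean across a cohort, here is a minimal Python sketch that keeps features reaching a TPM cutoff in a minimum fraction of samples. The cutoffs (`min_tpm=1.0`, `min_fraction=0.1`), the function name, and the input layout are all assumptions chosen for illustration, not recommendations from Salmon or Telescope; the per-sample estimates would come from whichever quantifier you use.

```python
def expressed_features(tpm, min_tpm=1.0, min_fraction=0.1):
    """Return IDs of features that pass an expression filter.

    tpm: dict mapping feature ID -> list of per-sample TPM estimates.
    A feature is kept when at least `min_fraction` of its samples
    have TPM >= `min_tpm`. Both thresholds are arbitrary assumptions.
    """
    kept = []
    for feature_id, values in tpm.items():
        if not values:
            continue  # no estimates for this feature
        n_pass = sum(1 for v in values if v >= min_tpm)
        if n_pass / len(values) >= min_fraction:
            kept.append(feature_id)
    return kept

# Toy, made-up values for two hypothetical features across four samples
toy = {"repeat_A": [0.0, 0.2, 2.5, 3.1], "repeat_B": [0.0, 0.0, 0.1, 0.0]}
print(expressed_features(toy, min_tpm=1.0, min_fraction=0.5))
```

With 500 samples, the fraction cutoff matters as much as the TPM cutoff, since a repeat expressed in only a handful of samples may still be of interest.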

2.7 years ago by rpolicastro 13k