Quantifying repetitive elements from RNA-seq (hisat2 or Salmon)
6.6 years ago
ywchen ▴ 30

Hi everyone: I am interested in quantifying changes in the transcription of repetitive elements (LTRs here) after treatment, and I have come up with the following ideas:

  1. Map the RNA-seq data directly to the genome with hisat2 and quantify against the repetitive element annotation from RepeatMasker, then pool elements of the same class to compare them. However, I am not sure how to set the maximum number of alignments allowed per read. Most RNA-seq analyses require uniquely mapped reads, but the value would need to be much higher here, since repetitive elements occur many times in the genome.
  2. I got the consensus repetitive element sequences (FASTA) from Repbase. Is it possible to treat these repeat elements as a "transcriptome" and use salmon (or a similar transcriptome-based tool) to map reads to them?
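The "collect elements of the same class" step in idea 1 can be sketched as follows. This is only a minimal illustration: the counts are made up, the function name is hypothetical, and the class/family labels follow the "class/family" convention used in RepeatMasker annotation (for example "LTR/ERV1").

```python
from collections import defaultdict

def sum_counts_by_class(element_counts, element_to_class):
    """Sum per-element read counts into repeat classes (e.g. 'LTR')."""
    totals = defaultdict(int)
    for element, count in element_counts.items():
        # Annotation labels look like "LTR/ERVK"; keep only the class part.
        class_family = element_to_class.get(element, "Unknown")
        repeat_class = class_family.split("/")[0]
        totals[repeat_class] += count
    return dict(totals)

# Hypothetical counts, for illustration only.
control = {"MER21C": 120, "LTR12C": 40, "AluSx": 900}
treated = {"MER21C": 300, "LTR12C": 95, "AluSx": 880}
classes = {"MER21C": "LTR/ERVL", "LTR12C": "LTR/ERV1", "AluSx": "SINE/Alu"}

control_by_class = sum_counts_by_class(control, classes)
treated_by_class = sum_counts_by_class(treated, classes)
print(control_by_class["LTR"], treated_by_class["LTR"])
```

These are raw sums. They are not normalized for library size, so a comparison between conditions would still need a proper differential-expression step.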

I am not familiar with this area and would appreciate any suggestions. Thanks for the help!

Update: Since I am only interested in LTRs, I have modified the question. It looks possible to extract uniquely mapped reads and combine them with the RepeatMasker annotation. Direct quantification looks likely to fail, since repetitive elements are abundant within mRNAs.
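A rough sketch of the "uniquely mapped reads plus annotation" idea, assuming reads and LTR annotations have already been reduced to simple coordinate tuples. The tuple layout, the element names, and the use of a hit count like the SAM `NH` tag to define "unique" are assumptions made for illustration.

```python
def count_reads_per_element(reads, elements):
    """Count uniquely mapped reads that overlap each annotated element.

    reads:    list of (chrom, start, end, n_hits); 0-based, half-open;
              n_hits is the number of reported alignments (as in the NH tag).
    elements: list of (name, chrom, start, end); 0-based, half-open.
    """
    counts = {name: 0 for name, _, _, _ in elements}
    for chrom, start, end, n_hits in reads:
        if n_hits != 1:
            continue  # skip multi-mapped reads
        for name, e_chrom, e_start, e_end in elements:
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == e_chrom and start < e_end and e_start < end:
                counts[name] += 1
    return counts

# Hypothetical annotation and reads, for illustration only.
elements = [("LTR_a", "chr1", 100, 600), ("LTR_b", "chr1", 5000, 5400)]
reads = [
    ("chr1", 150, 250, 1),    # unique, inside LTR_a
    ("chr1", 590, 690, 1),    # unique, overlaps the end of LTR_a
    ("chr1", 5100, 5200, 4),  # multi-mapped, ignored
    ("chr2", 150, 250, 1),    # unique, but on another chromosome
]
counts = count_reads_per_element(reads, elements)
print(counts)
```

The nested loop is quadratic and only suitable for a toy example. On real data an interval index or a dedicated counting tool would be the practical choice.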

rna-seq repetitive elements • 5.5k views
6.6 years ago

You can use STAR and then put the alignments through TEtranscripts. STAR lets you keep multiple alignments per read, and it generally produces better alignments than hisat2 (in my experience at least). Our group that works on repeat elements uses this method.
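As a hedged sketch of what "allowing multiple alignments" might look like in practice, the snippet below only assembles a STAR command line. It does not run anything. The index directory, file names, thread count, and the limit of 100 are placeholders chosen for illustration and are not values given in this thread's answer; `--outFilterMultimapNmax` and `--winAnchorMultimapNmax` are the STAR options that control how many loci a read may map to.

```python
import shlex

def star_multimap_command(genome_dir, fastq_1, fastq_2, out_prefix,
                          max_multimap=100, threads=8):
    """Build (not run) a STAR command that keeps multi-mapped reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,
        # Keep reads that align to up to max_multimap loci.
        "--outFilterMultimapNmax", str(max_multimap),
        "--winAnchorMultimapNmax", str(max_multimap),
        "--outSAMtype", "BAM", "Unsorted",
        "--outFileNamePrefix", out_prefix,
    ]

# Placeholder paths, for illustration only.
cmd = star_multimap_command("star_index", "sample_R1.fastq",
                            "sample_R2.fastq", "sample_")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps it easy to hand to `subprocess.run` later and avoids shell quoting problems with unusual file names.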

While you can use the consensus repeat sequences, you end up biasing the results by how close the expressed repeats are to the consensus. Consensus sequences are mostly useful for showing a profile over a single instance, where you can label the structure easily.
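The consensus bias can be illustrated with a toy example. It assumes exact k-mer matching as a crude stand-in for read mapping, and the sequences are invented: a copy identical to the consensus matches completely, while a copy with a single substitution loses every k-mer that spans the change.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def fraction_matching_consensus(copy_seq, consensus, k=5):
    """Fraction of the copy's k-mers that occur exactly in the consensus."""
    consensus_kmers = set(kmers(consensus, k))
    copy_kmers = kmers(copy_seq, k)
    hits = sum(1 for km in copy_kmers if km in consensus_kmers)
    return hits / len(copy_kmers)

# Invented sequences, for illustration only.
consensus = "ACGTACGGTCAAGCTTAGGCTA"
young_copy = consensus                    # identical to the consensus
old_copy = "ACGTACGGTCTAGCTTAGGCTA"       # one substitution (A -> T)

young_fraction = fraction_matching_consensus(young_copy, consensus)
old_fraction = fraction_matching_consensus(old_copy, consensus)
print(young_fraction, old_fraction)
```

Real aligners tolerate mismatches, so the effect is softer than in this toy, but the direction is the same: diverged copies are less likely to be assigned to the consensus.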


Thanks for your answer. I'm concerned about STAR's memory usage, so I may start with hisat2 using -k 100 and see whether its output can be used by TEtranscripts.


Hi Devon, is this answer still the same, given the Telescope paper? It appears that TEtranscripts does not perform as well as SalmonTE and Telescope. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006453


The answer hasn't changed just because there are now newer methods that may be somewhat more accurate.

6.1 years ago
pdeinin ▴ 30

There is a very important distinction here: are you interested in transcripts generated by the repetitive elements, or in transcripts that merely include them? Most repetitive elements are simply passengers in longer RNAs that express genes and the like. Only a small portion actually come from the promoters of the repetitive elements themselves. Many approaches work generically on repetitive elements, but understanding the transcripts relevant to the life cycle of the repetitive elements takes a very careful approach that maps reads to specific loci and eliminates the background. This is best described in our paper:

Prescott Deininger, Maria E. Morales, Travis B. White, Melody Baddoo, Dale J. Hedges, Geraldine Servant, Sudesh Srivastav, Madison E. Smither, Monica Concha, Dawn L. DeHaro, Erik K. Flemington, Victoria P. Belancio. A comprehensive approach to expression of L1 loci. Nucleic Acids Research, Volume 45, Issue 5, 17 March 2017, e31. https://doi.org/10.1093/nar/gkw1067

Because of the high background from repeats in genes, our approach only focuses on reads that map to one genomic locus better than any other and eliminates multi-mapped reads. We have found it is also important to have stranded RNA-Seq data and it is better if it comes from cytoplasmic RNA to eliminate as much unspliced material as possible.
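The selection rule described above (keep a read only if one locus is better than every other) could be sketched like this. It is not the pipeline from the paper: the tuple layout, locus labels, and the idea of comparing a single numeric alignment score per hit are assumptions made for illustration, and strandedness is not modeled.

```python
from collections import defaultdict

def reads_with_single_best_locus(alignments):
    """Keep reads whose best alignment is strictly better than all others.

    alignments: iterable of (read_id, locus, score); higher score is better.
    Returns {read_id: best_locus} for the reads that pass.
    """
    by_read = defaultdict(list)
    for read_id, locus, score in alignments:
        by_read[read_id].append((score, locus))

    kept = {}
    for read_id, hits in by_read.items():
        hits.sort(reverse=True)  # best score first
        # Pass if there is only one hit, or the top score beats the runner-up.
        if len(hits) == 1 or hits[0][0] > hits[1][0]:
            kept[read_id] = hits[0][1]
    return kept

# Hypothetical alignments, for illustration only.
alignments = [
    ("r1", "chr1:1000", 98), ("r1", "chr5:2000", 90),  # clear winner
    ("r2", "chr1:1000", 95), ("r2", "chr7:300", 95),   # tie, discarded
    ("r3", "chr3:50", 99),                             # single hit
]
kept = reads_with_single_best_locus(alignments)
print(kept)
```

Reads with tied top scores are dropped entirely rather than split between loci, which matches the stated goal of eliminating multi-mapped reads.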
