Aligning RNA-Seq to Repetitive LINE-1 Elements
11.9 years ago
rd ▴ 20

Hello,

I would like to check whether L1 repetitive elements are modulated between my treatment and control via RNA-Seq. I have read several papers that have done so, but their methods are not clear enough for me, as a biologist, to reproduce. I have analyzed my data using the Tuxedo suite and have analyzed the "unique" genes. I am wondering what modifications are needed to accommodate the repetitive nature of LINEs.

  1. I understand that some aligners filter out reads that map to several places in the genome. Are my LINE reads being filtered out by tophat?
  2. If so, how do I align them?
  3. When using cufflinks, instead of using RefSeq, I assume I would have to use a repetitive element model?

Thank you!

rna-seq • 8.7k views
11.9 years ago
Ryan Dale 5.0k

You might get some ideas from a solution described in a paper from Peter Park's lab, "Estimating enrichment of repetitive elements from high-throughput sequence data", which has an online tool available with source code (Repeat Enrichment Estimator). It appears to be for ChIP-seq, though; I am not sure how adaptable it would be for RNA-seq.

Edit on Apr 27 2015:

I recently had to revisit this problem and found a useful tool that didn't exist at the time of the original answer:

RepEnrich (paper, github)

11.9 years ago
biorepine ★ 1.5k

If I understand your question correctly, you want to measure the expression levels of LINE-1 repeats in your RNA-Seq samples? If that is the case, follow these instructions.

  1. Make a GTF file of your repeat elements, or download one from UCSC/Galaxy, then run steps 2 to 4.
  2. tophat -G LINE1-repeats.gtf -o treat-rnaseq yourgenome_ebwt_base treat-rnaseq.fastq
  3. tophat -G LINE1-repeats.gtf -o control-rnaseq yourgenome_ebwt_base control-rnaseq.fastq
  4. cuffdiff LINE1-repeats.gtf treat-rnaseq/accepted_hits.bam control-rnaseq/accepted_hits.bam

Steps 2 and 3 map the RNA-Seq reads to your repeat elements in the genome.

Step 4 calculates the differential expression of your repeat elements in your treatment and control.
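
For step 1, here is a minimal Python sketch of how a GTF could be built from a UCSC RepeatMasker table dump. It assumes the 17-column tab-separated `rmsk` layout (chromosome in column 6, 0-based start and end in columns 7 and 8, strand in column 10, repName in column 11, repFamily in column 13). The function names, the "rmsk" source label, and the choice to use the subfamily name as `gene_id` are my own assumptions, not something the tools require, so adapt them to your data.

```python
def rmsk_line_to_gtf(line, family="L1"):
    """Turn one row of a UCSC rmsk table dump into a GTF exon line.

    Returns None for header rows, short rows, or repeats of another family.
    Assumes the 17-column rmsk layout (an assumption; check your file).
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 17 or f[12] != family:
        return None
    chrom, start, end, strand = f[5], int(f[6]), int(f[7]), f[9]
    rep_name = f[10]
    # UCSC tables are 0-based, half-open; GTF is 1-based, inclusive.
    gtf_start = start + 1
    # Hypothetical naming scheme: one "transcript" per genomic copy,
    # grouped under the subfamily name (e.g. L1PA2) as the "gene".
    transcript_id = f"{rep_name}_{chrom}_{gtf_start}_{end}"
    attrs = f'gene_id "{rep_name}"; transcript_id "{transcript_id}";'
    return "\t".join(
        [chrom, "rmsk", "exon", str(gtf_start), str(end), ".", strand, ".", attrs]
    )


def rmsk_to_gtf(lines, family="L1"):
    """Yield GTF lines for every row of the requested repeat family."""
    for line in lines:
        gtf = rmsk_line_to_gtf(line, family)
        if gtf is not None:
            yield gtf
```

You would feed it the lines of your downloaded table and write the results to something like LINE1-repeats.gtf. Grouping copies by subfamily means the gene-level output is summarised per subfamily rather than per individual insertion, which may or may not be what you want.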

11.9 years ago

You would probably want to restrict your analysis to LINE element regions that contain sites differing from the consensus LINE sequence. That way you would only consider reads that map uniquely to your region of interest.
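
One way to act on this, as a rough sketch: keep only the alignments that the aligner itself reports as having a single hit. The Python below reads SAM text lines and keeps reads whose optional `NH:i:` tag equals 1. It assumes your aligner writes the NH tag (check a few lines of your own output first); the function name is hypothetical, and reads without an NH tag are conservatively dropped.

```python
def is_unique_alignment(sam_line):
    """Return True if a SAM record is mapped and tagged NH:i:1.

    Assumes the aligner emits the optional NH tag (number of reported
    hits for the read). Header lines, unmapped reads, and records
    without an NH tag are all treated as not unique.
    """
    if sam_line.startswith("@"):
        return False  # header line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # not a complete SAM record
    flag = int(fields[1])
    if flag & 0x4:
        return False  # read is unmapped
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag, so uniqueness cannot be judged


def keep_unique(sam_lines):
    """Yield header lines unchanged plus uniquely aligned records."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_alignment(line):
            yield line
```

Note that this only filters on what the aligner reported; it does not by itself tell you which L1 copies carry enough diagnostic sites to attract unique reads.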

9.6 years ago
Manvendra Singh ★ 2.2k

I would not recommend tophat, because there are hardly any splice variants for L1 elements, so a spliced aligner like tophat would also map reads to chimeric transcripts and exonized L1 elements.

I would go for bowtie

I would allow many mismatches but report only one alignment per read, using the --best option.

It always works for me
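
For anyone wanting to try this, here is a small Python sketch that assembles such a bowtie call as an argument list. It assumes Bowtie 1 conventions as I understand them: `-v` for the number of allowed mismatches (0 to 3), `-k 1` for one reported alignment per read, `--best`, `-S` for SAM output, and the index prefix and reads given positionally. The function name and file names are placeholders, and you should check `bowtie --help` for your version, since argument handling can differ between releases.

```python
def bowtie_command(index_prefix, fastq, out_sam, mismatches=3, threads=1):
    """Build a Bowtie 1 argument list: one best alignment per read.

    Assumes Bowtie 1 style options; nothing is executed here.
    """
    if not 0 <= mismatches <= 3:
        raise ValueError("bowtie -v accepts between 0 and 3 mismatches")
    return [
        "bowtie",
        "-q",                    # input reads are FASTQ
        "-v", str(mismatches),   # allow up to this many mismatches
        "-k", "1",               # report a single alignment per read
        "--best",                # make that alignment the best one found
        "-S",                    # write SAM output
        "-p", str(threads),
        index_prefix,
        fastq,
        out_sam,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once you have confirmed the options against your installed version.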

8.4 years ago
ghv8 • 0

Hi all, it is my understanding that mapping repetitive sequences is error-prone. Is this something that tophat2 can take care of by simply tweaking the parameters? For example, I could set -N/--read-mismatches to 0. Or is there something more 'fancy' that needs to be done?

Also, is it worth it to pay more and do paired-end sequencing to be more accurate in mapping to repetitive regions? Thanks for the advice -G


Please make a new post to ask this question (and consider deleting this post). That will give you a much better chance of getting a response.
