Question

Calculate strand ratio for repeat (ALU) elements

4.4 years ago

A. Domingues ★ 2.7k

I am trying to estimate the amount of double-stranded ALU elements in order to compare two conditions. My very coarse approach was to align the reads to repeat element sequences and then count how many reads map sense or antisense to each element. Summarized over all repeat elements, there is a bias toward sense-mapping reads (sense/antisense ratio ~1.3). For the ALU elements, however, the ratio is close to 1, with a shift in one of the conditions, which matches the experimental hypothesis.

The issue is that, given the repetitive nature of these elements, I am having second thoughts about whether this approach is valid at all.

Briefly, my approach to calculate strand bias in repeat elements:

- rRNA-depleted RNA-seq, stranded library
- reads were mapped to transposable element sequences (derived from RepeatMasker, one contig = one element, example below) with STAR, keeping one random alignment for any read that maps to up to 100 locations
- alignments to each repeat element sequence were then counted with the Bioconductor package Rsamtools, with the following settings:
  - repeat elements were considered to be those whose name doesn't match "^5S|^7S|_n$|rRNA|^tRNA|^U[0-9]|^RNA"
  - only proper read pairs were counted
  - alignments on the forward strand: isFirstMateRead = TRUE, isMinusStrand = TRUE
  - alignments on the reverse strand: isFirstMateRead = TRUE, isMinusStrand = FALSE
- a ratio of sense/antisense reads was then calculated for each repeat element sequence
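
To make the counting rule concrete, here is a minimal Python sketch of the same logic working directly on SAM flags. It is not the Rsamtools code I actually ran. The function name `strand_ratios`, the `(element, flag)` input format, and the optional `pseudocount` are assumptions made for illustration, and it treats "forward strand" above as "sense".

```python
import re
from collections import defaultdict

# SAM flag bits, as defined in the SAM specification
PROPER_PAIR = 0x2
REVERSE = 0x10
FIRST_MATE = 0x40

# Names excluded in the question (rRNA, tRNA, snRNA and similar)
EXCLUDE = re.compile(r"^5S|^7S|_n$|rRNA|^tRNA|^U[0-9]|^RNA")


def strand_ratios(records, pseudocount=0.0):
    """Return {element: (sense, antisense, ratio)} from (element, flag) pairs."""
    counts = defaultdict(lambda: [0, 0])  # [sense, antisense]
    for element, flag in records:
        if EXCLUDE.search(element):
            continue
        # Keep only first mates of proper pairs, so each fragment counts once
        if not (flag & PROPER_PAIR and flag & FIRST_MATE):
            continue
        # Convention from the question: first mate on the minus strand = sense
        if flag & REVERSE:
            counts[element][0] += 1
        else:
            counts[element][1] += 1
    result = {}
    for element, (sense, antisense) in counts.items():
        denom = antisense + pseudocount
        ratio = (sense + pseudocount) / denom if denom > 0 else float("inf")
        result[element] = (sense, antisense, ratio)
    return result


# Toy input: flags 83 and 99 are first mates of proper pairs (minus / plus
# strand); 147 is a second mate and is ignored; "5S" is excluded by name.
toy = [("AluJb", 83), ("AluJb", 99), ("AluJb", 83), ("5S", 83), ("AluJb", 147)]
ratios = strand_ratios(toy)
```

A small pseudocount keeps the ratio finite for elements with no antisense reads, but it also pulls the ratio of low-count elements toward 1, so it should be chosen with that in mind.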

Does this make sense at all? Is there a better way of doing it?

grep "AluJb" -A 100 all_repeats.hg38.fa | head -20
>AluJb
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNTCCATAAGAATGGAAAGAAAACATGGCCAGGTGCAGTGGC
TCACACCTGTAATCCCACCACTTCAGGAGGCTGAGGCAACATGGCAAAACCTTCTCTTCA
AAAAATTTTTTAAAAGTTAGCTGGATGTTGTGGAGGCAAGAGGATCACTTGAGGATCACT
TGAGTCCATGAGGTCAAGGCTGCAGTGAGTCATGTTTGCACCACTGCACTCTAGCCTAGG
TGACAGAGCTAGTCACTATCAAAAAAAAAAAAAAAAGAATGGAGAGAATGCTACATGAGA
GAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNATAGATTTTTTTAAAAAGAAAACTGGCCAGGTACT
GTGGCTTATGTCTGTAATATCAGCATGTTGGGAGGCCAAGGCAGGATTACTTGAGCCCAG
AAATTCCAGACCAGCCTGAGAATTTGGCAAAACTCTGTCTCTACAAAAAATACAAAAATT
AGCCAAGTTTGGTGGCATGTGCCTGTAGTACCAGCTACTTGGGAGGCTGAGGTGGAAGAA
TAGCTTGAGTCTGGGAGGTCAAGGCTGCAATGAGCTGTGATTGCACCACTGCACTCAAGC
CTGGGTGGTAGAGTAAGACCCTGTCTCAAAAAAAAAAAAAAAAAAAGAAAAATCACTAAG
CAAAATAAGACATGTGAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

ALU RepeatMasker STAR Transposons • 1.2k views


score 0 · Answer 1 · 2020-11-12

4.4 years ago

A. Domingues ★ 2.7k

I shared this question on Twitter, and the consensus seems to be that this approach is OK. A few suggestions for alternative or complementary approaches:

- use RepBase (or consensus) sequences instead of the RepeatMasker-derived sequences
- map to the genome and intersect the alignments with annotated TE coordinates
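
For the second suggestion, the strand-aware intersection could look roughly like the sketch below. It is only an illustration: the function name `count_by_te_strand` and the tuple layouts are assumptions, the fragment strand is assumed to be already corrected for the library protocol, and a dedicated interval tool would normally replace the linear scan on real data.

```python
from collections import defaultdict


def count_by_te_strand(alignments, annotations):
    """Count alignments sense/antisense to each overlapping annotated TE.

    alignments: iterable of (chrom, start, end, strand), where strand is the
        strand of the original RNA fragment ("+" or "-").
    annotations: iterable of (chrom, start, end, strand, name).
    Coordinates are 0-based, half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, strand, name in annotations:
        by_chrom[chrom].append((start, end, strand, name))

    counts = defaultdict(lambda: {"sense": 0, "antisense": 0})
    for chrom, start, end, strand in alignments:
        # Linear scan for clarity; use an interval index for real data
        for te_start, te_end, te_strand, name in by_chrom[chrom]:
            if start < te_end and te_start < end:  # half-open overlap test
                key = "sense" if strand == te_strand else "antisense"
                counts[name][key] += 1
    return {name: dict(c) for name, c in counts.items()}


# Toy data (hypothetical coordinates)
annotations = [
    ("chr1", 100, 400, "+", "AluJb"),
    ("chr1", 1000, 1300, "-", "AluSx"),
]
alignments = [
    ("chr1", 150, 250, "+"),    # sense to AluJb
    ("chr1", 300, 450, "-"),    # antisense to AluJb
    ("chr1", 1100, 1200, "-"),  # sense to AluSx
    ("chr1", 500, 600, "+"),    # overlaps no annotated element
]
te_counts = count_by_te_strand(alignments, annotations)
```

Counts are keyed by element name, so all genomic copies of the same family are pooled, which keeps the result comparable to the one-contig-per-element approach in the question.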

If no one adds a valid answer in the coming days I will accept my own answer.
