Question

Processing Tandem Repeats Finder (TRF) Output For Downstream Motif Analysis

12.2 years ago

Paul ▴ 20

I ran Tandem Repeats Finder (TRF) on my FASTA files to search for tandem repeats.
The output looks like this:

Sequence: ENSG01

Parameters: 2 5 7 80 10 50 2000

1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
1069 1137 20 3.6 19 74 7 71 28 5 47 17 1.70 GAGTGAGTGAGCCAGGAGT GAGTGAGTGAGCCAGTGAATGAGTGAGTG
1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT

Sequence: ENSG02

Parameters: 2 5 7 80 10 50 2000

Example explanation:

Sequence (ENSG01) - fasta name
Column 14 (GAGT) - repeat unit
Column 15 (GAGAGAGTGGG) - repeat sequence

Help I need:

How can I process such a file to:

Remove sequences that have no repeats in them (like ENSG02)?

Combine each FASTA name with the repeat data that follows it?

The desired output looks like this:

ENSG01
1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
1069 1137 20 3.6 19 74 7 71 28 5 47 17 1.70 GAGTGAGTGAGCCAGGAGT GAGTGAGTGAGCCAGTGAATGAGTGAGTG
1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT

I guess I could grep for '^[0-9]', but I don't know how to join that grep output with the FASTA name.
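One way to do the grouping without grep is a short Python pass over the file. This is a minimal sketch, assuming the layout shown above: a `Sequence: <name>` line, then rows that start with a digit. The function names `group_trf_repeats` and `format_grouped` are made up for illustration. A sequence name is only kept once at least one repeat row follows it, so entries like ENSG02 drop out on their own.

```python
import re


def group_trf_repeats(lines):
    """Map each sequence name to its repeat rows, skipping sequences with no rows."""
    grouped = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Sequence:"):
            # Remember the name; it is stored only if a repeat row follows.
            current = line.split(":", 1)[1].strip()
        elif current is not None and re.match(r"\d+\s", line):
            # Repeat rows are the only lines that begin with a number.
            grouped.setdefault(current, []).append(line)
    return grouped


def format_grouped(grouped):
    """Render as: name, then its repeat rows, with a blank line between sequences."""
    blocks = ["\n".join([name] + rows) for name, rows in grouped.items()]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    sample = [
        "Sequence: ENSG01",
        "",
        "Parameters: 2 5 7 80 10 50 2000",
        "",
        "1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT",
        "",
        "Sequence: ENSG02",
        "",
        "Parameters: 2 5 7 80 10 50 2000",
    ]
    print(format_grouped(group_trf_repeats(sample)))
```

For a real file, pass an open file handle instead of the sample list, for example `group_trf_repeats(open("repeats.dat"))`, where `repeats.dat` is a placeholder name.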

My hypothesis is that tandem repeats have a DNA-motif-like structure. How can I use a repeat unit (like GAGT) to search for occurrences of such a motif genome-wide?

At the moment my plan is:
- Every unit (e.g. GAGT) has its sequence of occurrences, GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
- Submit GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG to MEME and get PSPM
- Scan PSPM genome wide
  
  I am using MEME because I don't know how to scan for a given unit such as GAGT while allowing mismatches.
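If the only goal is to find a fixed unit while tolerating a few substitutions, a plain sliding-window Hamming-distance scan is one option. This is a rough sketch, not a replacement for a PSPM scan: it treats every position of the unit equally, ignores indels, and is slow on whole genomes in pure Python. The function names and the `max_mismatches` parameter are assumptions for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string (other letters pass through unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_with_mismatches(sequence, unit, max_mismatches=1):
    """Yield (0-based start, matched window, mismatch count) on the given strand."""
    sequence = sequence.upper()
    unit = unit.upper()
    k = len(unit)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        distance = hamming(window, unit)
        if distance <= max_mismatches:
            yield start, window, distance


if __name__ == "__main__":
    for hit in scan_with_mismatches("GAGTGAAT", "GAGT", max_mismatches=1):
        print(hit)
```

To cover the other strand, run the same scan with `reverse_complement(unit)` as the query.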
  
  What should I do with redundant repeat units, for example GAGT and GAGTGAGTGAGCCAGGAGT? Should I use only the shortest unit (GAGT), or all units even if they overlap?
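If you decide to keep only the shortest unit wherever repeats overlap, one possible sketch is to cluster rows by overlapping coordinates and keep the row with the shortest unit in each cluster. Whether that is the right biological choice is a separate question; the code below only illustrates that one policy. It assumes the rows for a single sequence have already been reduced to `(start, end, unit)` tuples with 1-based inclusive coordinates, and `pick_shortest_units` is a hypothetical name.

```python
def pick_shortest_units(rows):
    """Return one (start, end, unit) per cluster of overlapping repeats.

    Within a cluster, the row with the shortest unit is kept; the first one
    encountered wins ties.
    """
    kept = []
    best = None          # shortest-unit row seen in the current cluster
    cluster_end = None   # right-most end coordinate of the current cluster
    for start, end, unit in sorted(rows):
        if best is None or start > cluster_end:
            # No overlap with the current cluster: close it and start a new one.
            if best is not None:
                kept.append(best)
            best = (start, end, unit)
            cluster_end = end
        else:
            cluster_end = max(cluster_end, end)
            if len(unit) < len(best[2]):
                best = (start, end, unit)
    if best is not None:
        kept.append(best)
    return kept


if __name__ == "__main__":
    rows = [
        (1053, 1139, "GAGT"),
        (1069, 1137, "GAGTGAGTGAGCCAGGAGT"),
        (1619, 1746, "GAGT"),
    ]
    for row in pick_shortest_units(rows):
        print(row)
```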

Thank you for your time.

• 7.6k views


score 1 · Answer 1 · 2023-02-25

2.4 years ago

Adam Taranto ▴ 40

I've written a small Python tool called TRF2GFF that converts TRF .dat files into GFF3 format.
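To show what such a conversion involves, here is a rough sketch of mapping one TRF row to one GFF3 line. This is not TRF2GFF's code or its output format: the `TRF` source label, the `.` strand, and the attribute keys `period`, `copies` and `consensus` are choices made for this example. The column positions follow the 15-column rows shown in the question, with the alignment score in column 8 and the repeat unit in column 14.

```python
def trf_row_to_gff3(seqid, row):
    """Build a tab-separated GFF3 feature line from one whitespace-separated TRF row."""
    fields = row.split()
    start, end, period, copies = fields[0], fields[1], fields[2], fields[3]
    score = fields[7]        # TRF alignment score
    consensus = fields[13]   # repeat unit
    attributes = f"period={period};copies={copies};consensus={consensus}"
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join(
        [seqid, "TRF", "tandem_repeat", start, end, score, ".", ".", attributes]
    )


if __name__ == "__main__":
    row = (
        "1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT "
        "GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG"
    )
    print("##gff-version 3")
    print(trf_row_to_gff3("ENSG01", row))
```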


score 0 · Answer 2 · 2013-05-13

12.2 years ago

Haluk ▴ 190

Parsing the output of Phobos is easier than this.

http://www.ruhr-uni-bochum.de/spezzoo/cm/cm_phobos.htm


score 0 · Answer 3 · 2015-06-01

10.1 years ago

basalganglia ▴ 40

How do you run the Tandem Repeats Finder tool?


score 0 · Answer 4 · 2015-08-31

9.9 years ago

Elke Schaper ▴ 110

We've implemented the open-source Python 3 library TRAL (Tandem Repeat Annotation Library).

TRAL ships with parsers for multiple tandem repeat detection tools, including TRF. If you are not a fan of Python 3, the code may still be useful for solving the parsing problem.
