Processing Tandem Repeats Finder (TRF) Output for Downstream Motif Analysis
11.6 years ago
Paul ▴ 20

I have used Tandem Repeats Finder (TRF) to search for tandem repeats in my FASTA files.
The output looks like this:

Sequence: ENSG01

Parameters: 2 5 7 80 10 50 2000

1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
1069 1137 20 3.6 19 74 7 71 28 5 47 17 1.70 GAGTGAGTGAGCCAGGAGT GAGTGAGTGAGCCAGTGAATGAGTGAGTG
1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT

Sequence: ENSG02

Parameters: 2 5 7 80 10 50 2000

Example explanation:

Sequence (ENSG01) - FASTA name
Column 14 (GAGT) - repeat unit
Column 15 (GAGAGAGTGGG...) - repeat sequence

Help I need:

  • How do I process such a file:

    • Remove sequences that don't have any repeats (like ENSG02)?
    • Combine each FASTA name with the repeat data that follows it?

      The desired output looks like this:

      ENSG01
      1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
      1069 1137 20 3.6 19 74 7 71 28 5 47 17 1.70 GAGTGAGTGAGCCAGGAGT GAGTGAGTGAGCCAGTGAATGAGTGAGTG
      1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT
      

      I guess it's possible to grep '^[0-9]', but I don't know how to join the grep output with the FASTA name.

  • My hypothesis is that tandem repeats have a DNA-motif-like structure. How can I use a repeat unit (like GAGT) to search for occurrences of such a motif genome-wide?

    At the moment my plan is:

    • Every unit (e.g. GAGT) has its sequence of occurrences: GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
    • Submit GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG to MEME and get a PSPM
    • Scan the PSPM genome-wide

      I am using MEME because I don't know how to scan for a given unit such as GAGT while allowing mismatches.

      What should I do with redundant repeat units? For example: GAGT and GAGTGAGTGAGCCAGGAGT. Should I use only the shortest unit (GAGT), or all units even if they overlap?
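
For the "scan for a given unit allowing mismatches" step, one simple option is to slide a window of the unit's length along the sequence and keep every window within a chosen Hamming distance of the unit. The sketch below is only an illustration in plain Python, not something TRF or MEME provides: the function names and the `max_mismatches` parameter are hypothetical, only substitutions are counted (no insertions or deletions), and only the given strand is scanned. It is also far too slow for a whole genome as written; it is meant to show the idea on a single repeat sequence.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_with_mismatches(sequence, unit, max_mismatches=1):
    """Yield (0-based start, matched text, mismatch count) for every window
    of len(unit) in `sequence` that differs from `unit` by at most
    `max_mismatches` substitutions. Hypothetical helper for illustration."""
    sequence = sequence.upper()
    unit = unit.upper()
    k = len(unit)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = hamming(window, unit)
        if mismatches <= max_mismatches:
            yield start, window, mismatches


# Repeat sequence (column 15) and unit (column 14) from the first TRF record.
repeat = "GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG"
hits = list(scan_with_mismatches(repeat, "GAGT", max_mismatches=1))

for start, window, mismatches in hits:
    print(start, window, mismatches)
```

With `max_mismatches=0` this reduces to an exact search; raising the threshold on a 4-base unit quickly matches a large fraction of all windows, which is one argument for checking how specific a short unit like GAGT really is before scanning with it.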

Thank you for your time.

21 months ago
Adam Taranto ▴ 40

I've written a little Python tool called TRF2GFF that converts TRF .dat files into GFF3 format.
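
To give an idea of what such a conversion involves, here is a minimal sketch of mapping one TRF repeat line to one GFF3 feature line. This is not TRF2GFF's actual code. The helper name, the `source` default and the attribute keys are hypothetical, and the column meanings other than 14 and 15 (start, end, period size, copy number and alignment score in columns 1, 2, 3, 4 and 8) are assumed from the TRF documentation rather than stated in this thread.

```python
def trf_record_to_gff3(seq_name, trf_line, source="TRF"):
    """Turn one whitespace-separated TRF repeat line into a GFF3 feature line.

    Hypothetical helper; assumes the 15-column TRF record layout.
    """
    fields = trf_line.split()
    start, end = fields[0], fields[1]        # 1-based, inclusive coordinates
    period, copies = fields[2], fields[3]    # period size, copy number
    score = fields[7]                        # alignment score
    unit = fields[13]                        # repeat unit (column 14)
    attributes = f"period={period};copies={copies};unit={unit}"
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join(
        [seq_name, source, "tandem_repeat", start, end, score, ".", ".", attributes]
    )


line = ("1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT "
        "GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG")
print(trf_record_to_gff3("ENSG01", line))
```

Strand and phase are left as "." because a tandem repeat call has neither; the repeat sequence itself (column 15) is dropped here to keep the feature line short.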

11.5 years ago
Haluk ▴ 190

Parsing the output of Phobos is easier than this.

http://www.ruhr-uni-bochum.de/spezzoo/cm/cm_phobos.htm

9.5 years ago
basalganglia ▴ 40

How do you run the Tandem Repeats Finder tool?

9.2 years ago
Elke Schaper ▴ 110

We've implemented the open-source Python 3 library TRAL - Tandem Repeat Annotation Library.

TRAL ships with parsers for multiple tandem repeat detection tools, including TRF. If you are not a fan of Python 3, the code may still be useful for solving the parsing problem.
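
Independently of TRAL (whose API is not shown here), the filtering and joining asked about in the question can be sketched in a few lines of plain Python. The sketch assumes the layout shown in the question: a `Sequence:` line names each FASTA record, and repeat records are the lines that start with a digit. The function names are hypothetical.

```python
def group_trf_repeats(lines):
    """Return {sequence name: [repeat lines]}, keeping only sequences
    that have at least one repeat line. Hypothetical helper."""
    repeats = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Sequence:"):
            # Everything after the first colon is the FASTA name.
            current = line.split(":", 1)[1].strip()
        elif line and line[0].isdigit() and current is not None:
            # Same test as grep '^[0-9]', but attached to the current name.
            repeats.setdefault(current, []).append(line)
    return repeats


def format_grouped(repeats):
    """Render each name followed by its repeat lines and a blank line."""
    out = []
    for name, records in repeats.items():
        out.append(name)
        out.extend(records)
        out.append("")
    return "\n".join(out)


example = """\
Sequence: ENSG01

Parameters: 2 5 7 80 10 50 2000

1053 1139 4 22.2 4 67 2 62 28 4 48 18 1.68 GAGT GAGAGAGTGGGTTAGAGAGTGAGTGAGCCAGTGAATGAGTGAGTG
1069 1137 20 3.6 19 74 7 71 28 5 47 17 1.70 GAGTGAGTGAGCCAGGAGT GAGTGAGTGAGCCAGTGAATGAGTGAGTG
1619 1746 8 16.8 8 65 9 60 27 1 52 18 1.55 GAGT GAGTGAGTGAGTGAATGAGTGAATGGGAGT

Sequence: ENSG02

Parameters: 2 5 7 80 10 50 2000
"""

grouped = group_trf_repeats(example.splitlines())
print(format_grouped(grouped))
```

Because the function only iterates over lines, an open file handle can be passed in place of `example.splitlines()`. Sequences such as ENSG02 never get an entry, since a name is stored only once a repeat line follows it.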
