I want to identify transposons in my genome assembly. My approach is to align the assembly to a reference, pinpoint the regions that are present only in my assembly, and then annotate those regions. My first attempt used the EMBOSS matcher tool; its output is pasted at the end of this post.
I have two questions about this approach:
- Visualization of Contig Alignments:
I'm looking for a tool that can align all the contigs of my assembly to the reference and show the alignments in a single view, which would make the analysis much easier.
- Alternative Methods for Transposon Identification:
Besides matcher, I'd like to explore other ways of identifying transposons in my genome. RepeatModeler is one option, but are there other tools or techniques that could complement or improve this analysis? Any suggestions are welcome. Thank you.
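As a rough illustration of the "regions distinct to my genome" step described above, the following Python sketch takes the aligned intervals on one contig and returns the stretches that no alignment covers. It assumes the alignments are already available as 1-based, inclusive (start, end) pairs on the contig; the function name, the `min_size` parameter, and the example coordinates are all hypothetical and not taken from any particular aligner's output.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the contig


def unaligned_regions(
    contig_length: int, aligned: List[Interval], min_size: int = 1
) -> List[Interval]:
    """Return the parts of a contig not covered by any aligned interval."""
    gaps: List[Interval] = []
    cursor = 1  # first position not yet known to be covered
    for start, end in sorted(aligned):
        # Anything between the cursor and this alignment's start is uncovered.
        if start > cursor and (start - cursor) >= min_size:
            gaps.append((cursor, start - 1))
        # Overlapping or nested alignments never move the cursor backwards.
        cursor = max(cursor, end + 1)
    # Uncovered tail after the last alignment.
    if cursor <= contig_length and (contig_length - cursor + 1) >= min_size:
        gaps.append((cursor, contig_length))
    return gaps


# Hypothetical coordinates, for illustration only.
print(unaligned_regions(1000, [(100, 300), (250, 400), (700, 900)]))
# [(1, 99), (401, 699), (901, 1000)]
```

The returned intervals would only be candidates: a region missing from the reference is not necessarily a transposon, so each one would still need to be annotated.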
########################################
# Program: matcher
# Rundate: Mon 11 Sep 2023 19:01:28
# Commandline: matcher
# -asequence /data/dnb09/galaxy_db/files/9/5/f/dataset_95f609af-7744-4fc7-99e9-916451e60a47.dat
# -bsequence /data/dnb09/galaxy_db/files/5/4/d/dataset_54d0ecd0-f68e-47f3-bb29-adf7f57e9e20.dat
# -outfile /data/jwd05e/main/062/442/62442138/outputs/dataset_8c4e8af1-256e-4cec-8288-9e7813504567.dat
# -alternatives 1
# -gapopen 16
# -gapextend 4
# -aformat3 markx0
# -auto
# Align_format: markx0
# Report_file: /data/jwd05e/main/062/442/62442138/outputs/dataset_8c4e8af1-256e-4cec-8288-9e7813504567.dat
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: NODE_1_length_1281197_cov_47.799695
# 2: chr_A
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 795
# Identity: 494/795 (62.1%)
# Similarity: 494/795 (62.1%)
# Gaps: 42/795 ( 5.3%)
# Score: 1158
#
#
#=======================================
375760 375770 375780 375790 375800
NODE_1 AGCATCAAATGGTATGGTTCCTGTGGAGATAAGATGGGAGCAAGGTGGTG
::: :: :: :::::: :: :: :: :::: ::: :::::::::::
chr_A AGCTTCTAA---TATGGTCCCAGTCGATATAACCTGGCAGCAAGGTGGTA
377720 377730 377740 377750 377760
375810 375820 375830 375840 375850
NODE_1 AGAAGGTTTATGTGACAGGGTCCTTCACAAATTGGAGGAAAATGATAGGC
: :: :: ::::: :: :: :: :::::: ::::::::::::::::::
chr_A ACAAAGTATATGTCACTGGTTCTTTCACAGGTTGGAGGAAAATGATAGGT
377770 377780 377790 377800 377810
375860 375870 375880 375890 375900
NODE_1 TTAATTCCTGTTGAGTCCGAACCGGGCCATTTCAAGATTAAACTTCAGTT
::: : :: : :::: : : : ::: : :::::
chr_A TTAGTACCAATGCCTGATCAACCAAATGTACTGCACGTCAAATTACAGTT
377820 377830 377840 377850 377860
375910 375920 375930 375940 375950
NODE_1 GGCTCCTGGAACTCATAGATTTAGGTTTATCGTAGACAACCAGCTGAGGT
:::: :: :: :: :::::::::::::: :: :: :: :: ::::::
chr_A ACCTCCAGGTACGCACAGATTTAGGTTTATTGTCGATAATGAGTTGAGGT
377870 377880 377890 377900 377910
375960 375970 375980 375990 376000
NODE_1 TTAGTGATAACTTACCTACTGCAACTGATCAAATGGGTAATTTTGTTAAT
:::::::: : : :: :: :: :: :: ::::::::::: ::::: ::
chr_A TTAGTGATTTCCTTCCAACGGCTACAGACCAAATGGGTAACTTTGTCAAC
377920 377930 377940 377950 377960
376010 376020 376030 376040 376050
NODE_1 TATTTGGAGGTCTCGGCGGTTCCGAAGTCAGACTCCACGTCATCAAGAAC
::: ::::: : :: : ::: : : :: : : :::
chr_A TATCTGGAGATTGCGCCCGTTGC-------------AGGTACTGATGAAA
377970 377980 377990 378000
376060 376070 376080 376090 376100
NODE_1 AGGTAAGGAAAGGAAAGATAAAAATAAGAAATCTGTGAGTAAAGT-ATCG
: : : :: : :: : : :: ::: : :: ::
chr_A AACCACCTCCATTAACCCCACAAGTGTCAG----GTAAGTCAGGTGATGA
378010 378020 378030 378040
376110 376120 376130 376140 376150
NODE_1 AAGGATAG-GTCTACCGTGGGACCATTAAGTGCTAGGTCCTGTATAGCGT
::::: :: : ::: : :::::::: :: : :: :::
chr_A AAGGAAAGAGCCTA------------TGAGTGCTAGATCAAGGATTGCGC
378050 378060 378070 378080
376160 376170 376180 376190 376200
NODE_1 TAGAAATAGAAAAAGAGCCTGATGATTTTGGAGATGGGTACACCAGATAT
: :::::::::: ::: :: :::::::: :: ::::::: : : :
chr_A TTGAAATAGAAAGAGAACCAGATGATTTAGGTAATGGGTATAGTCGTTTC
378090 378100 378110 378120 378130
376210 376220 376230 376240 376250
NODE_1 CATGAAGAACTC-CCACAAGAACCAAAATACGAATTTAGTTCAGAGATCC
::::: : ::: :::: :: :::: :: :::: :: : :: :: :
chr_A CATGAT-ACCTCGCCACTGGAGACAAAGTATGAATATACTCAGGATATTC
378140 378150 378160 378170 378180
376260 376270 376280 376290 376300
NODE_1 CTGCTATATTTGTAGATCAGTCCATAATCGAGCAGT------TAACAATG
::::: : :: :::: : :: ::::::: : :: :
chr_A CTGCTGTCTTCACGGATCCTAATGTCATGGAGCAGTACTACCTGACTCTA
378190 378200 378210 378220 378230
376310 376320 376330 376340 376350
NODE_1 GAAAGGCAAAGAAAGAAATCCAATAATATGGCATGGTTGACACCGCCTCA
:: :::: :: :: : : :: ::::: ::: : :: :: :: ::
chr_A GATCAACAAAAGAACAACCACCAAAACATGGCCTGGCTAACTCCACCACA
378240 378250 378260 378270 378280
376360 376370 376380 376390 376400
NODE_1 GTTACCACCACAATTAGAAAACGTAATACTTAATAAATTCGGAGAGCCAT
:::::: ::::: : :: :: :: :: :: :::: : : : :
chr_A GTTACCCCCACATCTCGAGAATGTTATTCTGAATAGCTACTCTAATGCGC
378290 378300 378310 378320 378330
376410 376420 376430 376440 376450
NODE_1 TGAGTCAAAGCACGGAGAACAATGCAGGTGCGCTACCAATCCCTAATCAT
:: :: : :: :: :: ::::::: : :: :: :: ::::::
chr_A AAGGTGAATCTAACGAAAATAACTCAGGTGCTTTGCCTATACCAAATCAT
378340 378350 378360 378370 378380
376460 376470 376480 376490 376500
NODE_1 TCTGTGTTAAACCATCTGGTAACAACAAGCATTAAACACAACACACTCTG
: ::::: ::: : : :: : :: ::::: :::::::: :: ::
chr_A GTGATATTAAATCATTTAGCCACCAGTAGTATTAAGCACAACACTCTTTG
378390 378400 378410 378420 378430
376510 376520 376530 376540
NODE_1 TGTTGCAACAAACAACAGGTACAGGCAGAAGTACGTCTCACAGAT
:::::: : : ::::: : :: :: :: :::: ::
chr_A TGTTGCTTCGATTGTTAGGTATAAAAGAAAATATGTTACACAAAT
378440 378450 378460 378470 378480
#---------------------------------------
#---------------------------------------
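For anyone comparing several reports like the one above, the summary fields in the `#` header lines can be pulled out with a few lines of Python. This is a minimal sketch that assumes a report containing a single alignment block (as produced here with `-alternatives 1`) and the header layout shown above; the function name and the keys of the returned dictionary are my own choices, not part of EMBOSS.

```python
import re

# Header lines copied from the report above.
SAMPLE = """\
# 1: NODE_1_length_1281197_cov_47.799695
# 2: chr_A
# Length: 795
# Identity: 494/795 (62.1%)
# Similarity: 494/795 (62.1%)
# Gaps: 42/795 ( 5.3%)
# Score: 1158
"""


def parse_matcher_summary(text):
    """Collect sequence names and alignment statistics from '#' header lines."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"#\s*([A-Za-z_]+|\d+):\s*(\S.*)$", line)
        if not match:
            continue  # alignment rows, separators, and option lines are skipped
        key, value = match.group(1), match.group(2).strip()
        if key in ("1", "2"):
            fields["seq" + key] = value
        elif key == "Length":
            fields["length"] = int(value)
        elif key == "Score":
            fields["score"] = float(value)
        elif key in ("Identity", "Gaps"):
            count, total = re.match(r"(\d+)/(\d+)", value).groups()
            fields[key.lower()] = int(count)
            fields[key.lower() + "_fraction"] = int(count) / int(total)
    return fields


summary = parse_matcher_summary(SAMPLE)
print(summary["seq1"], summary["seq2"], summary["length"], summary["identity"])
```

With the values above, the aligned region is 795 columns long with 494 identical positions, i.e. a single short local alignment on a contig of about 1.28 Mb.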
Thanks Philipp. I ran my assembly through EDTA, but the run ended with an annotation inconsistency.
I've had some success re-evaluating the class annotations from EDTA with TEsorter (https://github.com/zhangrengang/TEsorter). Or is it a different issue? I'm not sure what inconsistency you mean here.