Question

RepeatMasker for a fungal genome

5.7 years ago

shivangb3 • 0

Hello! I have a soft-masked genome file, generated with RepeatMasker, in which the repeats are in lowercase. I am planning to use Blast2GO for gene prediction and annotation. Can I use the file with the repeats in lowercase, or do I need to convert the lowercase repeats to Ns?

Kindly help. Thanks in advance.

Assembly • 1.6k views

ADD COMMENT • link updated 5.7 years ago by pltbiotech_tkarthi ▴ 180 • written 5.7 years ago by shivangb3 • 0

score 0 · Answer 1 · 2019-03-04

5.7 years ago

lieven.sterck 15k

I would personally advise using the sequence with lowercase (soft) masking, as this retains more information than hard masking. On the other hand, I am not sure that all gene prediction software understands soft masking.
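
If a downstream tool turns out not to honor lowercase masking, the soft-masked file can be converted to a hard-masked one afterwards (the reverse is not possible, which is one reason to keep the soft-masked version). The following is a minimal sketch in plain Python, assuming a standard FASTA file in which every lowercase letter marks a repeat; the function names are made up for illustration.

```python
import string

# Map every lowercase letter to "N"; uppercase bases and newlines are unchanged.
_LOWER_TO_N = str.maketrans(string.ascii_lowercase, "N" * 26)


def hard_mask_fasta(lines):
    """Yield FASTA lines with soft-masked (lowercase) bases replaced by N."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines are passed through untouched
        else:
            yield line.translate(_LOWER_TO_N)


def hard_mask_file(in_path, out_path):
    """Stream a soft-masked FASTA file into a hard-masked copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(hard_mask_fasta(src))
```

For example, `hard_mask_file("genome.softmasked.fa", "genome.hardmasked.fa")` would write the converted copy; both file names are placeholders.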

Moreover, Blast2GO is not a gene prediction tool; it is only used to assign GO labels to already predicted/annotated genes. You will first need to run a true gene prediction tool such as Augustus, EuGene, GeneMark, ...
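
As a rough illustration of running a predictor directly on the soft-masked file, here is a small Python wrapper around the Augustus command line. It is only a sketch: it assumes Augustus is installed and on the `PATH`, that your version accepts `--softmasking`, and that a trained species model close to your fungus exists (`augustus --species=help` lists the available ones). The function names and file names are hypothetical.

```python
import subprocess


def build_augustus_command(genome_fasta, species, softmasking=True):
    """Build an Augustus command line for a (soft-masked) genome FASTA."""
    return [
        "augustus",
        f"--species={species}",  # must be a species model known to Augustus
        f"--softmasking={1 if softmasking else 0}",  # treat lowercase as repeats
        "--gff3=on",  # write predictions in GFF3 format
        genome_fasta,
    ]


def run_augustus(genome_fasta, species, out_gff):
    """Run Augustus and save its standard output to out_gff."""
    with open(out_gff, "w") as out:
        subprocess.run(
            build_augustus_command(genome_fasta, species),
            stdout=out,
            check=True,  # raise if Augustus exits with an error
        )
```

A call could look like `run_augustus("genome.softmasked.fa", "aspergillus_nidulans", "predictions.gff3")`, where the species is just an example and should be replaced by the model closest to your organism.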

Blast2GO Pro has a built-in gene prediction function that uses Augustus.

ADD REPLY • link 5.7 years ago by shivangb3 • 0

Thank you so much for the guidance.

ADD REPLY • link 5.7 years ago by shivangb3 • 0

score 0 · Answer 2 · 2019-03-04

You can also try https://www.girinst.org/censor/ for repeat masking of your sequence, using the available repeat templates from your species of interest.

If you find repeats, you need to work out what kind they are: terminal inverted repeats (TIR) with a target site duplication (TSD), or palindromes. A TIR with a TSD is probably the signature of a DNA transposon (autonomous or non-autonomous); if the element is small, around 50-500 bp, it would be a MITE (Miniature Inverted Repeat Transposable Element). So first annotate which kinds of repeats you have found. You can use tools like einverted from EMBOSS for this.

Then go on to the gene models, after finding the correct protein frame with ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/), Splign (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi?textpage=online&level=form), etc. You can also use NCBI BLASTN to find orthologous genes among the available gene models, and discontiguous megablast for more dissimilar sequences.
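
To make the TIR idea concrete, here is a minimal Python sketch that checks whether the two ends of a candidate element are exact inverted repeats of each other, and whether the element falls in the MITE size range mentioned above. It is a simplification: real TIRs are often imperfect (einverted allows mismatches and gaps), the flanking TSD is not checked at all, and the `min_tir` threshold of 10 bp is an arbitrary assumption for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element):
    """Length of the longest exact TIR at the ends of the element.

    The start of the element is compared base by base with the reverse
    complement of its end; counting stops at the first mismatch.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    limit = len(seq) // 2  # the two arms of the TIR cannot overlap
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


def looks_like_mite(element, min_tir=10, min_len=50, max_len=500):
    """Crude MITE filter: small element with an exact TIR of at least min_tir bp."""
    return (
        min_len <= len(element) <= max_len
        and terminal_inverted_repeat_length(element) >= min_tir
    )
```

For instance, an element that starts with `GGCCAGTCAC` and ends with `GTGACTGGCC` has a 10 bp TIR, because the second string is the reverse complement of the first.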