Question

Repeat regions found in CDS sequences extracted from evm.out.gff3 with a Perl script

9.0 years ago

Ginsea Chen ▴ 140

Dear all

I predicted genes on a genome fragment with EVM, combining ab initio predictions, homologous sequence alignments, and RNA-seq assemblies (Trinity, with and without a reference genome). I then used an in-house Perl script from EVM to extract the CDS sequences from this fragment based on evm.out.gff3, but I found that some CDS sequences contain repeat regions (masked as NNNN). How should I treat these sequences? Should I delete the whole sequence, or only the masked region?

This is my first time doing gene prediction, so any suggestions would be appreciated.

Thanks, all!

EVM CDS genome Repeats • 2.5k views

updated 8.8 years ago by abascalfederico ★ 1.2k • written 9.0 years ago by Ginsea Chen ▴ 140

Ram · Answer 1 · 2016-01-20

8.8 years ago

abascalfederico ★ 1.2k

Some CDSs genuinely overlap repeats (e.g. Alu sequences), so do not delete anything. I guess you are getting NNNN because you are retrieving the sequences from a masked genome file. I suggest extracting them from an unmasked version of the genome instead.
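In case it helps, here is a minimal Python sketch of that re-extraction. It is not the EVM Perl script; it only illustrates the idea under a few assumptions: each CDS line in the GFF3 carries a single `Parent=` attribute, all CDS segments of one parent lie on the same sequence and strand, and the CDS phase column can be ignored. The function names (`read_fasta`, `extract_cds`, `cds_with_n`) are made up for this example.

```python
from collections import defaultdict

# Complement table for reverse-complementing minus-strand CDSs (case is kept).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(gff_lines, genome):
    """Return {parent_id: spliced CDS sequence} from GFF3 CDS features.

    Assumption: one Parent per CDS line; phase (column 8) is ignored.
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        segments[parent].append((int(cols[3]), int(cols[4]), cols[0], cols[6]))

    cds = {}
    for parent, segs in segments.items():
        segs.sort()  # ascending genomic start
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for start, end, chrom, _ in segs)
        if segs[0][3] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        cds[parent] = seq
    return cds


def cds_with_n(cds):
    """List parent IDs whose CDS still contains N (hard-masked or gap bases)."""
    return sorted(pid for pid, seq in cds.items() if "N" in seq.upper())
```

To use it, pass open file handles for the unmasked genome FASTA and for evm.out.gff3 to `read_fasta` and `extract_cds` respectively. Running `cds_with_n` on the result shows whether any N remain; against an unmasked genome, those would be real assembly gaps rather than masked repeats.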

updated 4.9 years ago by Ram 44k • written 8.8 years ago by abascalfederico ★ 1.2k