How to filter annotated genes overlapping with TE/repeat element?
6 weeks ago
Yao ▴ 30

Hello everyone, I am in the process of annotating my genome and am at the final step of filtering the annotated gene structures. I have noticed numerous papers referencing the final step of filtering genes based on varying coverage overlap with transposable elements (TE), such as 50%, 90%, and so on. I would like to perform a similar step but am uncertain about the procedure. Could anyone offer some guidance? Thank you!

genome gene annotation • 372 views
6 weeks ago
alex.zaccaron ▴ 470

One option is to use BEDtools to find out how much your genes overlap with repetitive DNA. It would help if you have a GFF of repeats in the genome, similar to what RepeatMasker provides.

awk '$3=="gene"' genes.gff | bedtools coverage -a - -b repeats.gff > genes_repeat_cov.gff

Then select the genes whose repeat-covered fraction (the last column of the bedtools coverage output) exceeds your threshold (e.g. 50%):

awk '$NF>0.5' genes_repeat_cov.gff
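
The awk filter above prints whole coverage lines, while a kill list is a plain list of IDs, one per line. Here is a minimal sketch of pulling the IDs out. It assumes the gene ID sits in an `ID=` attribute in column 9 of the GFF and that the coverage output is tab-separated. The function name `extract_gene_ids` is a made-up helper, not part of BEDtools or AGAT.

```shell
# Hypothetical helper (not part of BEDtools or AGAT): read `bedtools coverage`
# output on stdin and print the ID of every gene whose covered fraction
# (last column) is greater than the threshold passed as the first argument.
extract_gene_ids() {
    awk -F'\t' -v min_frac="$1" '
        $NF + 0 > min_frac + 0 {
            # Column 9 holds the GFF attributes, e.g. "ID=gene1;Name=abc"
            # (assumption: the gene identifier is stored under the ID key)
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) {
                    print substr(attrs[i], 4)   # drop the leading "ID="
                    break
                }
            }
        }
    '
}

# Example call, using the file names from this thread:
# extract_gene_ids 0.5 < genes_repeat_cov.gff > genes_to_remove.txt
```

If your annotation stores the identifier under a different attribute key, adjust the `ID=` pattern and the `substr` offset to match.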

Then you can provide the IDs of the selected genes to agat_sp_filter_feature_from_kill_list.pl to remove them from the original GFF:

agat_sp_filter_feature_from_kill_list.pl --gff genes.gff --kill_list genes_to_remove.txt --type gene


Wow, very detailed, thanks! I will give it a try.
