Gene length bias for ontology analysis.
8.6 years ago
michealsmith ▴ 800

I need to study the genomic distribution of certain transposable elements. I first retrieved their coordinates from RepeatMasker in BED format (chr:start-end), then intersected them with an hg19 gene BED file. Now I want to find out whether genes containing at least one such transposon are enriched for certain functional categories, for example GO terms.

For instance:

GeneA: chr1:20000-50000
contains two copies of transposonD:
chr1:25000-26000
chr1:31000-32000

GeneB: chr3:40000-80000
contains one copy of transposonD:
chr3:60000-62000
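A minimal pure-Python sketch of the intersection step, using the toy coordinates above and assuming half-open `[start, end)` intervals as in BED. Any overlap counts as "containing" here, so a transposon straddling a gene boundary is counted too. For genome-wide files, `bedtools intersect -a genes.bed -b te.bed -c` reports the same per-gene count (the file names are hypothetical).

```python
# Intervals as (chrom, start, end), treated as half-open [start, end) like BED.
genes = {
    "GeneA": ("chr1", 20000, 50000),
    "GeneB": ("chr3", 40000, 80000),
}
transposons = [
    ("chr1", 25000, 26000),
    ("chr1", 31000, 32000),
    ("chr3", 60000, 62000),
]


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_hits(genes, tes):
    """Number of transposon intervals overlapping each gene (simple quadratic scan)."""
    counts = {name: 0 for name in genes}
    for name, gene in genes.items():
        for te in tes:
            if overlaps(gene, te):
                counts[name] += 1
    return counts


hits = count_hits(genes, transposons)
print(hits)  # {'GeneA': 2, 'GeneB': 1}
genes_with_te = sorted(name for name, n in hits.items() if n > 0)
print(genes_with_te)  # ['GeneA', 'GeneB']
```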

My question is: should gene length bias be taken into account? A very long gene is naturally more likely to contain transposable elements. Or do GO terms already account for this?
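To see how strong the effect can be, here is a back-of-the-envelope sketch assuming each copy of the element lands uniformly at random in the genome, ignoring the element's own length. The genome size (roughly hg19) and the copy number are illustrative assumptions, not values from the post.

```python
def p_at_least_one(gene_len, n_te, genome_len):
    """P(gene contains >= 1 copy) if each copy's position is uniform and independent."""
    return 1.0 - (1.0 - gene_len / genome_len) ** n_te


GENOME = 3.1e9   # approximate hg19 size (assumption)
N_TE = 100_000   # hypothetical number of copies of the element

for length in (10_000, 100_000, 1_000_000):
    print(length, round(p_at_least_one(length, N_TE, GENOME), 3))
```

Under these assumptions a 1 Mb gene is almost certain to contain a copy while a 10 kb gene is not, so gene length alone can decide which genes end up in the list.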

I searched the literature and found discussions of length bias for RNA-seq data, but not for a problem like mine. Thanks

8.6 years ago

To my knowledge, GO terms are curated based on gene function (and the cellular location of the protein product). Gene length plays no part in how genes are annotated, so GO itself does not correct for it.
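For context, many GO enrichment tools score a term with a hypergeometric (one-sided Fisher's exact) test on gene counts, sketched below with Python's `math.comb`. Each gene counts once regardless of its length, which is why the test by itself cannot correct for length bias. The argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes annotated to the GO term,
    n: genes in the transposon-containing list, k: list genes annotated to the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 4 background genes, 2 annotated; both genes in a 2-gene list are annotated.
print(hypergeom_enrichment_p(N=4, K=2, n=2, k=2))
```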

However, you do need to control for the fact that long genes (many of which are neuronally expressed) are more likely to contain transposable elements simply because of their size.
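One way to control for length, sketched here as an assumption rather than the answerer's method: estimate how often genes of similar length contain a transposon, then compute how many genes in a GO category you would expect to be hit given their lengths alone. This is a simplified, binned take on the probability weighting idea that the Bioconductor package goseq applies to RNA-seq length bias (goseq fits a smooth curve instead of bins). The toy data are hypothetical and built so that hits depend on length only.

```python
def length_bins(lengths, n_bins):
    """Assign each gene to a length-quantile bin (0 = shortest genes)."""
    order = sorted(lengths, key=lengths.get)
    per_bin = -(-len(order) // n_bins)  # ceiling division
    return {gene: i // per_bin for i, gene in enumerate(order)}


def length_adjusted_counts(lengths, hit_genes, category, n_bins=10):
    """Return (observed, expected) hit genes in `category`, where each gene's
    chance of being hit is the hit rate among genes in its length bin."""
    bins = length_bins(lengths, n_bins)
    totals, hits = {}, {}
    for gene, b in bins.items():
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (gene in hit_genes)
    rate = {b: hits[b] / totals[b] for b in totals}
    expected = sum(rate[bins[gene]] for gene in category)
    observed = sum(gene in hit_genes for gene in category)
    return observed, expected


# Hypothetical toy data: 100 genes of 1-100 kb; only genes longer than 50 kb
# contain a transposon, so hits are driven purely by length.
lengths = {f"g{i}": i * 1000 for i in range(1, 101)}
hit_genes = {f"g{i}" for i in range(51, 101)}
category = {f"g{i}" for i in range(81, 101)}  # a "GO term" made of long genes

naive_expected = len(hit_genes) / len(lengths) * len(category)
observed, expected = length_adjusted_counts(lengths, hit_genes, category)
print(observed, naive_expected, expected)
```

Here the naive expectation (overall hit rate times category size) is 10, so the 20 observed hits look enriched; after matching on length the expected count is also 20 and the apparent enrichment disappears.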

This sounds like a job for a permutation test. If you randomly place the transposon library across the genome many times and intersect each shuffled set with the genes, do you see the same GO term enrichment under this null as with the real data?
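A minimal sketch of that permutation test in pure Python. It moves each transposon to a random position while keeping its length, recounts how many genes of a GO category are hit, and reports an empirical p-value. Because gene coordinates stay fixed and only the transposons move, long genes are also hit more often in the null, so length bias is built into the comparison. The random placement is a crude stand-in for `bedtools shuffle` (with a `-g` genome file, and `-excl` to avoid assembly gaps); the toy genome and gene names are hypothetical.

```python
import random


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_hit(genes, tes):
    """Names of genes overlapping at least one transposon."""
    return {name for name, g in genes.items() if any(overlaps(g, te) for te in tes)}


def shuffle_tes(tes, chrom_sizes, rng):
    """Place each transposon at a random position (chromosome weighted by size),
    keeping its length. Ignores gaps and excluded regions."""
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    shuffled = []
    for _, start, end in tes:
        length = end - start
        chrom = rng.choices(chroms, weights=weights)[0]
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def permutation_test(genes, tes, category, chrom_sizes, n_perm=1000, seed=0):
    """Observed hit genes in `category`, the null counts, and an empirical
    one-sided p-value P(null >= observed) with the usual +1 correction."""
    rng = random.Random(seed)
    observed = len(genes_hit(genes, tes) & category)
    null = [len(genes_hit(genes, shuffle_tes(tes, chrom_sizes, rng)) & category)
            for _ in range(n_perm)]
    p = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, null, p


# Hypothetical toy genome: one long gene, two short genes, two transposon copies.
genes = {
    "longGene": ("chr1", 0, 200_000),
    "shortGene1": ("chr1", 500_000, 505_000),
    "shortGene2": ("chr2", 100_000, 105_000),
}
tes = [("chr1", 10_000, 11_000), ("chr1", 150_000, 151_000)]
chrom_sizes = {"chr1": 1_000_000, "chr2": 500_000}

observed, null, p = permutation_test(genes, tes, {"longGene"}, chrom_sizes, n_perm=1000, seed=1)
print(observed, p)
```

In a real analysis the category would be every gene annotated to a GO term, and the test would be repeated per term with a multiple-testing correction.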
