How to get gene lengths?
2.8 years ago
JACKY ▴ 160

I have a raw RNA expression data frame with genes as rows (HUGO gene names) and samples as columns (Homo sapiens data). I want to add a column containing the length of each gene so that I can perform TPM normalization (gene length is needed in the formula).
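
For context, here is a minimal base R sketch of the TPM calculation that the gene lengths are needed for. The gene names, counts, and lengths are toy values, and `counts_to_tpm` is a hypothetical helper name, not a function from any package.

```r
# Toy count matrix: genes as rows, samples as columns (made-up values).
counts <- matrix(c(100, 200, 300,
                   50, 400, 150),
                 nrow = 3,
                 dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                                 c("sample1", "sample2")))

# Toy gene lengths in base pairs, in the same order as the rows of `counts`.
gene_length_bp <- c(GENE_A = 1000, GENE_B = 2000, GENE_C = 3000)

# Hypothetical helper: convert raw counts to TPM.
counts_to_tpm <- function(counts, length_bp) {
  stopifnot(nrow(counts) == length(length_bp))
  # Reads per kilobase: each row is divided by its own gene length in kb.
  rate <- counts / (length_bp / 1000)
  # Rescale every sample (column) so that its values sum to one million.
  t(t(rate) / colSums(rate)) * 1e6
}

tpm <- counts_to_tpm(counts, gene_length_bp)
print(round(tpm, 1))
```

Each column of `tpm` sums to one million, so a gene with a missing length cannot simply be left as `NA`: it has to be dropped (or given a length) before the per-sample sums are computed.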

I'm familiar with one way to get gene lengths, which is the goseq package:

length <- goseq::getlength(gene_names, 'hg19', 'geneSymbol')

Unfortunately, this package does not support the latest assembly, hg38, so many of my genes are not covered and get no length. Out of 20000 genes I get only 15000 lengths, and I don't want to lose that much information.

After a quick search I found another way using BioMart and EDASeq::getGeneLengthAndGCContent; however, I don't understand how to use it or which annotations to use with it.

I could really use some help with this function, or maybe some other way you guys might suggest.

Thanks!

tpm r RNA-seq • 1.7k views

Past thread that may be useful:

gene length for calculating TPM values


How should I use the exonsBy function? I don't have a TxDb object. I tried calling it on my vector of gene names, but I don't think that's the right way; I'm missing something here.
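
One possible way to do this, sketched under assumptions rather than taken from the linked thread: `exonsBy` expects a TxDb annotation object, not a vector of gene names. The sketch below assumes the Bioconductor packages GenomicFeatures, TxDb.Hsapiens.UCSC.hg38.knownGene, and org.Hs.eg.db are installed. `gene_names` is a hypothetical stand-in for the row names of the expression data frame, and the length computed here is the union of all annotated exons per gene, which is only one of several possible definitions of gene length.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

# The TxDb object comes from the annotation package (hg38, UCSC knownGene).
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Exons grouped by gene; the list names are Entrez gene IDs.
exons_by_gene <- exonsBy(txdb, by = "gene")

# Merge overlapping exons within each gene, then sum the widths (in bp).
exonic_length <- sum(width(reduce(exons_by_gene)))

# Hypothetical example input: HUGO symbols as used in the expression table.
gene_names <- c("TP53", "BRCA1")

# Map symbols to Entrez IDs so they can be matched to the TxDb gene IDs.
entrez_ids <- mapIds(org.Hs.eg.db,
                     keys = gene_names,
                     column = "ENTREZID",
                     keytype = "SYMBOL",
                     multiVals = "first")

# Lengths in the order of gene_names; unmatched genes come back as NA.
gene_length_bp <- setNames(exonic_length[unname(entrez_ids)], gene_names)
print(gene_length_bp)
```

Symbols that do not map to an Entrez ID, or whose gene is absent from the TxDb, end up as `NA` and need to be handled before TPM normalization.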
