Question

Warning - longer object length is not a multiple of shorter object length

0

Entering edit mode

2.3 years ago

JACKY ▴ 160

I have a counts dataframe of RNA-seq dataset, and got the gene lengths using this code:

exons = exonsBy(EnsDb.Hsapiens.v86, by="gene")
exons = reduce(exons)
len = sum(width(exons))
INDEX = intersect(rownames(counts),names(len))
geneLengths = len[INDEX ]
counts = counts[INDEX ,]

And used the lengths to normalize the data from counts to TPM, using this code:

rpkm <- apply(X = subset(counts),
               MARGIN = 2,
               FUN = function(x) {
                 10^9 * x / geneLengths / sum(as.numeric(x))
               })

TPM <- apply(rpkm, 2, function(x) x / sum(as.numeric(x)) * 10^6) %>% as.data.frame()

I don't get errors, I do get the normalized data.. but I also get a list of warnings like this:

Warning messages:
1: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
2: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
3: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
4: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
5: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
6: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
7: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
8: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length
9: In 10^9 * x/genelength :
  longer object length is not a multiple of shorter object length

And it goes on for about 50 warnings...

Should I pay attention to these warnings? do they affect the normalization process?

genelength r TPM RNA-seq • 2.3k views

ADD COMMENT • link updated 2.3 years ago by Gordon Smyth ★ 7.6k • written 2.3 years ago by JACKY ▴ 160

0

Entering edit mode

I think you should be very concerned. It is hard to see from the code fragments what goes wrong, but I would double check that all objects involved have the desired length or dimensions which is likely not the case. Simple example:

> 10^9 * 1:3 / 1:15
 [1] 1000000000 1000000000 1000000000  250000000  400000000  500000000
 [7]  142857143  250000000  333333333  100000000  181818182  250000000
[13]   76923077  142857143  200000000
> 10^9 * 1:3 / 1:10
 [1] 1000000000 1000000000 1000000000  250000000  400000000  500000000
 [7]  142857143  250000000  333333333  100000000
Warning message:
In 10^9 * 1:3/1:10 :
  longer object length is not a multiple of shorter object length

So, x and genelength are not of the same size. Further, why do you use the subset function without further arguments? See ?subset

Also, I suggest using a proven Bioconductor package to do the job.

ADD REPLY • link 2.3 years ago by Michael 55k

0

Entering edit mode

A proven Bioconductor package you mean using a package for normalization?

If so, is there any package you recommend ?

ADD REPLY • link 2.3 years ago by JACKY ▴ 160

0

Entering edit mode

for example edgeR see https://support.bioconductor.org/p/64585/

Refer to the problem with rpkm (and tpm) , https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7373998/ and other sources for a discussion of the validity of rpkm/tpm for cross sample comparisons.

ADD REPLY • link 2.3 years ago by Michael 55k

0

Entering edit mode

Thanks!

Do you also think my genelength calculation method is slumpy?

What do you think about this approach?

library (EDASeq)
ensembl_list <-  rownames(counts)
getGeneLengthAndGCContent(ensembl_list, "hsa")

ADD REPLY • link 2.3 years ago by JACKY ▴ 160

0

Entering edit mode

Reposted (with additions) to Bioconductor https://support.bioconductor.org/p/9146002

ADD REPLY • link 2.3 years ago by Gordon Smyth ★ 7.6k

0

Entering edit mode

Hi Gordon, do you really mean to link to this post? Looks very different

ADD REPLY • link 2.3 years ago by Michael 55k

0

Entering edit mode

Sorry, wrong link, now corrected.

ADD REPLY • link 2.3 years ago by Gordon Smyth ★ 7.6k