from mean per-base read coverage to RPKM using the Brawand data
0
0
Entering edit mode
9.9 years ago
Pei ▴ 220

Hi:

I am trying to calculating the RPKM from mean per-base read coverage which was provided by the paper Brawand et al, 2011.

but it seems strange to me that my resulting 'RPKM' value is rather small (each sample has a median value less than 1)
Below is my code[2].

rz2rpkm <- function(expr) {
    readcount <- expr # expr is a matrix of mean per-base read coverage with gene in rows and sample in columns
    readcount <- readcount[,6:ncol(readcount)]

    read.length <- 75
    exoniclength <- expr[,"ExonicLength"]
    for(j in colnames(expr)[6:ncol(expr)]) {
        readcount[,j] <- (expr[,j]*exoniclength)/read.length
    }

    M <- apply(readcount,2,sum)

    RPKM <- readcount
    for(j in colnames(readcount)) {
        RPKM[,j] <- (readcount[,j]/(exoniclength*M[j]))*10^9
    }

    return(RPKM)
}

Here I sincerely asking for your help for this problem.
I have no experience with RNA-seq data before.

Many many thanks!
Best,

mean per-base read coverage • 3.5k views
ADD COMMENT
0
Entering edit mode

Can you post a few example values that seem off and the exoniclengths used to derive them? RPKM values can be quite low.

BTW, you can get rid of the for loops and increase performance. The first loop is simply:

readcount <- readcount*exoniclength/read.length

The second one can be handled in a similar way.

ADD REPLY
0
Entering edit mode

Hello lin.pei26!

It appears that your post has been cross-posted to another site: SEQanswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Thank you Devon

Below are the top lines from MOUSE data[1]

I use this matrix as the variable expr

results looks like[2]

Link to the Brawand data: http://www.nature.com/nature/journal/v478/n7369/extref/nature10532-s5.zip

[1]

"GeneID"    "Chr"    "Begin"    "End"    "Strand"    "ExonicLength"    "Mouse_Brain_Female"    "Mouse_Brain_Male1"    "Mouse_Brain_Male2"    "Mouse_Cerebellum_Female"    "Mouse_Cerebellum_Male1"    "Mouse_Cerebellum_Male2"    "Mouse_Heart_Female"    "Mouse_Heart_Male1"    "Mouse_Heart_Male2"    "Mouse_Kidney_Female"    "Mouse_Kidney_Male1"    "Mouse_Kidney_Male2"    "Mouse_Liver_Female"    "Mouse_Liver_Male1"    "Mouse_Liver_Male2"    "Mouse_Testis_Male1"    "Mouse_Testis_Male2"
"ENSMUSG00000030105"    "6"    108733076    108773542    1    2730    72.2380952380952    185.524542124542    36.0362637362637    67.6091575091575    29.5714285714286    53.307326007326    31.956043956044    23.4047619047619    27.1358974358974    32.2355311355311    27.9516483516484    28.0758241758242    20.9168498168498    12.4223443223443    18.1003663003663    28.3615384615385    36.5879120879121
"ENSMUSG00000074445"    "3"    92054654    92093169    1    722    0    0    0    0.105263157894737    0    0    0    0    0    0.840720221606648    0.421052631578947    1.57756232686981    0    0    0    0    0
"ENSMUSG00000065586"    "6"    30119446    30119551    -1    106    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
"ENSMUSG00000065904"    "18"    35102298    35102367    -1    70    0.185714285714286    2.18571428571429    0    3.21428571428571    0.328571428571429    1.14285714285714    3.38571428571429    0.0285714285714286    0    0.757142857142857    0    0.7    1.22857142857143    0    0    0    0
"ENSMUSG00000020495"    "11"    86891234    86900276    -1    3218    4.8325046612803    15.2973896830329    2.7799875699192    6.60907395898073    1.98104412678682    4.95121193287756    4.10223741454319    2.73213175885643    3.43349906774394    3.46239900559354    3.96643878185208    4.55220633934121    4.47700435052828    2.64170292106899    4.36513362336855    4.3141702921069    6.85922933499068
"ENSMUSG00000085092"    "14"    69981561    70162426    -1    3368    31.2678147268409    36.4394299287411    9.57749406175772    36.7541567695962    17.750593824228    13.8067102137767    40.5237529691211    25.0736342042755    28.6576603325416    31.8931116389549    63.8717339667458    26.8788598574822    10.9421021377672    8.63984560570071    9.30255344418052    34.2105106888361    21.5483966745843
"ENSMUSG00000058979"    "6"    120459512    120481337    -1    1928    16.8656639004149    41.2095435684647    8.45331950207469    18.4590248962656    10.1841286307054    13.5679460580913    12.1696058091286    10.5238589211618    10.7536307053942    7.74636929460581    13.7214730290456    9.21317427385892    8.28112033195021    8.58869294605809    8.94709543568465    12.6483402489627    11.2422199170124
"ENSMUSG00000045903"    "19"    4984355    5038654    -1    3294    8.40255009107468    31.5170006071645    2.15725561627201    4.35033394049788    1.79204614450516    0.835458409228901    0.32301153612629    0.0230722525804493    0.206739526411658    0.0919854280510018    0.0230722525804493    0.0692167577413479    0    0    0    0.505464480874317    0.852459016393443
"ENSMUSG00000038045"    "17"    79284237    79306239    -1    1008    0    0.0843253968253968    0    0    0.0406746031746032    0.0753968253968254    0    0.0753968253968254    0.0753968253968254    0.150793650793651    0.0753968253968254    0.108134920634921    0.0694444444444444    0    0    0.461309523809524    0.189484126984127
"ENSMUSG00000049536"    "X"    133242521    133246017    1    2803    7.6578665715305    17.9450588655012    3.90082054941134    12.3389225829468    4.31109525508384    7.39778808419551    3.76489475561898    1.78808419550482    2.04566535854442    2.24687834463075    1.19158044951837    1.67998572957545    0.35176596503746    0.106671423474848    0.10845522654299    1.42133428469497    2.73135925793792
"ENSMUSG00000065644"    "2"    19856472    19856580    1    109    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
"ENSMUSG00000027333"    "2"    131317232    131351658    1    3585    13.8334728033473    36.1715481171548    10.1977684797768    37.7629009762901    12.0253835425384    21.0259414225941    13.6387726638773    5.08870292887029    6.85076708507671    2.4976290097629    1.321059972106    2.31938633193863    1.58940027894003    0.955927475592748    0.530543933054393    3.43291492329149    5.48647140864714

[2]

> RPKM2 <- rz2rpkm(expr2)
> RPKM2 <- as.matrix(RPKM2)
> head(RPKM2[,1:5])
                   Mouse_Brain_Female Mouse_Brain_Male1 Mouse_Brain_Male2 Mouse_Cerebellum_Female Mouse_Cerebellum_Male1
ENSMUSG00000030105         71.9478984        67.2613894         68.701174             53.98640240             53.5640773
ENSMUSG00000074445          0.0000000         0.0000000          0.000000              0.08405339              0.0000000
ENSMUSG00000065586          0.0000000         0.0000000          0.000000              0.00000000              0.0000000
ENSMUSG00000065904          0.1849682         0.7924244          0.000000              2.56663044              0.5951564
ENSMUSG00000020495          4.8130914         5.5460246          5.299895              5.27739347              3.5883556
ENSMUSG00000085092         31.1422048        13.2110106         18.258971             29.34846063             32.1524602
> apply(RPKM2,2,median)
     Mouse_Brain_Female       Mouse_Brain_Male1       Mouse_Brain_Male2 Mouse_Cerebellum_Female  Mouse_Cerebellum_Male1  Mouse_Cerebellum_Male2
             0.45542732              0.46554147              0.43339427              0.42292904              0.41544624              0.39474157
     Mouse_Heart_Female       Mouse_Heart_Male1       Mouse_Heart_Male2     Mouse_Kidney_Female      Mouse_Kidney_Male1      Mouse_Kidney_Male2
             0.17738306              0.11659419              0.17509136              0.30791267              0.26455986              0.32425262
     Mouse_Liver_Female       Mouse_Liver_Male1       Mouse_Liver_Male2      Mouse_Testis_Male1      Mouse_Testis_Male2
             0.08091724              0.09642508              0.08004370              0.62388267              0.57606359

> apply(expr2[,6:ncol(expr2)],2,median)
     Mouse_Brain_Female       Mouse_Brain_Male1       Mouse_Brain_Male2 Mouse_Cerebellum_Female  Mouse_Cerebellum_Male1  Mouse_Cerebellum_Male2
             0.45726426              1.28408540              0.22733105              0.52964960              0.22935780              0.34670551
     Mouse_Heart_Female       Mouse_Heart_Male1       Mouse_Heart_Male2     Mouse_Kidney_Female      Mouse_Kidney_Male1      Mouse_Kidney_Male2
             0.20987654              0.09183673              0.15756080              0.25945017              0.22200957              0.29439830
     Mouse_Liver_Female       Mouse_Liver_Male1       Mouse_Liver_Male2      Mouse_Testis_Male1      Mouse_Testis_Male2
             0.08786127              0.06836162              0.08113917              0.49350649              0.56578171
ADD REPLY
0
Entering edit mode

Those values look about right. Most genes aren't well expressed generally, which will skew the RPKM low.

ADD REPLY
0
Entering edit mode

Careful, I think your read counts are not raw counts, or why are they fractional numbers in [1]?

ADD REPLY
0
Entering edit mode

Those are mean per-base coverage (op is multiplying by length to get average read counts), not raw counts. Having said that, the raw data is available, so it'd be easy enough for Pei to just reprocess a single sample and check how correct these calculations are.

ADD REPLY

Login before adding your answer.

Traffic: 2480 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6