Question

coverage for unique transcript and df process

0

7.5 years ago

Lila M ★ 1.3k

Hi guys, I would like to process a data frame in R:

Chr start   end strand  transcript  Length  number_bp_overlap
chr1    879583  882140  -   uc031pkq    2858    297
chr1    1571100 1647617 -   uc001ags    76818   270
chr1    33117259    33151812    +   uc010ohk    34854   200
chr1    33117259    33151812    +   uc010ohk    34854   200
chr1    33117259    33151812    +   uc010ohk    34854   211
chr1    39670723    39748740    +   uc010oit    78318   386

What I want to do is calculate the % coverage for each transcript. For each unique transcript (e.g. uc010ohk), I need to sum the number_bp_overlap values (200+200+211) and create a new data frame that stores each unique transcript with its total number_bp_overlap.

What I am trying is:

coverage <- ddply(df, "transcript", transform, coverage=sum(number_bp_overlap))
coverage <- subset(coverage, !duplicated(transcript))

but it is not working at all. As I am new to R, any clues about how I can do this quickly?

Thanks!

R data frame coverage • 1.7k views

ADD COMMENT • link updated 7.5 years ago by Rashedul Islam ▴ 480 • written 7.5 years ago by Lila M ★ 1.3k

2

I think you could use the dplyr library

ADD REPLY • link 7.5 years ago by Medhat 9.8k

0

Yes, I've just edited my question, but the code is not working at all because it removes the duplicates.

ADD REPLY • link 7.5 years ago by Lila M ★ 1.3k

0

The !duplicated(transcript) line is what actually removes the duplicated rows.

ADD REPLY • link 7.5 years ago by Medhat 9.8k

0

Oook, and the function also sorts the df alphabetically by transcript (I thought I had lost some data, but it is just the way it is sorted). Do you know of any option to preserve the initial order?

Thanks!

ADD REPLY • link 7.5 years ago by Lila M ★ 1.3k
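On keeping the initial order: one possible approach, sketched here in base R, is to sum per transcript and then re-index the totals by order of first appearance. The data frame below is retyped from the question (only three of its columns), and the names first_seen, totals, result and total_bp_overlap are made up for illustration.

```r
# Toy data retyped from the question (only the columns needed here)
df <- data.frame(
  transcript = c("uc031pkq", "uc001ags", "uc010ohk",
                 "uc010ohk", "uc010ohk", "uc010oit"),
  Length = c(2858, 76818, 34854, 34854, 34854, 78318),
  number_bp_overlap = c(297, 270, 200, 200, 211, 386),
  stringsAsFactors = FALSE
)

# Sum the overlaps per transcript; tapply returns the groups sorted by name
totals <- tapply(df$number_bp_overlap, df$transcript, sum)

# Transcripts in the order they first appear in df
first_seen <- unique(df$transcript)

# Re-index the named totals by first-appearance order
result <- data.frame(
  transcript = first_seen,
  total_bp_overlap = as.vector(totals[first_seen]),
  stringsAsFactors = FALSE
)

print(result)
```

Indexing the named totals with unique(df$transcript) is what restores the original order; the same trick should work on the output of other grouping functions as long as the result can be looked up by transcript name.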

score 0 · Answer 1 · 2017-05-18

0

7.5 years ago

Rashedul Islam ▴ 480

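You might be looking for this. If not, please add reproducible input and expected output. Thanks!

library(dplyr)

# Toy input: a grouping column x and a numeric column y
df <- data.frame(x = c("a", "b", "b"), y = 1:3)

# Sum y within each group of x; returns one row per unique x
df %>% group_by(x) %>% summarise(y = sum(y))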

ADD COMMENT • link 7.5 years ago by Rashedul Islam ▴ 480
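Applied to the columns from the question, the same group_by/summarise pattern might look like the sketch below. It assumes that the % coverage is the summed number_bp_overlap divided by the Length column, which the thread does not confirm; the names total_bp_overlap and pct_coverage are invented for illustration.

```r
library(dplyr)

# Toy data retyped from the question (only the columns needed here)
df <- data.frame(
  transcript = c("uc031pkq", "uc001ags", "uc010ohk",
                 "uc010ohk", "uc010ohk", "uc010oit"),
  Length = c(2858, 76818, 34854, 34854, 34854, 78318),
  number_bp_overlap = c(297, 270, 200, 200, 211, 386),
  stringsAsFactors = FALSE
)

coverage <- df %>%
  group_by(transcript) %>%
  summarise(
    Length = first(Length),                    # Length repeats within a transcript
    total_bp_overlap = sum(number_bp_overlap)  # e.g. 200 + 200 + 211 for uc010ohk
  ) %>%
  # ASSUMPTION: % coverage = summed overlap / transcript Length * 100
  mutate(pct_coverage = 100 * total_bp_overlap / Length)

print(coverage)
```

Note that summarise returns one row per transcript, so no separate !duplicated() step is needed, and that the rows come back sorted by transcript rather than in input order.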