Hi guys, I would like to process a data frame in R:
Chr start end strand transcript Length number_bp_overlap
chr1 879583 882140 - uc031pkq 2858 297
chr1 1571100 1647617 - uc001ags 76818 270
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 211
chr1 39670723 39748740 + uc010oit 78318 386
What I want to do is calculate the % of coverage for each transcript. So for each unique transcript (e.g. transcript uc010ohk) I need to sum number_bp_overlap (200+200+211) and create a new data frame that stores each unique transcript with its total number_bp_overlap.
What I am trying is:
coverage <- ddply(df, "transcript", transform, coverage=sum(number_bp_overlap))
coverage <- subset(coverage, !duplicated(transcript))
but it is not working at all. As I am new to R, any clues about how I can do this quickly?
Thanks!
I think you could use the dplyr library
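A minimal sketch of that suggestion, assuming dplyr 1.0 or later is installed and the table from the question is loaded as `df`. The output column names `total_bp_overlap` and `pct_coverage` are made up for illustration, and dividing by `Length` is only a guess at what "% of coverage" means here, so adjust it if your definition differs.

```r
library(dplyr)

# Rebuild the example table from the question
df <- read.table(text = "Chr start end strand transcript Length number_bp_overlap
chr1 879583 882140 - uc031pkq 2858 297
chr1 1571100 1647617 - uc001ags 76818 270
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 211
chr1 39670723 39748740 + uc010oit 78318 386",
  header = TRUE, stringsAsFactors = FALSE)

coverage <- df %>%
  # One group per transcript; Length is assumed constant within a transcript
  group_by(transcript, Length) %>%
  # Collapse each group to a single row holding the summed overlap
  summarise(total_bp_overlap = sum(number_bp_overlap), .groups = "drop") %>%
  # Assumption: % coverage = summed overlap / transcript Length
  mutate(pct_coverage = 100 * total_bp_overlap / Length)

print(coverage)
```

Because `summarise()` already returns one row per group, no `!duplicated()` step is needed afterwards. Note that the result comes back sorted by `transcript`, not in the original row order.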
Yes, I've just edited my question, but the code is not working at all because it removes the duplicates.
!duplicated(transcript)
this line actually removes the duplicates
Ok, and the function also sorts the df alphabetically by transcript (I thought I had lost some data, but it is just the way it is sorted). Do you know any option to respect the initial order?
Thanks!
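One possible way to keep the initial order, sketched in base R: turn `transcript` into a factor whose levels follow first appearance, then sum per level. This assumes the table from the question is loaded as `df`; the name `total_bp_overlap` is just an illustrative choice.

```r
# Rebuild the example table from the question
df <- read.table(text = "Chr start end strand transcript Length number_bp_overlap
chr1 879583 882140 - uc031pkq 2858 297
chr1 1571100 1647617 - uc001ags 76818 270
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 200
chr1 33117259 33151812 + uc010ohk 34854 211
chr1 39670723 39748740 + uc010oit 78318 386",
  header = TRUE, stringsAsFactors = FALSE)

# Factor levels in order of first appearance, not alphabetical order
grp <- factor(df$transcript, levels = unique(df$transcript))

# tapply() returns one sum per level, in level order
totals <- tapply(df$number_bp_overlap, grp, sum)

coverage <- data.frame(
  transcript = names(totals),
  total_bp_overlap = as.vector(totals),
  stringsAsFactors = FALSE
)

print(coverage)
```

The same factor trick should also work with grouping functions such as `ddply`, since they order groups by factor level when the grouping column is a factor.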