I have a data frame named BRCA.b.1:
  seqnames start   end width strand   score.x   name score.y annotation percentGC percentAT
1     chr1     1  9999  9999      *  0.000000 BRCA_1 1.59268     Distal 0.3852295 0.3592814
2     chr1 10000 10099   100      * 17.716522 BRCA_1 1.59268     Distal 0.3852295 0.3592814
3     chr1 10100 10199   100      * 30.601267 BRCA_1 1.59268     Distal 0.3852295 0.3592814
4     chr1 10200 10299   100      *  9.663558 BRCA_1 1.59268     Distal 0.3852295 0.3592814
5     chr1 10300 10399   100      *  4.831779 BRCA_1 1.59268     Distal 0.3852295 0.3592814
6     chr1 10400 10499   100      *  8.052965   <NA>      NA       <NA>        NA        NA
I also have a data frame called BRCA, which holds the distinct values of the name column.
For each name in that column, I want to sum the score.x values of the corresponding rows in BRCA.b.1 and store the result in BRCA$coverage. For the first value in the name column, BRCA_1, I would like to get 62.81313.
So far I have written this code, and it works correctly:
for (i in 1:nrow(BRCA)) {
  BRCA$coverage[i] <- sum(BRCA.b.1[which(BRCA.b.1$name == BRCA$name[i]), 6])
}
But because BRCA.b.1 is large, it takes a long time to run.
Can you suggest a more efficient way to do this?
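One possible approach in base R, offered as a sketch rather than a definitive answer: compute all the per-name sums in a single pass with tapply, then look them up with match, instead of rescanning BRCA.b.1 once for every row of BRCA. The toy data below is an assumption that copies only the score.x and name columns from the sample above, and sums is a hypothetical helper variable. One difference from the loop: a name in BRCA that has no rows in BRCA.b.1 gets NA here, whereas the loop gives 0.

```r
# Toy stand-ins for the real objects (assumption: both are data frames,
# and only the score.x and name columns matter for this calculation).
BRCA.b.1 <- data.frame(
  score.x = c(0, 17.716522, 30.601267, 9.663558, 4.831779, 8.052965),
  name    = c("BRCA_1", "BRCA_1", "BRCA_1", "BRCA_1", "BRCA_1", NA),
  stringsAsFactors = FALSE
)
BRCA <- data.frame(name = "BRCA_1", stringsAsFactors = FALSE)

# Sum score.x once per distinct name. Rows whose name is NA are dropped,
# which matches what which() does in the original loop.
sums <- tapply(BRCA.b.1$score.x, BRCA.b.1$name, sum)

# Look up the sum for each name in BRCA. as.vector() strips the names and
# dim attributes so the new column is a plain numeric vector.
BRCA$coverage <- as.vector(sums[match(BRCA$name, names(sums))])

print(BRCA)  # coverage for BRCA_1 is about 62.81313
```

If the loop's behaviour for unmatched names is needed, replacing the NA entries with 0 afterwards (for example with is.na) should restore it.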
with tsv-utils:
output: