Question

Nested data frames in a list

6.1 years ago

friasoler ▴ 50

Dear colleagues, I'm trying to create a list of data frames from an original matrix. Each new data frame should have an extra column with the row sum, which will be used to subset that data frame. The problem is that I can't get the desired data frame structure with multiple columns; I only get single vector-like columns. I would appreciate your help. Thanks in advance, Roberto

cm: a matrix with columns B1, B2, B3, F1, F2, F3, ..., S1, S2, S3 and row names gene1...16432

    t=c("B","F","L","I","M","S")
    cm.f=list()
    for (i in 1:6)
    {
      a= data.frame("OoenG" = rownames(cm), cm[,grep(t[i], colnames(cm))])
      cm.f[t[i]] = data.frame(apply(a[,2:4],1,sum))
      cm.f[t[i]]=data.frame(a,cm.f[t[i]])
      print(summary(cm.f[[i]] <3))
    }

       Mode    NA's
    logical   16432
       Mode    NA's
    logical   16432
       Mode    NA's
    .............

    warnings()
    Warning messages:
    1: In cm.f[t[i]] <- data.frame(a, cm.f[t[i]]) :
      number of items to replace is not a multiple of replacement length
    2: In Ops.factor(cm.f[[i]], 3) : ‘<’ not meaningful for factors

Only the gene names are included; the original data in "a" is not.

updated 6.1 years ago by Fabio Marroni ★ 3.0k • written 6.1 years ago by friasoler ▴ 50

Can you provide a small, reproducible example?

That said, it looks like you have two problems:

1) In a data frame with X rows, you are trying to add a column with Y elements, where X ≠ Y.

2) Your data frame stores the values as factors, while you probably want to use them as numbers.
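
Both symptoms can be reproduced on a tiny data frame. This is a sketch under assumptions: the data frame a, its values, and the helper catch_warning() are hypothetical, and stringsAsFactors = TRUE mimics the default of R versions before 4.0.0, which made the gene-name column a factor. It shows what seems to happen in the question's loop: single-bracket assignment to a list keeps only the first column, and comparing that factor with 3 yields NA.

```r
# Hypothetical toy data: gene names plus three replicate counts.
# stringsAsFactors = TRUE mimics the pre-4.0.0 default.
a <- data.frame(OoenG = c("g1", "g2"), B1 = c(4L, 0L), B2 = c(3L, 1L),
                B3 = c(11L, 0L), stringsAsFactors = TRUE)

# Hypothetical helper: records a warning message and lets evaluation continue
catch_warning <- function(expr) {
  msg <- NULL
  value <- withCallingHandlers(expr, warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  })
  list(value = value, warning = msg)
}

# Problem 1: `[<-` tries to put a 4-column data frame into ONE list slot,
# so R keeps only the first column (the gene names) and warns.
bad <- list()
r1 <- catch_warning(bad["B"] <- a)
print(r1$warning)
str(bad[["B"]])            # a factor, not a data frame

# `[[<-` stores the whole data frame as a single list element
good <- list()
good[["B"]] <- a

# Problem 2: comparing a factor with a number returns NA (with a warning)
r2 <- catch_warning(bad[["B"]] < 3)
print(summary(r2$value))   # every element is NA
```

The all-NA logical vector is what produces the repeated "Mode / NA's / logical" summary shown in the question.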

6.1 years ago by Fabio Marroni ★ 3.0k

Thanks Fabio, here is an example of my cm matrix:

B1 B2 B3 F1 F2 F3 I2 I3 I4 L1 L2 L3 M4 M5 M6 S1 S2 S3 
OOENG00001 4 3 11 6 5 27 2691 2665 3238 2960 1893 2329 0 0 1 1 10 10 
OOENG00002 12646 11932 17689 45610 55731 44342 451 474 437 1597 2077 4722 298 220 272 9769 11618 13528 
OOENG00003 14 10 16 248 316 260 354 292 366 2660 2804 2263 10 35 20 117 180 204
OOENG00004 135 107 129 155 117 95 540 608 542 153 114 119 85 93 114 256 185 189
OOENG00005 5794 5181 5989 1189 1427 643 1381 796 788 5753 8830 3004 117 359 249 1368 1772 1201 
OOENG00006 258 172 273 316 216 247 2509 3430 4259 400 249 311 150 183 258 432 458 522 
OOENG00007 3 5 2 25 27 11 1519 1595 2070 2786 2026 2298 0 0 1 758 866 707 
OOENG00008 166 225 234 412 432 345 1802 1452 2391 5473 3340 4074 307 580 431 288 453 444 
OOENG00009 77 55 40 517 359 353 141 92 243 38 42 26 630 1413 1024 340 282 255 
OOENG00010 2731 2360 2242 833 821 739 929 732 891 600 555 457 89 271 172 1151 1459 1385 
OOENG00011 3046 2016 4678 1310 1461 1132 517 591 513 155 149 181 68 135 102 377 501 781 
OOENG00012 183 148 180 450 525 369 950 995 1217 538 324 395 668 873 1023 548 516 452
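
For a reproducible example, rows pasted like this can be read back into a matrix with read.table(text = ...). A minimal sketch using the first four rows (the name txt is only illustrative); because the header line has one field fewer than the data lines, read.table() uses the first column as row names:

```r
# First four rows of the matrix above, pasted as text
txt <- "
B1 B2 B3 F1 F2 F3 I2 I3 I4 L1 L2 L3 M4 M5 M6 S1 S2 S3
OOENG00001 4 3 11 6 5 27 2691 2665 3238 2960 1893 2329 0 0 1 1 10 10
OOENG00002 12646 11932 17689 45610 55731 44342 451 474 437 1597 2077 4722 298 220 272 9769 11618 13528
OOENG00003 14 10 16 248 316 260 354 292 366 2660 2804 2263 10 35 20 117 180 204
OOENG00004 135 107 129 155 117 95 540 608 542 153 114 119 85 93 114 256 185 189
"

# Header has 18 fields, data lines 19, so the gene IDs become row names
cm <- as.matrix(read.table(text = txt, header = TRUE))
dim(cm)
```

Alternatively, dput(cm[1:4, ]) prints R code that recreates the same subset exactly, which can be pasted into a post.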

6.1 years ago by friasoler ▴ 50

Answer · 2019-07-26

6.1 years ago

Fabio Marroni ★ 3.0k

Try this for creating the list of data.frames.

    t <- c("B","F","L","I","M","S")
    cm.f <- list()
    for (i in 1:6)
    {
      a <- data.frame("OoenG" = rownames(cm), cm[, grep(t[i], colnames(cm))])
      a$mysum <- apply(a[, 2:4], 1, sum)
      # Remove rows in which the sum is smaller than 3
      a <- a[a$mysum >= 3, ]
      # [[ ]] stores the whole data frame as one list element
      cm.f[[t[i]]] <- a
    }

I am not sure what your aim is with the command:

    print(summary(cm.f[[i]] < 3))

Can you explain?

EDIT: Added the line a <- a[a$mysum >= 3, ] after the comment, to remove all entries in which the sum is smaller than 3.

6.1 years ago by Fabio Marroni ★ 3.0k
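
The same filtered list can also be built without an explicit loop. A minimal sketch, not the answer's own code: three rows of the example matrix stand in for cm, rowSums() replaces apply(..., 1, sum), the threshold keeps genes with at least 3 reads as described by the comment in the answer, and tissues and sub are illustrative names.

```r
# Stand-in for cm: three rows of the example matrix from the thread
cm <- as.matrix(read.table(header = TRUE, text = "
B1 B2 B3 F1 F2 F3 I2 I3 I4 L1 L2 L3 M4 M5 M6 S1 S2 S3
OOENG00001 4 3 11 6 5 27 2691 2665 3238 2960 1893 2329 0 0 1 1 10 10
OOENG00003 14 10 16 248 316 260 354 292 366 2660 2804 2263 10 35 20 117 180 204
OOENG00004 135 107 129 155 117 95 540 608 542 153 114 119 85 93 114 256 185 189
"))

tissues <- c("B", "F", "L", "I", "M", "S")

# setNames() makes lapply() return a named list (cm.f$B, cm.f$F, ...)
cm.f <- lapply(setNames(tissues, tissues), function(tis) {
  sub <- cm[, grep(tis, colnames(cm)), drop = FALSE]  # replicate columns of one tissue
  a <- data.frame(OoenG = rownames(sub), sub, mysum = rowSums(sub))
  a[a$mysum >= 3, ]                                   # keep genes with at least 3 reads
})

# OOENG00001 has only 1 read across M4-M6, so it is dropped from cm.f$M
sapply(cm.f, nrow)
```

Returning each data frame from the function avoids growing cm.f inside a loop and sidesteps the [ versus [[ assignment pitfall from the question.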

Many thanks! It works very well. In the end, I would like to filter out all genes with fewer than 3-10 reads before running a differential gene expression analysis. My aim is to rebuild 'cm' with zeros in the rows whose total sum is below 3 reads within a tissue (B: brain, etc.).

6.1 years ago by friasoler ▴ 50
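
One way to do the rebuild described in the last reply is sketched below. It assumes "total sum below 3 reads within a tissue" means the sum over the three replicate columns of that tissue, and it zeroes those counts only in that tissue's columns, leaving the other tissues untouched. min_reads and cm0 are hypothetical names, and two rows of the example matrix stand in for cm.

```r
# Stand-in for cm: two rows of the example matrix from the thread
cm <- as.matrix(read.table(header = TRUE, text = "
B1 B2 B3 F1 F2 F3 I2 I3 I4 L1 L2 L3 M4 M5 M6 S1 S2 S3
OOENG00001 4 3 11 6 5 27 2691 2665 3238 2960 1893 2329 0 0 1 1 10 10
OOENG00004 135 107 129 155 117 95 540 608 542 153 114 119 85 93 114 256 185 189
"))

tissues <- c("B", "F", "L", "I", "M", "S")
min_reads <- 3   # hypothetical threshold; the reply mentions 3 to 10

cm0 <- cm
for (tis in tissues) {
  cols <- grep(tis, colnames(cm))                        # replicate columns of this tissue
  low <- rowSums(cm[, cols, drop = FALSE]) < min_reads   # genes below threshold here
  cm0[low, cols] <- 0L                                   # zero them in this tissue only
}

# OOENG00001 has 0 + 0 + 1 reads in M4-M6, so only those three cells change
cm0["OOENG00001", ]
```

Raising min_reads (for example to 10) gives the stricter filter mentioned in the reply.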