Hello everyone! I would appreciate some help with this data frame 'df':
df
Marker Sample allele
FlaHDF11 CP26 h
FlaHDF11 CP26 a
FlaHDF12 CP26 e
FlaHDF12 CP26 f
FlaHDF11 CP27 g
FlaHDF11 CP27 h
FlaHDF12 CP27 t
FlaHDF12 CP27 z
I would like something like this:
FlaHDF11 FlaHDF11 FlaHDF12 FlaHDF12
CP26 h a e f
CP27 g h t z
# This is the code I'm using, and I suspect the problem is in line 6 (the while loop). I always expect to have just two rows per marker for the same sample, but the code is written to add extra columns in case some samples have more than 2 alleles.
a = read.table('df', stringsAsFactors=F,header=T)
a = a[-1,]
a1 = a[!duplicated(a[,1:2]),]
rownames(a1) = paste(a1$Marker,a1$Sample)
t=1;
while(sum(duplicated(a[,1:2])) >0) #### I think the problem is here
{
a1 = cbind(a1,rep(NA, dim(a1)[[1]]))
a = a[duplicated(a[,1:2]),]
a2 = a[!duplicated(a[,1:2]),]
a1[paste(a2$Marker,a2$Sample),t+3] = a2$Size
t = t+1
}
colnames(a1) = c(colnames(a), paste('Size',1:(t-1),sep='.') )
a2 = a1[!is.na(a1$Size.2),]
m = matrix(NA,length(unique(a1$Sample)), length(unique(a1$Marker))*t)
rownames(m) = unique(a1$Sample)
colnames(m) = paste(rep(unique(a1$Marker), each=t), rep(1:4,length(unique(a1$Marker))), sep='.')
for (i in rownames(m))
{
tt = a1[a1$Sample %in% i,]
m[ i, paste(rep(tt$Marker,each=4), rep(1:4, dim(tt)[[1]]), sep='.')] = as.vector(t(tt[,3:6]))
}
m = m[,colSums(is.na(m)) < dim(m)[[1]]]
#### Error in `[<-.data.frame`(`*tmp*`, paste(a2$Marker, a2$Sample), t + 3, : replacement has length zero
Kind regards! Roberto
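A possible cause of the error, offered as a guess from the posted data rather than a confirmed diagnosis: the example data frame has a column named allele, while the loop assigns a2$Size. In R, `$` on a column that does not exist returns NULL, and assigning NULL into selected rows of a data frame is the kind of operation that fails with "replacement has length zero". The sketch below uses a small toy data frame (the name missing_col is just for illustration) to show the NULL lookup.

```r
# Toy data frame with the same column names as the example in the question
df <- data.frame(
  Marker = c("FlaHDF11", "FlaHDF11"),
  Sample = c("CP26", "CP26"),
  allele = c("h", "a"),
  stringsAsFactors = FALSE
)

# `$` on a column that does not exist returns NULL instead of raising an error
missing_col <- df$Size
is.null(missing_col)   # TRUE

# The column that actually holds the values in the example data
df$allele
```

If that is what is happening, using a2$allele (or renaming the column to Size) would be the first thing to try. Separately, if the file looks exactly like the example, header=T already consumes the header line, so a = a[-1,] would drop the first data row.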
Just in case this is an "XY problem" (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), it could be worth explaining why you want to reformat your data this way.
It is also unclear why you have multiple columns with the same name in your "desired output". That complicates any potential code solution, since it is not obvious why the same thing would need two different columns.
It makes sense: the columns are loci, each one with two alleles per sample. Thanks.
See my answer, which tries to explicitly codify this assumption.
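A minimal base R sketch of the reshaping discussed in this thread, not necessarily the approach taken in that answer. It assumes the columns are named Marker, Sample and allele as in the example, and it numbers the alleles within each marker/sample pair so that the wide columns get unique names such as FlaHDF11.1 and FlaHDF11.2 instead of repeated ones. The names copy, key and wide are introduced here only for illustration.

```r
# Example data as posted in the question
df <- data.frame(
  Marker = rep(rep(c("FlaHDF11", "FlaHDF12"), each = 2), times = 2),
  Sample = rep(c("CP26", "CP27"), each = 4),
  allele = c("h", "a", "e", "f", "g", "h", "t", "z"),
  stringsAsFactors = FALSE
)

# Number the alleles within each Marker/Sample pair: 1, 2, ...
df$copy <- ave(seq_len(nrow(df)), df$Marker, df$Sample, FUN = seq_along)

# Unique column key per marker and allele copy, e.g. "FlaHDF11.1"
df$key <- paste(df$Marker, df$copy, sep = ".")

# Empty Sample x key matrix; cells stay NA where a sample has fewer alleles
samples <- unique(df$Sample)
keys <- sort(unique(df$key))
wide <- matrix(NA_character_, nrow = length(samples), ncol = length(keys),
               dimnames = list(samples, keys))

# Fill each cell by (row name, column name) using a two-column index matrix
wide[cbind(df$Sample, df$key)] <- df$allele

wide
```

Because the allele copies are counted rather than hard-coded, a sample with three alleles at one marker would simply add a ".3" column, with NA for the samples that lack it.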