I have a file like this:
chr pos reads
chr1 3004104 0
chr1 3005819 0
chr1 3008315 0
chr1 3008893 45
chr1 3009812 0
chr1 3012422 0
chr1 3015794 0
chr1 3016183 21
chr1 3024019 0
chr1 3025279 0
I am trying to create a new .bed file with the start position, end position and reads, like this:
chr1 3008893 3009812 33
chr1 3016183 3024019 21
..... ............
The problem is that in R, when I use rbind, chr1 etc. are not kept as strings but are turned into numbers, so the final result does not have the correct chromosomes. Here is my R code:
df[1:10,]
newdf <- data.frame(Date=as.Date(character()),
                    File=character(),
                    User=character(),
                    stringsAsFactors=FALSE)
for(i in 1:(nrow(df)-1)){
  #print(dfs[i,2])
  if(df[i,3]>0){
    newdf<-rbind(newdf,c(df[i,1],df[i,2],df[i+1,2],df[i,3]))
    #check the case that there are reads at the end of chr and next position is 0 new chromosome
    if(df[i+1,1]!=df[i,1]){
      newdf[length(newdf[,1]),3]= newdf[length(newdf[,1]),2]+1
    }
  }
}
I checked a previous answer that uses data.frame, but it doesn't work. Is there another way to get the correct result? Is it easier in Python?
Thank you in advance
Based on what criteria? Where is the interval information coming from?
If a position has a non-zero read count, we create a fragment whose start is that position and whose end is the next position. If there are reads at the last position of a chromosome, we create a fragment with pos as the start and pos + 1 as the end. The algorithm works fine; the problem is the chromosome names (chr1, chr2, chr3, etc.), which the script converts to numbers. At the end, instead of chr1, chr2, ..., chrX, chrY, I get 1, 2, ..., 21, 22.
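A likely cause, assuming the chr column was read in as a factor (the default for read.table before R 4.0): c(df[i,1], df[i,2], ...) has to return a single atomic vector, so the factor is reduced to its integer codes and chr1, chr2, ... become 1, 2, .... A minimal sketch of the rule described above that avoids c() and rbind altogether is shown below. The helper name make_fragments and the column names chr, pos and reads are assumptions for illustration, and the toy data is copied from the question.

```r
# Toy input copied from the question; stringsAsFactors = FALSE keeps chr as text.
df <- data.frame(
  chr   = rep("chr1", 10),
  pos   = c(3004104L, 3005819L, 3008315L, 3008893L, 3009812L,
            3012422L, 3015794L, 3016183L, 3024019L, 3025279L),
  reads = c(0, 0, 0, 45, 0, 0, 0, 21, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original post): one row per position with
# reads > 0, ending at the next position on the same chromosome.
make_fragments <- function(df) {
  chr <- as.character(df$chr)        # undo factor encoding if it is present
  next_pos <- c(df$pos[-1], NA)      # position of the following row
  next_chr <- c(chr[-1], NA)         # chromosome of the following row
  same_chr <- !is.na(next_chr) & next_chr == chr
  # Last row of a chromosome (or of the file): use pos + 1 as the end.
  end <- ifelse(same_chr, next_pos, df$pos + 1L)
  keep <- df$reads > 0
  data.frame(chr = chr[keep], start = df$pos[keep], end = end[keep],
             reads = df$reads[keep], stringsAsFactors = FALSE)
}

bed <- make_fragments(df)
print(bed)

# To save the result as a tab-separated file without header or quotes:
# write.table(bed, "fragments.bed", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```

Unlike the loop in the question, which stops at nrow(df)-1, this sketch also emits a fragment when the very last row of the file has reads, using pos + 1 as the end. If you prefer to keep the loop, building each new row with data.frame(...) instead of c(...) should preserve the column types in the same way.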