Script For R or perl
5
0
Entering edit mode
8.0 years ago
alim.hcu ▴ 20

Dear Friends,

I want to bin my txt file into 100bp size based on position(column=2). And condition for bining is , there should be at least three cytosine position (column=7) in each 100 bp bin. Thank you

CcLG01  22  +   CHH 0.200   5.00    1   5   0   0   0.036   0.624
CcLG01  23  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  28  +   CHH 0.200   5.00    1   5   0   0   0.036   0.624
CcLG01  29  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  30  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  33  +   CHH 0.250   4.00    1   4   0   0   0.046   0.699
CcLG01  39  -   CHH 0.500   2.00    1   2   2   2   0.095   0.905
CcLG01  42  -   CHH 0.000   2.00    0   2   1   1   0.000   0.658
CcLG01  49  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  60  -   CHH 0.000   3.00    0   3   1   1   0.000   0.562
CcLG01  61  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  62  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  72  -   CHH 0.000   4.00    0   4   1   1   0.000   0.490
CcLG01  75  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  80  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  94  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  97  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  99  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  116 -   CHH 0.000   2.00    0   2   0   0   0.000   0.658
CcLG01  126 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
CcLG01  133 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
CcLG01  199 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
R next-gen • 2.9k views
ADD COMMENT
0
Entering edit mode

Sorry,you description is faint,maybe you can give an example. I think Perl is more suitable for your tesk.

ADD REPLY
0
Entering edit mode

Dear zjhzwang

Please find the attached example.

ADD REPLY
0
Entering edit mode

Fine,I know that,but what output format you want?

ADD REPLY
0
Entering edit mode

i want to add column 7 based on the column 2. But the addition should be based on a another file which have start and end position

like I have a another file2 (position information) in this format chr start end CcLG01 1 45 CcLg 58 135 So based on start and end position given in file 2. column 7 should be added in file1 (mentioned above)

ADD REPLY
0
Entering edit mode
8.0 years ago
jinych2bgi ▴ 20

i think you want to find the bin,that have at least 3 methylation cytosines in each 100bp bin? so perl exmaple:

while (<IN>) {

 chomp;

  my @tmp = split;

  $flag = int($tmp[1]/100)+1;

  if ($tmp[6] == 1) {

    push @db{$flag},"$tmp[1]";

  }

}

print "#flag\tstart\tend\tnumber of methylation\tmethylation positon\n"; 

foreach my $k (sort{$a<=>$b} keys %db) {

  my $s = ($k-1)*100+1;

  my $e = $k*100;

  my $num = scalar(@{$db{$k}});

  print "$k\t$s\t$e\t$num\t",join(",",@{db{$k}}),"\n";

}
<h6>#</h6>

you can filter the result by youself!!!!

ADD COMMENT
0
Entering edit mode

you might need to format your code using 4-spaces indent. learn more about markdown format.

ADD REPLY
0
Entering edit mode
8.0 years ago
zjhzwang ▴ 180

You can use grep() function to find the indexes of one chr.

data<-read.table("PATH",
                 header=F,
                 sep="\t",
                 )
chr<-as.character(levels(data[,1]))
bin<-function(chr){
  index<-grep(chr,data[,1])
  start<-data[index[1],2]
  end<-data[index[length(index)],2]
  position<-start-end
  cytosine<-sum(data[index,7])
  coverage<-sum(data[index,8])
  if(position>100 & cytosine>3){
    return(c(chr,start,end,cytosine,coverage))
  }
}
result<-t(apply(matrix(chr,ncol=1),1,bin))
colnames(result)<-c("chr","shart","end","cytosine","coverage")

The result is :

> result
     chr      shart end  cytosine coverage
[1,] "CcLG01" "22" "199" "7"      "56"

And you can output:

write.table(result,"PATH",col.names = T,row.names = F,sep="\t",quote=F)
ADD COMMENT
0
Entering edit mode

Dear zjhzwang,

Thanks for you prompt reply. Actually I have to add coulmn 7 based on position column(2). And addition of column 7 should be in 100 bp range. like CcLG01 114 - CG 0.000 2.00 0 2 0 0 0.000 0.658 CcLG01 136 - CG 0.000 1.00 0 1 0 0 0.000 0.793 CcLG01 243 - CG 0.000 1.00 0 1 0 0 0.000 0.793 CcLG01 1272 + CG 0.000 1.00 0 1 1 1 0.000 0.793 CcLG01 1273 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1277 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1278 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1281 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1282 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1287 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1288 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1296 + CG 1.000 1.00 1 1 0 0 0.207 1.000

like postion 114 ans 136 (it will be 100-200 base pair bin). But only two CG (Col=4) is there in between these position so it should be discarded cause minimum number of CG required to be qualified is = 3. from 1200 -1300 bin (100 bp) there are 9 CG so this will qualified And my output will be like that

Chr start end Cytosine(Sum of Col 7) coverage(sum of column 8) CcLG01 1200 1300 8 9

Thank You for your kind support

ADD REPLY
0
Entering edit mode

I have update the answer,but I'm not sure: if have [CcLG01 200 500 8 9] ,[CcLG01 200 300 3 6] and [CcLG01 400 500 5 3],should it be split?

ADD REPLY
0
Entering edit mode

Dear zjhzwang

Thank You for your time and prompt reply. It worked.

thank you again

ADD REPLY
0
Entering edit mode

Dear zjhzwang

When i used t(apply command .. i got following error.

result <- t(apply(matrix(chr,ncol=1),1,bin))

data=read.table("Sterile_05.txt", header=F, sep="\t")
+   index&lt;-grep(chr,data[,1])
+   start&lt;-data[index[1],2]
+   end&lt;-data[index[length(index)],2]
+   position&lt;-start-end
+   cytosine&lt;-sum(data[index,7])
+   coverage&lt;-sum(data[index,8])
+   if(position&gt;100 &amp; cytosine&gt;3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }

Error in t.default(apply(matrix(chr, ncol = 1), 1, bin)) : argument is not a matrix Error: unexpected ')' in "result<-apply(matrix(chr,ncol=1),1,bin))"

ADD REPLY
0
Entering edit mode

And when i used same data which you have used earlier . i got this error.

data=read.table("Sterile_05.txt", header=F, sep="\t")
+   index&lt;-grep(chr,data[,1])
+   start&lt;-data[index[1],2]
+   end&lt;-data[index[length(index)],2]
+   position&lt;-start-end
+   cytosine&lt;-sum(data[index,7])
+   coverage&lt;-sum(data[index,8])
+   if(position&gt;100 &amp; cytosine&gt;3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }
ADD REPLY
0
Entering edit mode
8.0 years ago
alim.hcu ▴ 20

Dear zjhzwang, when i used " result<-t(apply(matrix(chr,ncol=1),1,bin))" command..i got error.

data=read.table("Sterile_05.txt", header=F, sep="\t")
> chr<-as.character(levels(data[,1]))
> bin<-function(chr){
+   index<-grep(chr,data[,1])
+   start<-data[index[1],2]
+   end<-data[index[length(index)],2]
+   position<-start-end
+   cytosine<-sum(data[index,7])
+   coverage<-sum(data[index,8])
+   if(position>100 & cytosine>3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }

> result<-t(apply(matrix(chr,ncol=1),1,bin))
Error in t.default(apply(matrix(chr, ncol = 1), 1, bin)) : 
  argument is not a matrix
> result<-apply(matrix(chr,ncol=1),1,bin))
Error: unexpected ')' in "result<-apply(matrix(chr,ncol=1),1,bin))"
> result<-apply(matrix(chr,ncol=1),1,bin)
> result
NULL
ADD COMMENT
0
Entering edit mode

Sorry,I just see it.
Maybe you can check what chr is,It should be a vector,if not,maybe something is wrong.

ADD REPLY
0
Entering edit mode
7.9 years ago
alim.hcu ▴ 20

Dear zjhzwang

One more question i want to add column 7 based on the column 2. But the addition should be based on a another file which have start and end position

like I have a another file in this format chr start end CcLG01 1 45

So from position 1 to 45 in another file (previous, mentioned above) column 7 should be added.

Thank You

ADD COMMENT
0
Entering edit mode

Please use ADD COMMENT or ADD REPLY to reply to earlier answers, as such this thread remains logically structured and easy to follow.

ADD REPLY

Login before adding your answer.

Traffic: 2736 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6