Hi,
I used data sets from the ENCODE consortium for my package development. Because the actual peak files are rather large, I can't ship them with my package: the package produced by R CMD build must be less than 4 MB on disk, so I need a rather small peak file as example data. In the ENCODE samples, each peak file contains around 100,000 peaks. How can I edit these large BED files to keep only a particular chromosome? Are there any handy tools for editing peak files? Thanks in advance :)
Best regards,
Jurat
You could provide data for just one chromosome. Choose the one most relevant to your application.
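A minimal sketch of that idea in Python, assuming a plain-text BED file whose first column is the chromosome name. The function names and the toy peaks are hypothetical, chosen only for illustration; a gzipped file would need `gzip.open` in place of `open`.

```python
import io


def filter_bed_by_chrom(lines, chrom):
    """Yield BED lines whose first column is exactly `chrom`."""
    for line in lines:
        # Skip blank lines and optional BED header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        # Exact match on the first field, so "chr1" does not match "chr10".
        if line.split(None, 1)[0] == chrom:
            yield line


def write_chrom_subset(in_path, out_path, chrom):
    """Copy the peaks of one chromosome from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_bed_by_chrom(src, chrom))


# Toy input (hypothetical peaks) standing in for a real peak file.
example = io.StringIO(
    "chr1\t100\t200\tpeak1\n"
    "chr2\t150\t300\tpeak2\n"
    "chr1\t500\t650\tpeak3\n"
)
subset = list(filter_bed_by_chrom(example, "chr1"))
print(len(subset), "peaks kept")
```

For a real file, something like `write_chrom_subset("peaks.bed", "peaks_chr1.bed", "chr1")` would write the reduced file; both paths here are placeholders.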
@Goutham Atla: Thanks. The peak files are already robustly constructed and stored in BED format, so I don't think there is a need to pick an important chromosome; I think taking a sample could be an option. Should I take a sample from each chromosome? How can I do that? Could you elaborate on your answer, please? I'm sorry if my question is too simple.
When you say "sample from each chromosome", do you mean a BAM file?
@Goutham Atla: I mean the BED file; all peaks are stored in a BED-format file. Thanks
I think it would be better to pick just one chromosome rather than sampling peaks from the whole genome. If you sample from the whole genome, you artificially increase the distance between peaks, which may or may not be a concern.
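To illustrate the spacing effect, here is a small sketch on made-up data: 1,000 hypothetical peak start positions spaced evenly along one chromosome, compared with a random sample of 100 of them. The numbers and the helper name `gaps` are assumptions for illustration, not ENCODE values.

```python
import random
from statistics import median


def gaps(starts):
    """Distances between consecutive peak starts after sorting."""
    ordered = sorted(starts)
    return [b - a for a, b in zip(ordered, ordered[1:])]


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: 1,000 peak starts spaced 1 kb apart.
starts = [i * 1000 for i in range(1000)]

# Random subsample of 10% of the peaks.
sampled = random.sample(starts, 100)

full_gap = median(gaps(starts))
sampled_gap = median(gaps(sampled))
print("median gap, all peaks:    ", full_gap)
print("median gap, sampled peaks:", sampled_gap)
```

Keeping every peak on a single chromosome leaves the gaps on that chromosome unchanged, whereas the random sample spreads the remaining peaks further apart.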
By the way, a ChIP-Seq file of 100,000 peaks is quite extreme; most of them should be on the order of a few thousand peaks (say 1,000 to 30,000). Are you sure you are looking at ChIP-Seq for transcription factors rather than FAIRE-Seq or nucleosomes?
@dariober: Yes, I am sure that I am looking at ChIP-Seq for TFBS. Thanks