Hi,
How can I randomly select lines from a BED file? More specifically, I want to create a smaller BED file of genomic regions (ChIP-seq peaks) from a larger one while keeping the relative proportion of lines from each chromosome. For example, if my input file has 1000 lines and I want to select 100 of them at random, the proportion of lines from each chromosome in the subset should stay roughly the same as in the input.
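A minimal Python sketch of proportional (stratified) sampling. It assumes tab-separated BED lines with the chromosome name in the first column; `proportional_sample` is a hypothetical helper name. Each chromosome's quota is its share of `n`, rounded with the largest-remainder method so the quotas add up to exactly `n`. Lines are then drawn without replacement within each chromosome.

```python
import random
from collections import defaultdict


def proportional_sample(lines, n, seed=None):
    """Return n BED lines, keeping each chromosome's share of the input.

    Assumes tab-separated BED lines with the chromosome in column 1.
    Blank, '#', 'track' and 'browser' lines are skipped.
    The output keeps the original file order.
    """
    rng = random.Random(seed)

    # Group (original index, line) pairs by chromosome.
    by_chrom = defaultdict(list)
    for i, line in enumerate(lines):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        by_chrom[line.split("\t", 1)[0]].append((i, line))

    total = sum(len(recs) for recs in by_chrom.values())
    if n > total:
        raise ValueError("n exceeds the number of BED records")

    # Integer floor of each chromosome's exact quota n * count / total,
    # plus the remainder used to break rounding ties.
    quota = {c: n * len(recs) // total for c, recs in by_chrom.items()}
    remainder = {c: n * len(recs) % total for c, recs in by_chrom.items()}

    # Largest-remainder rounding: give the leftover slots to the
    # chromosomes whose exact quota lost the most when floored.
    leftover = n - sum(quota.values())
    for c in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[c] += 1

    # Sample without replacement inside each chromosome.
    picked = []
    for c, recs in by_chrom.items():
        picked.extend(rng.sample(recs, quota[c]))

    picked.sort()  # restore original file order via the stored index
    return [line for _, line in picked]
```

For example, `proportional_sample(open("peaks.bed"), 100, seed=42)` returns 100 lines in their original file order. With 600, 300 and 100 peaks on three chromosomes it yields exactly 60, 30 and 10. Fixing `seed` makes the subset reproducible.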
It seems this question was asked here earlier (How To Randomly Sample A Subset Of Lines From A Bed File), but I did not find a suitable solution there.
Can you suggest some tools, or a solution using awk or a shell script?
Thank you, Naresh D J
In what way does the previous post not answer your question?
The answers in the previous post select a fixed number of lines from each chromosome instead of keeping the relative proportions.
As I read it, the first answer there does what I understand you want: if you ask for 100 random lines from your BED file, the proportion of each chromosome among those 100 lines is preserved.
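Drawing lines uniformly at random from the whole file (as GNU `shuf -n 100` does, for instance) keeps the chromosome proportions only on average. Each individual draw's per-chromosome counts fluctuate around the expected share. The Python sketch below illustrates this with single-pass reservoir sampling (Algorithm R); `reservoir_sample` and `chrom_counts` are hypothetical helper names, and BED lines are again assumed to be tab-separated with the chromosome in column 1.

```python
import random
from collections import Counter


def reservoir_sample(lines, n, seed=None):
    """Uniformly sample n lines in one pass (Algorithm R).

    Every line has the same probability n / total of being kept,
    so per-chromosome counts match the input proportions only in
    expectation, not exactly on every draw.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < n:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = line
    return reservoir


def chrom_counts(lines):
    """Count sampled lines per chromosome (column 1 of a BED line)."""
    return Counter(line.split("\t", 1)[0] for line in lines)
```

If you call `chrom_counts(reservoir_sample(lines, 100))` repeatedly on a file with 600/300/100 lines per chromosome, the chr1 count averages close to 60 but varies from draw to draw. Exact per-chromosome quotas therefore need a stratified scheme rather than a single uniform draw.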