7.3 years ago
caspase8mach
Hi All,
I have read-depth data from various samples in the following format:
Chromosome, Start, End, Number of reads.
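As a starting point, each sample's file in this four-column format can be loaded into `(chrom, start, end, reads)` tuples. This is a minimal sketch that assumes comma-separated columns in the order given above; the function name `read_sample` is hypothetical, and a non-numeric header row is simply skipped.

```python
import csv
import io

def read_sample(handle):
    """Read (chrom, start, end, reads) tuples from a comma-separated file-like object.

    Assumed column order: Chromosome, Start, End, Number of reads.
    Blank lines and a header row (non-numeric Start) are skipped.
    """
    records = []
    for row in csv.reader(handle):
        if len(row) < 4 or not row[1].strip().isdigit():
            continue  # blank line or header row
        chrom, start, end, reads = (field.strip() for field in row[:4])
        records.append((chrom, int(start), int(end), int(reads)))
    return records

# Usage with an in-memory example; a real file would be opened with open(path, newline="")
example = io.StringIO("Chromosome,Start,End,Reads\nchr1,10000,50000,100\nchr2,300,1300,500\n")
print(read_sample(example))
```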
Different samples have data from different genomic regions, for example:
Sample 1
Chr 1: 10000 - 50000, 100 reads; Chr 2: 300 - 1300, 500 reads; Chr 16: 8900 - 23000, 200 reads
Sample 2
Chr 1: 100000 - 200000, 10000 reads; Chr 5: 10000 - 50000, 400 reads; Chr 4: 70000 - 100000, 7000 reads
and so on.
To compare samples, I would like to summarize each sample's reads in fixed-size bins (say, 1 Mb). This is what I am trying to achieve:
Chr 1:1-1000000 Sample 1 - 100 reads, Sample 2 - 10000 reads, Sample 3 - 50 reads
Chr 1:1000001-2000000 Sample 1 - 0 reads, Sample 2 - 5660 reads, Sample 3 - 8900 reads
Chr 1:2000001-3000000 Sample 1 - 56900 reads, Sample 2 - 560 reads, Sample 3 - 8900 reads
Chr 1:3000001-4000000 Sample 1 - 2344 reads, Sample 2 - 460 reads, Sample 3 - 900 reads
Chr 1:4000001-5000000 Sample 1 - 0 reads, Sample 2 - 5660 reads, Sample 3 - 89 reads
Is there a workflow or R package to achieve this?
I would like to retrieve the binned read counts (1 Mb per bin) for each sample.
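The binning itself can be sketched in plain Python. This is a hedged sketch, not an established package: it assumes 1-based, inclusive coordinates (matching the bin labels above, e.g. 1-1000000), and it assumes uniform depth within a region, so a region that crosses a bin boundary has its reads split in proportion to the bases falling in each bin. Assigning all reads to the bin containing the region's start would be a simpler alternative. The names `bin_counts`, `chrom_key` and `bin_table` are hypothetical. In R, the Bioconductor GenomicRanges package provides comparable building blocks (`tileGenome` to create bins, `findOverlaps` to intersect them with regions).

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the question

def bin_counts(records, bin_size=BIN_SIZE):
    """Sum one sample's reads per (chrom, bin_index).

    records: iterable of (chrom, start, end, reads), 1-based inclusive coordinates.
    Reads of a region spanning several bins are split by overlap length
    (assumes uniform depth within the region).
    """
    bins = defaultdict(float)
    for chrom, start, end, reads in records:
        length = end - start + 1
        for b in range((start - 1) // bin_size, (end - 1) // bin_size + 1):
            bin_start = b * bin_size + 1
            bin_end = (b + 1) * bin_size
            overlap = min(end, bin_end) - max(start, bin_start) + 1
            bins[(chrom, b)] += reads * overlap / length
    return dict(bins)

def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then named ones (X, Y, ...)."""
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)

def bin_table(samples, bin_size=BIN_SIZE):
    """One row per bin with reads in at least one sample; missing samples get 0.

    samples: dict mapping sample name -> list of (chrom, start, end, reads).
    """
    per_sample = {name: bin_counts(recs, bin_size) for name, recs in samples.items()}
    keys = {key for counts in per_sample.values() for key in counts}
    rows = []
    for chrom, b in sorted(keys, key=lambda k: (chrom_key(k[0]), k[1])):
        row = {"region": f"Chr {chrom}:{b * bin_size + 1}-{(b + 1) * bin_size}"}
        for name, counts in per_sample.items():
            row[name] = counts.get((chrom, b), 0.0)
        rows.append(row)
    return rows

# The two example samples from the question
samples = {
    "Sample 1": [("1", 10000, 50000, 100), ("2", 300, 1300, 500), ("16", 8900, 23000, 200)],
    "Sample 2": [("1", 100000, 200000, 10000), ("5", 10000, 50000, 400), ("4", 70000, 100000, 7000)],
}
for row in bin_table(samples):
    print(row)
```

Bins with no reads in any sample are omitted here; emitting every bin (as in the Chr 1 rows above) would additionally require the chromosome lengths of the reference genome.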
Thanks a lot for your help.
In the example I assume you forgot to include the number of reads for each region?
Made the change. Thanks