Question

Retrieve the segmented reads scores for different samples

7.9 years ago

caspase8mach ▴ 30

Hi All,

I have read-depth data from various samples in the following format:

Chromosome, Start, End, Number of reads.

Different samples have data from different genomic regions, for example:

Sample 1
Chr 1: 10000 - 50000, 100 reads; Chr 2: 300 - 1300, 500 reads; Chr 16: 8900 - 23000, 200 reads

Sample 2
Chr 1: 100000 - 200000, 10000 reads; Chr 5: 10000 - 50000, 400 reads; Chr 4: 70000 - 100000, 7000 reads

and so on.

To compare different samples, I would like to represent each sample's reads in fixed-size bins (say, 1 Mb). This is what I am trying to achieve:

Chr 1:1-1000000 Sample 1 - 100 reads, Sample 2 - 10000 reads, Sample 3 - 50 reads
Chr 1:1000001-2000000 Sample 1 - 0 reads, Sample 2 - 5660 reads, Sample 3 - 8900 reads
Chr 1:2000001-3000000 Sample 1 - 56900 reads, Sample 2 - 560 reads, Sample 3 - 8900 reads
Chr 1:3000001-4000000 Sample 1 - 2344 reads, Sample 2 - 460 reads, Sample 3 - 900 reads
Chr 1:4000001-5000000 Sample 1 - 0 reads, Sample 2 - 5660 reads, Sample 3 - 89 reads

Is there a workflow / R package to achieve this?

In short, I would like to retrieve the segmented read counts (one value per 1 Mb bin) for each sample.
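To make the goal concrete, here is a minimal sketch of the binning in plain Python. It is only an illustration, not an established workflow. The function names `bin_reads` and `bin_table` are hypothetical, coordinates are assumed to be 1-based and inclusive, and a region that spans several bins is assumed to have its reads split in proportion to the overlapping bases (other conventions, such as assigning all reads to the bin containing the region midpoint, would be equally defensible).

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the question


def bin_reads(regions, bin_size=BIN_SIZE):
    """Sum read counts per (chromosome, bin index).

    `regions` is an iterable of (chrom, start, end, reads) tuples with
    1-based, inclusive coordinates (an assumption). A region spanning
    several bins has its reads split in proportion to the overlap.
    """
    counts = defaultdict(float)
    for chrom, start, end, reads in regions:
        length = end - start + 1
        first_bin = (start - 1) // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            bin_start = b * bin_size + 1
            bin_end = (b + 1) * bin_size
            overlap = min(end, bin_end) - max(start, bin_start) + 1
            counts[(chrom, b)] += reads * overlap / length
    return dict(counts)


def bin_table(samples, bin_size=BIN_SIZE):
    """Build rows of (bin label, {sample name: reads}) across all samples.

    Bins with no data for a sample are reported as 0.0 for that sample.
    """
    binned = {name: bin_reads(regs, bin_size) for name, regs in samples.items()}
    keys = sorted({key for counts in binned.values() for key in counts})
    rows = []
    for chrom, b in keys:
        label = f"{chrom}:{b * bin_size + 1}-{(b + 1) * bin_size}"
        row = {name: binned[name].get((chrom, b), 0.0) for name in binned}
        rows.append((label, row))
    return rows


# The two samples listed in the question.
samples = {
    "Sample 1": [("chr1", 10000, 50000, 100),
                 ("chr2", 300, 1300, 500),
                 ("chr16", 8900, 23000, 200)],
    "Sample 2": [("chr1", 100000, 200000, 10000),
                 ("chr5", 10000, 50000, 400),
                 ("chr4", 70000, 100000, 7000)],
}

for label, row in bin_table(samples):
    print(label, row)
```

Note that chromosomes are sorted as plain strings here, so chr16 comes before chr2; a natural sort would be needed for a genome-ordered table.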

Thanks a lot for your help.

NGS Bins Comparison • 1.8k views

updated 7.8 years ago by Biostar 20 • written 7.9 years ago by caspase8mach ▴ 30

In the example I assume you forgot to include the number of reads for each region?

Sample 1
Chr 1: 10000 - 50000 **100**, Chr 2: 300 - 1300 **140**, Chr 16: 8900 - 23000 **200**