Question

STARR-seq data: how can I characterise reads that map outside the targeted regions?

0

6.6 years ago

morovatunc ▴ 560

Hi,

We conducted a STARR-seq experiment that measures the activity of a given library (say, enhancers). We then sent this for sequencing (WGS).

Background:

Mapping: We have ~250 million reads of 150 bp paired-end data. We used bowtie with the parameters -v 3 -m 1 --best --strata -X 2000.

Then we analysed mapped data with deepTools.

In deepTools, we used multiBamSummary with a given BED file. This BED file is our library, which consists of ~8000 regions (1000 positive controls, 5000 negative controls, 2000 tested regions). This step simply gives the number of reads that overlap each BED region, so for each region I have the number of overlapping mapped reads.

Problem:

Of our ~200 million mapped reads, only ~60 million actually overlap our targeted regions.

Question:

Disregarding the STARR-seq methodology, could you please help me find out:

Location of the rest of the (~140 million) mapped reads?
Why do we have such a huge amount of unspecific(?) mapping? Or simply, how would you solve such a problem?

I know this is a specific question, but your past experience and comments could really help me.

Thank you very much,

T.

alignment sequencing mapping starrseq • 1.3k views

1

Location of the rest of the (~140 million) mapped reads?

You could create a subset BAM minus the regions you are interested in and then use something like Qualimap for a gross overview.
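A rough sketch of how that subset could be built. It only assembles the command lines and assumes that samtools and Qualimap are installed and that the BAM is coordinate-sorted. The file names (mapped.bam, library.bed, ontarget.bam, offtarget.bam, offtarget_qc) are placeholders, not files from the original post. The idea is that samtools view -L keeps reads overlapping the BED regions, while -U writes the reads that were not selected to a second file.

```python
import shlex


def offtarget_commands(bam, bed, on_bam="ontarget.bam",
                       off_bam="offtarget.bam", qc_dir="offtarget_qc"):
    """Return argv lists that split a BAM by a BED file and QC the remainder.

    All file names are hypothetical placeholders.
    """
    # -L: keep alignments overlapping the BED regions (written to on_bam)
    # -U: write the alignments that were NOT selected to off_bam
    split = ["samtools", "view", "-b", "-L", bed,
             "-U", off_bam, "-o", on_bam, bam]
    # Qualimap's bamqc module gives a gross overview of the off-target BAM
    qc = ["qualimap", "bamqc", "-bam", off_bam, "-outdir", qc_dir]
    return [split, qc]


if __name__ == "__main__":
    # Print the commands for inspection; run them yourself, or pass each
    # list to subprocess.run(cmd, check=True).
    for cmd in offtarget_commands("mapped.bam", "library.bed"):
        print(shlex.join(cmd))
```

Note that with paired-end data each mate is tested against the BED file separately, so a pair straddling a region boundary can end up split between the two files.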

Why do we have such a huge amount of unspecific(?) mapping? Or simply, how would you solve such a problem?

Some kind of experimental contamination (I don't know what STARR-seq is)?

ADD REPLY • link 6.6 years ago by GenoMax 147k

0

Qualimap is highly appreciated. I have been using it for 3 weeks in multiple projects.

ADD REPLY • link 6.6 years ago by morovatunc ▴ 560

score 1 · Answer 1 · 2018-04-23

1

6.6 years ago

ATpoint 85k

Did you do Cap-STARRseq, i.e. capture of your target DNA? If so, it is not uncommon to co-capture all kinds of other genomic regions, which then become part of your library. Something like 30% target-assigned reads sounds pretty OK to me. That leaves you with thousands of reads per target region, which should be more than enough for a STARR-seq experiment. Do you have an elaborate statistical framework to make use of these high read counts and the power that comes with this sequencing depth (I hope you have replicates)?
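As a quick back-of-the-envelope check of those numbers, using only the approximate figures given in the question (~200 million mapped reads, ~60 million on target, ~8000 regions). The per-region value is a plain mean; the real distribution across regions will of course be uneven.

```python
# Approximate figures taken from the question
mapped_reads = 200_000_000
on_target_reads = 60_000_000
n_regions = 8_000

on_target_fraction = on_target_reads / mapped_reads
off_target_reads = mapped_reads - on_target_reads
mean_reads_per_region = on_target_reads / n_regions

print(f"on-target fraction: {on_target_fraction:.0%}")            # 30%
print(f"off-target reads: {off_target_reads:,}")                  # 140,000,000
print(f"mean reads per region: {mean_reads_per_region:,.0f}")     # 7,500
```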

0

Yeah, I think it is Cap-STARRseq, because I remember the preparation of the array design, forward and reverse primers, etc. (Sorry for my ignorance; I mainly focus on the computational part of this project.)

Could you help me understand why you are not very concerned about 70% unspecific mappings? (My PI freaked out.) Do you think those out-of-library regions were captured because they were in the proximity of library regions (like in ChIP-seq)? My biggest concern is some kind of contamination.
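One way to probe the proximity idea would be to measure, for each off-target read, the distance to the nearest library region. The sketch below uses toy coordinates and hypothetical helper names (build_index, distance_to_nearest, summarise); in practice the regions would come from the BED file and the positions from the BAM (for example fragment midpoints). It assumes the target regions do not overlap each other and uses BED-style half-open intervals. The 500 bp flank is an arbitrary choice.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def build_index(regions):
    """Group (chrom, start, end) targets by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def distance_to_nearest(index, chrom, pos):
    """Distance in bp to the closest target; 0 if inside, None if no targets."""
    intervals = index.get(chrom)
    if not intervals:
        return None
    # First interval whose start is >= pos; its left neighbour is the only
    # other candidate when targets do not overlap.
    i = bisect_left(intervals, (pos,))
    best = None
    for start, end in intervals[max(i - 1, 0):i + 1]:
        if start <= pos < end:
            return 0
        d = start - pos if pos < start else pos - (end - 1)
        best = d if best is None else min(best, d)
    return best


def summarise(index, reads, flank=500):
    """Count reads that are on target, within `flank` bp, or far away."""
    counts = Counter()
    for chrom, pos in reads:
        d = distance_to_nearest(index, chrom, pos)
        if d is None:
            counts["no_target_on_chrom"] += 1
        elif d == 0:
            counts["on_target"] += 1
        elif d <= flank:
            counts["near_target"] += 1
        else:
            counts["far_from_target"] += 1
    return counts


# Toy data, not from the experiment
targets = [("chr1", 1000, 1500), ("chr1", 5000, 5600)]
reads = [("chr1", 1200), ("chr1", 1700), ("chr1", 3000), ("chr2", 300)]
print(summarise(build_index(targets), reads))
```

If most off-target reads fall within a few hundred bp of a target, proximity capture is a plausible explanation; reads spread evenly over the genome or piling up on chromosomes without targets would point elsewhere.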

We will have at least three replicates at the end of the line for sure.

Thank you for the discussion and your time.

ADD REPLY • link 6.6 years ago by morovatunc ▴ 560

0

You're welcome. I remember that when I checked the published CapSTARRseq data, you saw signal all over the place outside the target regions, probably due to off-target binding. Why don't you simply download the available CapSTARRseq data (Vanhille 2015 for mouse, and Dao 2017 for human) and check how many off-target reads they had? Maybe what I say is complete nonsense, but I seem to remember that, at least in the 2015 paper, you see a lot of signal coming up outside the on-target regions.

ADD REPLY • link 6.6 years ago by ATpoint 85k