Hi,
We conducted a STARR-seq experiment that measures the activity of a given library (say, enhancers). Then we sent this library for whole-genome sequencing (WGS).
Background:
Mapping: We have ~250 million reads of 150 bp paired-end data. We used bowtie with the parameters -v 3 -m 1 --best --strata -X 2000.
Then we analysed the mapped data with deepTools.
In deepTools, we used multiBamSummary with a given BED file. This BED file is our library, which consists of ~8000 regions (1000 positive controls, 5000 negative controls, 2000 tested regions). This step simply counts the reads that overlap each BED region, so for each region I have the number of overlapping mapped reads.
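The per-region counting step can be pictured with a small pure-Python sketch. This is not deepTools' implementation; it assumes the alignments and BED regions are already available as (chrom, start, end) tuples in 0-based, half-open coordinates, and `count_reads_per_region` is a hypothetical name used only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_reads_per_region(regions, reads):
    """Count reads overlapping each BED region.

    regions: list of (chrom, start, end), 0-based half-open (BED style).
    reads:   iterable of (chrom, start, end) alignment spans, also half-open.
    Returns a list of counts in the same order as `regions`.
    """
    # Sort read starts and ends per chromosome so each region can be
    # answered with two binary searches instead of a scan over all reads.
    starts = defaultdict(list)
    ends = defaultdict(list)
    for chrom, start, end in reads:
        starts[chrom].append(start)
        ends[chrom].append(end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()

    counts = []
    for chrom, r_start, r_end in regions:
        s = starts.get(chrom, [])
        e = ends.get(chrom, [])
        # A read overlaps [r_start, r_end) iff read_start < r_end and read_end > r_start.
        # Count reads starting before r_end, then drop those already ended by r_start.
        counts.append(bisect_left(s, r_end) - bisect_right(e, r_start))
    return counts
```

A read that touches two regions is counted once for each, so summing the per-region counts can slightly exceed the number of distinct on-target reads.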
Problem:
Although we have ~200 million mapped reads, only ~60 million of them actually overlap our targeted regions.
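For scale, using the approximate figures above (and assuming the ~250 million and ~200 million figures count the same unit, reads rather than pairs), the mapping rate and the on-/off-target split work out to roughly:

$$
\text{mapping rate} \approx \frac{200\times10^{6}}{250\times10^{6}} = 0.80,\qquad
f_{\text{on}} \approx \frac{60\times10^{6}}{200\times10^{6}} = 0.30,\qquad
f_{\text{off}} = 1 - f_{\text{on}} \approx 0.70
$$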
Question:
Disregarding the STARR-seq methodology, could you please help me find out:
- Where are the rest of the mapped reads (~140 million) located?
- Why do we have such a huge amount of unspecific(?) mapping? Or, more simply, how would you solve such a problem?
I know this is a specific question, but your past experiences and comments could really help me.
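One way to see where the remaining reads go is to classify each alignment as on- or off-target and tally the off-target ones per chromosome and per fixed-size genomic bin. This sketch assumes the alignments have been exported as (chrom, start, end) tuples (for example from `bedtools bamtobed` output); the function names and the 10 kb default bin size are hypothetical, illustrative choices, not part of any tool.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def merge_regions(regions):
    """Merge (chrom, start, end) intervals into sorted, disjoint per-chromosome lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlapping or touching: extend the last interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def is_on_target(merged, chrom, start, end):
    """True if the half-open read span [start, end) overlaps any merged region."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # Regions are sorted and disjoint, so only the last one starting
    # before the read ends can possibly overlap it.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def offtarget_summary(regions, reads, bin_size=10_000, top=5):
    """Count off-target reads per chromosome and return the `top` most crowded bins."""
    merged = merge_regions(regions)
    per_chrom = Counter()
    per_bin = Counter()
    for chrom, start, end in reads:
        if is_on_target(merged, chrom, start, end):
            continue
        per_chrom[chrom] += 1
        # Bin by read start; the key is (chromosome, bin start coordinate).
        per_bin[(chrom, start // bin_size * bin_size)] += 1
    return per_chrom, per_bin.most_common(top)
```

If a few contigs or bins absorb most of the off-target reads, that points at a specific source worth inspecting in a genome browser; a roughly even spread across the genome would look more like background.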
Thank you very much,
T.
You could create a subset BAM that excludes the regions you are interested in and then use something like Qualimap for a gross overview.
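A subset BAM without the library regions could be produced along these lines. This is a sketch assuming the pysam package is installed and the library regions are in a plain BED file; `build_target_index`, `hits_target`, `read_bed` and `write_offtarget_bam` are hypothetical helper names. It filters alignment by alignment, so the two mates of a pair can end up on different sides of the split.

```python
from bisect import bisect_left
from collections import defaultdict


def build_target_index(regions):
    """Sort and merge (chrom, start, end) BED intervals into per-chromosome lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def hits_target(index, chrom, start, end):
    """True if the half-open span [start, end) overlaps any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Only the last region starting before `end` can overlap (regions are disjoint, sorted).
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def read_bed(path):
    """Read the first three BED columns, skipping blank, comment and track lines."""
    regions = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            regions.append((chrom, int(start), int(end)))
    return regions


def write_offtarget_bam(in_bam, bed_path, out_bam):
    """Copy mapped alignments that do NOT overlap any BED region into out_bam."""
    import pysam  # imported here so the helpers above work without pysam installed

    index = build_target_index(read_bed(bed_path))
    kept = 0
    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read in src:
            if read.is_unmapped or read.reference_end is None:
                continue
            if not hits_target(index, read.reference_name,
                               read.reference_start, read.reference_end):
                dst.write(read)
                kept += 1
    return kept
```

`bedtools intersect` with `-v` is a common command-line alternative for the same exclusion.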
Could it be some kind of experimental contamination? (I don't know what STARR-seq is.)
Qualimap is much appreciated; I have been using it for 3 weeks in multiple projects.