Hi,
I am trying to calculate the cut sites for my ATAC-seq data and then use these for featureCounts. However, I am unclear exactly how to write the cut sites in BED file format. I have made two different BED files.
In the first, I modified the start and stop position for all forward (+) reads to be the original start site, and I modified the start and stop of all the reverse (-) reads to be the original stop site - 1. The reason for this is because from my reading, the stop site in a BED file is not inclusive.
BED file 1 e.g.
chr1 10073 10073 NB502048:39:HT3HMBGX3:1:13311:12074:5101/1 30 +
chr1 10073 10073 NB502048:39:HT3HMBGX3:4:21410:17550:14082/2 30 +
chr1 10162 10162 NB502048:39:HT3HMBGX3:2:23103:14543:10456/2 30 +
chr1 10208 10208 NB502048:39:HT3HMBGX3:2:23103:14543:10456/1 30 -
In the second, I modified the start position for all forward reads to be the original start site and the stop position to be start site + 1. For the reverse reads, I modified the start to be the original stop site - 1, and the end to be the original stop site. So in the first BED file, the start and end are the same, whereas in the second, the end is 1 bp more than the start.
BED file 2 e.g.
chr1 10073 10074 NB502048:39:HT3HMBGX3:1:13311:12074:5101/1 30 +
chr1 10073 10074 NB502048:39:HT3HMBGX3:4:21410:17550:14082/2 30 +
chr1 10162 10163 NB502048:39:HT3HMBGX3:2:23103:14543:10456/2 30 +
chr1 10208 10209 NB502048:39:HT3HMBGX3:2:23103:14543:10456/1 30 -
After creating these cut-site BED files, I converted them to BAMs and used these for featureCounts. With the two files, I get slightly different counts. I was expecting them to be the same if the stop site is non-inclusive in BED files.
Could anyone explain which of these approaches (if any) is correct and why I see different results with featureCounts using the two files?
Many thanks for the help,
Lucy