Hey everyone, so I'm here because I'm completely at a loss.
I recently read this paper and was asked by my PI to reconstruct Figure 3A using our ChIP-Seq and GRO-Seq data. While I'm pretty much the only one in our lab who does any bioinformatic-y work, I honestly have no idea how something like this would be generated.
Essentially, what my PI would like to do is computationally convert the genes on the Crick (reverse/negative) strand to the forward/+ strand so that we can then find transcripts that are antisense to the genes on the sense strand (hopefully this makes sense).
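To make the idea concrete, here is a minimal Python sketch of what I think the "conversion" amounts to. It assumes I already have per-base (or per-bin) signal for the plus and minus strands over a gene's window, stored in genomic left-to-right order. The function name `orient_to_sense` and the toy numbers are made up for illustration; this is my guess at the logic, not the paper's code.

```python
def orient_to_sense(plus_signal, minus_signal, strand):
    """Return (sense, antisense) signal, both ordered 5'->3' relative to the gene.

    plus_signal / minus_signal: sequences of values over the same genomic
    window, in genomic (left-to-right) order.
    strand: "+" or "-" for the gene of interest.
    """
    if strand == "+":
        # Gene already reads left to right: plus strand is sense.
        return list(plus_signal), list(minus_signal)
    if strand == "-":
        # Flip the window so the TSS is on the left, and swap the strands:
        # the minus strand is sense for a minus-strand gene.
        return list(reversed(minus_signal)), list(reversed(plus_signal))
    raise ValueError(f"unexpected strand: {strand!r}")


# Toy example (made-up values) for a minus-strand gene.
plus = [0, 1, 2, 3]
minus = [9, 8, 7, 6]
sense, antisense = orient_to_sense(plus, minus, "-")
print(sense)      # [6, 7, 8, 9]
print(antisense)  # [3, 2, 1, 0]
```

If that is right, then after this step every gene can be treated as if it were on the + strand, and the antisense row for each gene is simply the signal from the opposite strand.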
I've read the supplemental methods that the paper provided, and I understand that I need to identify genes that are protein coding, contain Pol II, and are longer than 2 kb, within a window running from 2.5 kb upstream of the TSS to 2.5 kb downstream of the pA site.
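For reference, this is roughly how I read those criteria, written as a small Python sketch. The gene records, the `biotype` field, and the `pol2_bound` set (which in practice would come from overlapping genes with our Pol II ChIP-Seq peaks) are all hypothetical placeholders, and I'm assuming BED-style 0-based, half-open coordinates.

```python
FLANK = 2500       # 2.5 kb upstream of the TSS and downstream of the pA site
MIN_LENGTH = 2000  # genes must be longer than 2 kb


def passes_filters(gene, pol2_bound):
    """True if the gene is protein coding, Pol II bound, and longer than 2 kb."""
    length = gene["end"] - gene["start"]
    return (
        gene["biotype"] == "protein_coding"
        and gene["name"] in pol2_bound
        and length > MIN_LENGTH
    )


def plot_window(gene, flank=FLANK):
    """Return (chrom, start, end, strand) for the gene plus flanks.

    The flanks are equal on both sides, so the genomic interval is the same
    for either strand; the strand only decides which end is the TSS.
    """
    start = max(0, gene["start"] - flank)  # clamp at the chromosome start
    end = gene["end"] + flank
    return gene["chrom"], start, end, gene["strand"]


# Made-up records for illustration only.
genes = [
    {"name": "geneA", "chrom": "chr1", "start": 10000, "end": 15000,
     "strand": "+", "biotype": "protein_coding"},
    {"name": "geneB", "chrom": "chr1", "start": 1000, "end": 2500,
     "strand": "-", "biotype": "protein_coding"},   # too short
    {"name": "geneC", "chrom": "chr2", "start": 50000, "end": 60000,
     "strand": "-", "biotype": "lncRNA"},           # not protein coding
    {"name": "geneD", "chrom": "chr2", "start": 1000, "end": 9000,
     "strand": "-", "biotype": "protein_coding"},
]
pol2_bound = {"geneA", "geneB", "geneC", "geneD"}

windows = [plot_window(g) for g in genes if passes_filters(g, pol2_bound)]
print(windows)
```

Only geneA and geneD survive here, and geneD's window is clamped at 0 because its upstream flank would otherwise run off the start of the chromosome.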
What I really need is some guidance on possible approaches to accomplish this. I know they did most of their heatmap creation in R using the ggplot2 and fields packages.
So far I've been able to identify the genes that meet their strict criteria using a combination of awk and bedtools commands, but after that I'm not quite sure how to proceed.
I'm sorry if this question seems a little broad. Thank you.