I'm doing a ChIP-seq analysis for the first time and have some basic questions. I successfully ran macs2 callpeak and have a .narrowPeak file that I can load into IGV. I also have an .xls file with the names of specific genes we are interested in. I can load my .narrowPeak file into IGV, manually type in a gene name, and determine whether my TF binds, but the list is over 5000 genes long, so doing this manually isn't an option. Does anyone know how I can do this as a batch job, either with IGV or in R? I need output that lists each gene with a 0/1 column indicating whether the gene is bound anywhere in my .narrowPeak file.
Thanks in advance for any help, Stacy
Look at the GenomicRanges Bioconductor package. It has plenty of functions for overlaps and nearest/closest operations. Excel files can be loaded with openxlsx, and narrowPeak is simply a tab-delimited text file, so read.delim with header = FALSE will do. The GenomicRanges manual covers how to create GRanges objects for overlap analysis.
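To make that concrete, here is a minimal sketch of the overlap step. It is not a finished pipeline: the toy data frames stand in for your files, the file name and column names are hypothetical, and it assumes your gene table already has start/end coordinates. Note also that openxlsx reads .xlsx only, so a legacy .xls would need to be re-saved as .xlsx or read with something like readxl::read_excel.

```r
library(GenomicRanges)  # Bioconductor package; also loads IRanges

# Toy stand-in for the peak file. With real data this would be something like
#   peaks_df <- read.delim("my_peaks.narrowPeak", header = FALSE)  # hypothetical name
# The first three narrowPeak columns are chrom, start (0-based) and end.
peaks_df <- data.frame(
  V1 = c("chr1", "chr1", "chr2"),
  V2 = c(1000, 5000, 300),
  V3 = c(1200, 5400, 800)
)

# Toy stand-in for the gene list, ASSUMING 1-based coordinates are available.
# Column names here are made up for illustration.
genes_df <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(900, 2000, 100),
  end   = c(1100, 3000, 250),
  stringsAsFactors = FALSE
)

# Build GRanges objects; add 1 to the peak starts to convert the
# 0-based BED-style starts to the 1-based convention GRanges uses.
peaks <- GRanges(peaks_df$V1, IRanges(peaks_df$V2 + 1, peaks_df$V3))
genes <- GRanges(genes_df$chr, IRanges(genes_df$start, genes_df$end))

# overlapsAny() returns one TRUE/FALSE per gene: does any peak overlap it?
genes_df$bound <- as.integer(overlapsAny(genes, peaks))

print(genes_df)  # bound column is 1, 0, 0 for this toy data

# To save the table:
# write.csv(genes_df, "gene_binding.csv", row.names = FALSE)
```

This only counts peaks that overlap the gene interval itself. If binding upstream of the gene should also count, the gene ranges could be widened before the overlap, for example with promoters() from GenomicRanges; how far to extend is a judgment call for your experiment.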
Thanks...I'll take a look. The other thing I need to figure out is how to get the chromosome start/end positions. My .xls file has chromosome names but not positions...any ideas?
Please show your data.