Entering edit mode
23 months ago
bioinformatics.girl
▴
20
I am currently trying to produce a CoveragePlot() to locate a number of transcription factors; however instead of the input being a Chromatin Assay or Seurat Object, I wanted to pass output files passed by MACS2 which was run in the terminal.
I have:
- peaks.narrowPeak
- peaks.xls
- summits.bed
How would I do this? I don't know what the object would be equal to? Thanks.
The link to their vignette is here: https://stuartlab.org/signac/articles/peak_calling.html
Wouldn't you take your MACS2 files, import them with rtracklayer so that they are GRanges objects, and then pass those as the peaks? i.e. where ranges = peaks, the peaks are your GRanges of interest. The summits.bed will import directly (with import(summits.bed)), but the narrowPeak file requires a description of the extra columns in the bed file (see the docs for import).
What is 'peaks' entered as?...
The input is:
object =? region= ? (is it the transcription factor of interest) ranges = ? (how would I import this and from which file from the 3 I provided
The
object
will the the Seurat object you're working with, because that contains the data that will be plotted (Tn5 insertion events per position by cluster). Theregion
is the region in the genome you want to view. This can be specified as a set of genomic coordinates, a GRanges object, or a gene name if annotation exists within the Seurat object. The argument forranges
is also a GRanges object, and represents regions you would like to plot. In the example it is the regions that were called as peaks. If you're not sure what a GRanges object is, you should look it up. It's an amazing way to manipulate ranges on the genome, and carry around data and annotation with them - and basically any feature can be considered a range. It's basically an object that has BED-like information (chr, start,end,strand, etc.).So of the 3 files you have from MACS, all three represent the same set of features - peaks calls, just in slightly different formats. The narrowPeak file is a 6+4 BED file containing the peaks coordinates. The summits.bed file is a 5 column BED file containing just the peak summit - a 1 base wide feature representing the point of highest fragment pileup. The excel file is really just a text file that contains the same info as the narrowPeak file with p-values and q-values. You can import a BED file into R as a GRanges object using rtracklayer. Is you want to display the regions that are peak summits, you would grab those. After loading the GRanges and rtracklayer libraries:
Then when you call CoveragePlot, ranges=my_summits.
If you wanted to display the full length peak calls, you would import the narrowPeak file. But since it's a 6+4 BED file, you have to tell import what the extra columns represent:
I'm not sure what you mean by "locate a number of transcription factors", but if you wanted to view some region your transcription factor of interest is known to bind, an example would be:
This does not work. I called for the gene "Cd8a" for instance for the mouse genome and it says "Gene not found" even though it should appear.
Did you try putting in the coordinates of the gene? See ?CoveragePlot for help. (Should is half the reason this forum exists :)
Apologies for the back and forth. Is there a way to look for these coordinates? I placed region = "gene name" since I would not know the exact genomic location beforehand
Given the gene name, perhaps you're working with human or mouse? Either way, if you're working with a fairly popular organism, there will be databases in which you can look up the gene and find its coordinates (see ensembl, for instance). It should also be findable via the annotation underneath your data. And a library related to GenomicRanges is GenomicFeatures, which has several methods for retrieving and manipulating information about genes you might be interested in.