Hi everybody, I have a bed file with coordintes and the name of the transcript for each coordinate. I would like if there is any way for getting the peaks over this bed file. Can MACS2 do this?
Thanks!
Hi everybody, I have a bed file with coordintes and the name of the transcript for each coordinate. I would like if there is any way for getting the peaks over this bed file. Can MACS2 do this?
Thanks!
Using bedops:
Sort all bed files first:
sort-bed $file_a > $output_a
sort-bed $file_b > $output_b
sort-bed $peaks > $output_peaks
Look for peak intersects in separate files:
bedops --element-of 1 $output_peaks $output_a > $output_peaks_a
bedops --element-of 1 $output_peaks $output_b > $output_peaks_b
You can then using something fairly simple like wc -l
to count the number of lines corresponding to the number of peaks a given transcript (assuming file a and b only have a single transcript each).
Just to be clear. The file a
and file b
are outputs of MACS2 or just random transcript files with chromosomal coordinates? You do not need MACS2 to find common and exclusive peaks. If your file a and file b are bed files and are results of MACS2
peak calling then they should be comparable with bedtools
or bedops
. Just sort them and then try to find the intersection at either 1bp or more and extract those region to another bed file. Once you annotate them you will be able to know the corresponding gene names but then again your annotations should be done either on promoters with a selection of windows size or regions excluding the promoters. Once you select only promoter regions from the intersection file lets say file
c. You can annotate them to get the gene names. Any standard tool can do the needful. If you want to integrate the understanding of those genes at the level of expression for specific trasncripts you would need to have a gtf file that will have transcript id, chr coordinates and which will be either overlapped with your file c (promoter file) and then annotated or simply convert your RNA-Seq transcript id to gene names and overlap gene symbols of annotated file c to that of the genes that come from your RNA-Seq transcript file which is also having gene names.
To count the number of peaks its always a line count in unix so wc-l
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I do not fully understand your question, can you refine it or give an example? Perhaps a simple "join" command will do what you want
I have two files ( a and b), both are BED files (chr, start, end, strand) of different transcripts. I would like to compare them in therms of peaks, so I would like to know the number of peaks in file a and the number of peaks in file b to compare. So I don't know how I can get the number of peaks for each file (I've tried Homer and annotatePeak but the number of peaks are the same that the number of transcripts included in each file..)
What do you mean by "peaks"?
You say you have bed files, but from what experiment were these "transcripts" produced?
Those commands will give you the number of lines which in a bed file will be the number of peaks.
If you want to do more in depth comparisons I recommend you use bedtools or the pybedtools library for python.