Question

compare peaks in a gene list (coordinates)

0

Entering edit mode

8.2 years ago

Lila M ★ 1.3k

Hi everybody, I have a bed file with coordintes and the name of the transcript for each coordinate. I would like if there is any way for getting the peaks over this bed file. Can MACS2 do this?

Thanks!

ChIP-Seq peaks gene bed • 3.8k views

ADD COMMENT • link updated 8.2 years ago by ivivek_ngs ★ 5.2k • written 8.2 years ago by Lila M ★ 1.3k

0

Entering edit mode

I do not fully understand your question, can you refine it or give an example? Perhaps a simple "join" command will do what you want

ADD REPLY • link 8.2 years ago by Biojl ★ 1.7k

0

Entering edit mode

I have two files ( a and b), both are BED files (chr, start, end, strand) of different transcripts. I would like to compare them in therms of peaks, so I would like to know the number of peaks in file a and the number of peaks in file b to compare. So I don't know how I can get the number of peaks for each file (I've tried Homer and annotatePeak but the number of peaks are the same that the number of transcripts included in each file..)

ADD REPLY • link 8.2 years ago by Lila M ★ 1.3k

0

Entering edit mode

What do you mean by "peaks"?

peaks usually refers to areas of open chromatin as determined from data obtained from chip|mnase|dnase|atac-seq experiements --> in which case macs2 is an option

You say you have bed files, but from what experiment were these "transcripts" produced?

"transcripts" indicates RNA-seq --> in which case macs2 is not an option

ADD REPLY • link 8.2 years ago by BioinfGuru ★ 2.1k

0

Entering edit mode

cat "file a" | wc -l 
cat "file b" | wc -l

Those commands will give you the number of lines which in a bed file will be the number of peaks.

If you want to do more in depth comparisons I recommend you use bedtools or the pybedtools library for python.

ADD REPLY • link 8.2 years ago by Biojl ★ 1.7k

score 1 · Answer 1 · 2017-04-06

1

Entering edit mode

8.2 years ago

Sinji ★ 3.2k

Using bedops:

Sort all bed files first:

sort-bed $file_a > $output_a

sort-bed $file_b > $output_b

sort-bed $peaks > $output_peaks

Look for peak intersects in separate files:

bedops --element-of 1 $output_peaks $output_a > $output_peaks_a

bedops --element-of 1 $output_peaks $output_b > $output_peaks_b

You can then using something fairly simple like wc -l to count the number of lines corresponding to the number of peaks a given transcript (assuming file a and b only have a single transcript each).

ADD COMMENT • link 8.2 years ago by Sinji ★ 3.2k

0

Entering edit mode

You could even pipe and tee to get results and count them at the same time.

ADD REPLY • link 8.2 years ago by Alex Reynolds 36k

score 0 · Answer 2 · 2017-04-06

Just to be clear. The file a and file b are outputs of MACS2 or just random transcript files with chromosomal coordinates? You do not need MACS2 to find common and exclusive peaks. If your file a and file b are bed files and are results of MACS2 peak calling then they should be comparable with bedtools or bedops. Just sort them and then try to find the intersection at either 1bp or more and extract those region to another bed file. Once you annotate them you will be able to know the corresponding gene names but then again your annotations should be done either on promoters with a selection of windows size or regions excluding the promoters. Once you select only promoter regions from the intersection file lets say file c. You can annotate them to get the gene names. Any standard tool can do the needful. If you want to integrate the understanding of those genes at the level of expression for specific trasncripts you would need to have a gtf file that will have transcript id, chr coordinates and which will be either overlapped with your file c (promoter file) and then annotated or simply convert your RNA-Seq transcript id to gene names and overlap gene symbols of annotated file c to that of the genes that come from your RNA-Seq transcript file which is also having gene names.

To count the number of peaks its always a line count in unix so wc-l