Question

How to filter out the promoter-TSS peaks from the `.narrowPeak` file?

0

3.0 years ago

Dan ▴ 180

I annotated the peaks using annotatePeaks.pl macs2/${FILE}"_peaks.narrowPeak" mm10. The annotation file looks like this:

PeakID  Chr Start   End Strand  Peak Score  Focus Ratio/Region Size Annotation  Detailed Annotation Distance to TSS Nearest PromoterID  Entrez ID   Nearest Unigene Nearest Refseq  Nearest Ensembl Gene Name   Gene Alias  Gene Description    Gene Type
Sample_A_peak_38484 chr6    47743726    47745106    +   5689    NA  promoter-TSS (NR_002841).2  promoter-TSS (NR_002841).2  -185    NR_002841   19799       NR_002841       Rn4.5s  -   4.5S RNA    rRNA

How can I select the promoter-TSS, promoter, or TSS peaks from the .narrowPeak file based on the Annotation column of the annotation file?

Thanks a lot.

awk • 1.5k views

ADD COMMENT • link updated 3.0 years ago by LChart 5.0k • written 3.0 years ago by Dan ▴ 180

score 3 · Accepted Answer · 2022-08-08

3

3.0 years ago

LChart 5.0k

You can use awk to do this in *nix systems: awk '$8 == "promoter-TSS"' x.narrowPeak

ADD COMMENT • link 3.0 years ago by LChart 5.0k

0

The '$8 == "promoter-TSS"' filter matches a column of the annotation file, but I want to filter the narrowPeak file. How can I do that? Thanks

ADD REPLY • link 3.0 years ago by Dan ▴ 180

2

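If your narrowPeak file has peak IDs in the NAME field (column 4), you can collect the matching IDs from the annotation file and use them to filter the narrowPeak file:

awk '$8 == "promoter-TSS"' x.annot | cut -f1 > peak_names.txt

grep -wFf peak_names.txt x.narrowPeak > x.promoterTSS.narrowPeak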
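The same name-based filter can also be written as a single awk call that reads both files, which avoids the intermediate list and compares the ID against column 4 only. This is a minimal sketch, not the only way to do it: the demo.* file names and their contents are made up so the example runs on its own, and it assumes both files are tab-separated with the annotation text in column 8.

```shell
# Tiny stand-in files (hypothetical names and values) so the sketch runs as-is.
printf 'PeakID\tChr\tStart\tEnd\tStrand\tPeak Score\tFocus Ratio/Region Size\tAnnotation\n' > demo.annot
printf 'peak_1\tchr6\t101\t200\t+\t50\tNA\tpromoter-TSS (NR_002841)\n' >> demo.annot
printf 'peak_2\tchr6\t501\t600\t+\t40\tNA\tintron (NM_000001, intron 1 of 3)\n' >> demo.annot
printf 'peak_10\tchr7\t901\t1000\t+\t30\tNA\tTTS (NM_000002)\n' >> demo.annot

printf 'chr6\t100\t200\tpeak_1\t50\t.\t5.0\t10.0\t8.0\t40\n' > demo.narrowPeak
printf 'chr6\t500\t600\tpeak_2\t40\t.\t4.0\t9.0\t7.0\t30\n' >> demo.narrowPeak
printf 'chr7\t900\t1000\tpeak_10\t30\t.\t3.0\t8.0\t6.0\t20\n' >> demo.narrowPeak

# Pass 1 (annotation file): remember IDs whose column 8 starts with "promoter-TSS".
# Pass 2 (narrowPeak file): print rows whose name (column 4) is a remembered ID.
awk -F'\t' '
    NR == FNR { if ($8 ~ /^promoter-TSS/) keep[$1] = 1; next }
    ($4 in keep)
' demo.annot demo.narrowPeak > demo.promoterTSS.narrowPeak

cat demo.promoterTSS.narrowPeak
```

Changing the pattern to something like /^(promoter-TSS|TTS)/ would keep more than one annotation class.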

If your narrowPeak file has only positions you will need to create loc strings:

awk '$8 == "promoter-TSS"' x.annot | awk '{print $2":"$3"-"$4}' > peak_locs.txt

awk '{print $0"\t"$1":"$2"-"$3}' x.narrowPeak | grep -wFf peak_locs.txt > x.promoterTSS.narrowPeak

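The output will have an extra column (column 11) with the location string, which you can remove with cut -f1-10 if needed. If nothing matches, compare the start coordinates in the two files: the annotation output may report a 1-based start while narrowPeak starts are 0-based, in which case use ($3-1) instead of $3 when building peak_locs.txt.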
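For the position-only case, a two-file awk join keyed on the location string leaves the narrowPeak columns untouched. This is a hedged sketch on made-up demo2.* files: the offset variable is an assumption that the annotation Start equals the narrowPeak start plus 1 (1-based versus 0-based coordinates). Check a few rows of your own files and set offset=0 if the starts already agree.

```shell
# Hypothetical stand-in files; annotation starts are narrowPeak starts + 1 here.
printf 'PeakID\tChr\tStart\tEnd\tStrand\tPeak Score\tFocus Ratio/Region Size\tAnnotation\n' > demo2.annot
printf 'a\tchr6\t101\t200\t+\t50\tNA\tpromoter-TSS (NR_002841)\n' >> demo2.annot
printf 'b\tchr6\t501\t600\t+\t40\tNA\tIntergenic\n' >> demo2.annot

printf 'chr6\t100\t200\t.\t50\t.\t5.0\t10.0\t8.0\t40\n' > demo2.narrowPeak
printf 'chr6\t500\t600\t.\t40\t.\t4.0\t9.0\t7.0\t30\n' >> demo2.narrowPeak

# Pass 1: store "chr:start-end" keys for promoter-TSS rows, shifting the start by offset.
# Pass 2: print narrowPeak rows whose own "chr:start-end" key was stored.
awk -F'\t' -v offset=1 '
    NR == FNR { if ($8 ~ /^promoter-TSS/) keep[$2 ":" ($3 - offset) "-" $4] = 1; next }
    (($1 ":" $2 "-" $3) in keep)
' demo2.annot demo2.narrowPeak > demo2.promoterTSS.narrowPeak

cat demo2.promoterTSS.narrowPeak
```

Because the key is built on the fly, no helper column is added and nothing has to be cut afterwards.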

ADD REPLY • link 3.0 years ago by LChart 5.0k