How to filter out the promoter-TSS peaks from the `.narrowPeak` file?
2.3 years ago
Dan ▴ 180

I annotated the peaks using annotatePeaks.pl macs2/${FILE}"_peaks.narrowPeak" mm10. The annotation file looks like this:

PeakID  Chr Start   End Strand  Peak Score  Focus Ratio/Region Size Annotation  Detailed Annotation Distance to TSS Nearest PromoterID  Entrez ID   Nearest Unigene Nearest Refseq  Nearest Ensembl Gene Name   Gene Alias  Gene Description    Gene Type
Sample_A_peak_38484 chr6    47743726    47745106    +   5689    NA  promoter-TSS (NR_002841).2  promoter-TSS (NR_002841).2  -185    NR_002841   19799       NR_002841       Rn4.5s  -   4.5S RNA    rRNA

How can I select the promoter-TSS, promoter, or TSS peaks from the .narrowPeak file based on the Annotation column of the annotation file?

Thanks a lot.

2.3 years ago
LChart 4.5k

You can use awk to do this on *nix systems: awk '$8 == "promoter-TSS"' x.narrowPeak


The '$8 == "promoter-TSS"' column is in the annotation file, but I want to filter the narrowPeak file. How can I do that? Thanks


If your narrowPeak file has peak IDs in the NAME field, you can do:

awk '$8 == "promoter-TSS"' x.annot | cut -f1 > peak_names.txt

grep -wFf peak_names.txt x.narrowPeak > x.promoterTSS.narrowPeak
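As an alternative to the intermediate peak_names.txt file, the same ID-based filter can be sketched as a single two-file awk pass. This is only an illustration: the demo.* file names and the toy rows are made up, and it assumes both files are tab-separated, that the Annotation column is column 8, and that the narrowPeak NAME field (column 4) holds the same peak IDs as column 1 of the annotation file.

```shell
# Toy inputs (hypothetical names and values, for illustration only).
printf 'PeakID\tChr\tStart\tEnd\tStrand\tPeak Score\tFocus Ratio/Region Size\tAnnotation\n' > demo.annot
printf 'peak_1\tchr6\t100\t200\t+\t50\tNA\tpromoter-TSS (NR_002841)\n' >> demo.annot
printf 'peak_2\tchr6\t500\t600\t+\t40\tNA\tIntergenic\n' >> demo.annot
printf 'chr6\t100\t200\tpeak_1\t50\t.\t5.0\t8.0\t6.0\t40\n' > demo.narrowPeak
printf 'chr6\t500\t600\tpeak_2\t40\t.\t4.0\t7.0\t5.0\t30\n' >> demo.narrowPeak

# Pass 1 (annotation file): remember IDs whose Annotation starts with "promoter-TSS".
# Pass 2 (narrowPeak file): print rows whose NAME field (column 4) was remembered.
awk -F'\t' 'NR == FNR { if ($8 ~ /^promoter-TSS/) keep[$1] = 1; next }
            $4 in keep' demo.annot demo.narrowPeak > demo.promoterTSS.narrowPeak
```

Splitting on tabs keeps the whole Annotation text (for example "promoter-TSS (NR_002841)") in one field, which is why the test is a prefix match rather than an exact comparison.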

If your narrowPeak file has only positions you will need to create loc strings:

awk '$8 == "promoter-TSS"' x.annot | awk '{print $2":"$3"-"$4}' > peak_locs.txt

awk '{print $0"\t"$1":"$2"-"$3}' x.narrowPeak | grep -wFf peak_locs.txt > x.promoterTSS.narrowPeak

The output will have an extra last column with the location string, but you can remove it with cut if you need to.
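A minimal sketch of the location-string route with the extra column removed at the end. The demo2.* file names and rows are hypothetical, and it assumes a standard 10-column, tab-separated narrowPeak file and coordinates that are identical in both files. If your annotation output reports starts shifted by one relative to the narrowPeak file (worth checking on a few peaks), the strings will not match until the start is adjusted.

```shell
# Toy inputs (hypothetical names and values, for illustration only).
printf 'peak_1\tchr6\t100\t200\t+\t50\tNA\tpromoter-TSS (NR_002841)\n' > demo2.annot
printf 'peak_2\tchr6\t500\t600\t+\t40\tNA\tIntergenic\n' >> demo2.annot
printf 'chr6\t100\t200\tpeak_1\t50\t.\t5.0\t8.0\t6.0\t40\n' > demo2.narrowPeak
printf 'chr6\t500\t600\tpeak_2\t40\t.\t4.0\t7.0\t5.0\t30\n' >> demo2.narrowPeak

# Build chr:start-end strings for the promoter-TSS rows of the annotation file.
awk -F'\t' '$8 ~ /^promoter-TSS/ { print $2":"$3"-"$4 }' demo2.annot > demo2.locs.txt

# Append the same string to each narrowPeak row, keep the matching rows,
# then drop the helper column by keeping only the original 10 fields.
awk -F'\t' -v OFS='\t' '{ print $0, $1":"$2"-"$3 }' demo2.narrowPeak \
  | grep -wFf demo2.locs.txt \
  | cut -f1-10 > demo2.promoterTSS.narrowPeak
```

Here -F makes grep treat each location string literally, and -w stops a string from matching inside a longer coordinate.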
