Common gene names list to BED with +/- TSS intervals
6.3 years ago
rbronste ▴ 420

I'm just looking for the simplest way to take a set of common gene names and generate a BED interval file covering +/- 2 kb around each gene's TSS. Thanks.

RNA-Seq

Can't the answers to "Table browser +/- 2Kb of TSS export" be extended to the current question? You asked that question back then.

Yes, it's helpful, but I was looking for the best conversion method of common gene names to RefSeq or UCSC etc. outside of the Table Browser, which appears to be lacking in this respect.

conversion method of common gene names to RefSeq or UCSC

That is a different question, then. I would suggest taking a look at this file.

6.3 years ago

If you have a file of mouse gene symbols called genes.txt, here's one way you might get 2 kb proximal promoters of mouse genes for mm10 (GRCm38):

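$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff - \
    | awk -vwindow=2000 -vOFS="\t" '($6=="+"){ print $1, ($2 - window), $2, $4, ".", $6, $7, $8, $9, $10 }($6=="-"){ print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
    | awk -vOFS="\t" '{ if ($2 < 0) { $2 = 0; } if ($2 < $3) { print $0; } }' \
    > gencode.vM18.promoters.bed

The last awk step clamps negative start coordinates (from "+" strand genes that begin less than 2 kb from the start of a chromosome) to zero, and drops any interval that the clamp leaves empty.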
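
To check the strand handling of the windowing awk step without downloading GENCODE, here is a small offline sketch. The two input lines are made-up, hypothetical records in the 10-column layout that convert2bed produces from GFF3 (chrom, start, stop, ID, score, strand, source, type, phase, attributes); only the coordinates and strand matter here.

```shell
#!/usr/bin/env bash
# Offline check of the strand-aware upstream-window step on two hypothetical
# gene records. Columns follow convert2bed GFF3 output: chrom, start, stop, id,
# score, strand, source, type, phase, attributes (tab-separated, 0-based start).
set -euo pipefail

printf 'chr1\t5000\t9000\tgeneA\t.\t+\tHAVANA\tgene\t.\tgene_name=GeneA\n'   >  toy.genes.bed
printf 'chr1\t20000\t25000\tgeneB\t.\t-\tHAVANA\tgene\t.\tgene_name=GeneB\n' >> toy.genes.bed

awk -F'\t' -v window=2000 -v OFS='\t' '
  # Plus strand: the TSS is at the start, so the upstream window ends at $2.
  ($6 == "+") { print $1, ($2 - window), $2, $4, ".", $6, $7, $8, $9, $10 }
  # Minus strand: the TSS is at the stop, so the upstream window begins at $3.
  ($6 == "-") { print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }
' toy.genes.bed > toy.promoters.bed

cat toy.promoters.bed
```

For geneA ("+" strand, start 5000) this gives chr1 3000 5000, and for geneB ("-" strand, stop 25000) it gives chr1 25000 27000, i.e. both windows lie upstream in the gene's own orientation.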

Then to filter them against the list of genes:

$ grep -w -F -i -f genes.txt gencode.vM18.promoters.bed > gencode.vM18.promoters.filtered.bed
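
The grep -w -F -i filter matches a symbol anywhere on the line and ignores case. Because grep -w treats a hyphen as a word boundary, a symbol such as Abc1 would also pull in a record named Abc1-ps. A stricter alternative, sketched below, compares each symbol exactly (and case-sensitively) against the gene_name= key in the attributes column (column 10 of the convert2bed output). It assumes GENCODE-style GFF3 attributes, i.e. key=value pairs separated by semicolons; the toy files and gene symbols are hypothetical.

```shell
#!/usr/bin/env bash
# Exact, case-sensitive filtering on the gene_name attribute in column 10,
# instead of matching a symbol anywhere in the line.
set -euo pipefail

# Hypothetical toy inputs (made-up symbols) so the sketch runs offline.
printf 'Abc1\nXyz2\n' > toy.genes.txt
printf 'chr1\t100\t200\tid1\t.\t+\tHAVANA\tgene\t.\tID=id1;gene_name=Abc1;level=2\n'    >  toy.promoters.bed
printf 'chr1\t300\t400\tid2\t.\t-\tHAVANA\tgene\t.\tID=id2;gene_name=Abc1-ps;level=2\n' >> toy.promoters.bed
printf 'chr2\t500\t600\tid3\t.\t+\tHAVANA\tgene\t.\tID=id3;gene_name=Xyz2;level=2\n'    >> toy.promoters.bed

awk -F'\t' '
  # First file (the gene list): remember each wanted symbol.
  NR == FNR { if ($1 != "") wanted[$1] = 1; next }
  {
    # Split the GFF3 attributes into key=value pairs on ";".
    n = split($10, attrs, ";")
    for (i = 1; i <= n; i++) {
      if (attrs[i] ~ /^gene_name=/) {
        name = substr(attrs[i], length("gene_name=") + 1)
        if (name in wanted) print
        break
      }
    }
  }
' toy.genes.txt toy.promoters.bed > toy.promoters.filtered.bed

cat toy.promoters.filtered.bed
```

On the toy input this keeps id1 and id3 but not the Abc1-ps record, which the grep command would have kept. On the real data, the same awk program would read genes.txt and gencode.vM18.promoters.bed in that order.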

Very late reply, but this only gets the 2 kb upstream of the TSS. How can I get a window of 2 kb on either side? Thanks!

Modify both the start and the pseudo-stop?

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff - \
    | awk -vwindow=2000 -vOFS="\t" '($6=="+"){ print $1, ($2 - window), ($2 + window), $4, ".", $6, $7, $8, $9, $10 }($6=="-"){ print $1, ($3 - window), ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
    | awk -vOFS="\t" '{ if ($2 < 0) { $2 = 0; } print $0; }' \
    > gencode.vM18.promoters.bed

I added a test to adjust the start coordinate to zero if it is less than zero. You might instead filter these elements out if you require all elements to be 4 kb windows centered on their TSSs.
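
A minimal sketch of the filtering alternative, folded into a single awk step, is below. It runs offline on three hypothetical records in the 10-column layout that convert2bed emits for GFF3 input; in the real pipeline, this awk step would take the place of the two awk steps that follow convert2bed.

```shell
#!/usr/bin/env bash
# Alternative to clamping: skip +/- windows that would start before position 0,
# so every remaining element is exactly 2 * window bases wide.
set -euo pipefail

# Hypothetical toy records in the column layout convert2bed emits for GFF3.
printf 'chr1\t500\t4000\tnearStart\t.\t+\tHAVANA\tgene\t.\tgene_name=NearStart\n'   >  toy.genes.bed
printf 'chr1\t10000\t15000\tplusGene\t.\t+\tHAVANA\tgene\t.\tgene_name=PlusGene\n'   >> toy.genes.bed
printf 'chr1\t20000\t26000\tminusGene\t.\t-\tHAVANA\tgene\t.\tgene_name=MinusGene\n' >> toy.genes.bed

awk -F'\t' -v window=2000 -v OFS='\t' '
  # Only stranded records are handled, as in the original pipeline.
  ($6 != "+" && $6 != "-") { next }
  {
    # The TSS boundary is the start on "+" and the stop on "-".
    tss = ($6 == "+") ? $2 : $3
    start = tss - window
    stop = tss + window
    # Skip, rather than clamp, windows that would run past the chromosome start.
    if (start < 0) next
    print $1, start, stop, $4, ".", $6, $7, $8, $9, $10
  }
' toy.genes.bed > toy.promoters.bed

cat toy.promoters.bed
```

Here nearStart (TSS at 500) is dropped, while plusGene and minusGene become chr1 8000 12000 and chr1 24000 28000, each 4 kb wide.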
