I'm looking for the simplest way to take a set of common gene names and generate a BED interval file of +/- 2 kb around each gene's TSS. Thanks.
If you have a file of mouse gene symbols called genes.txt, here's one way you might get 2 kb proximal promoters of mouse genes for mm10 or GRCm38:
$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
| gunzip --stdout - \
| awk '$3 == "gene"' - \
| convert2bed -i gff - \
| awk -vwindow=2000 -vOFS="\t" '($6=="+"){ print $1, ($2 - window), $2, $4, ".", $6, $7, $8, $9, $10 }($6=="-"){ print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
> gencode.vM18.promoters.bed
Then to filter them against the list of genes:
$ grep -w -F -i -f genes.txt gencode.vM18.promoters.bed > gencode.vM18.promoters.filtered.bed
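One caveat: `grep -w -F -i` matches a symbol anywhere on the line, so a symbol such as Tec could also pick up rows that merely carry `gene_type=TEC` in their attributes. A stricter sketch follows, assuming genes.txt holds one symbol per line and that column 10 of the BED file holds the GFF3 attributes with a `gene_name=` field. The function name `filter_by_gene_name` is made up for illustration.

```shell
# Keep only promoter rows whose gene_name attribute matches a listed symbol.
# Usage: filter_by_gene_name genes.txt promoters.bed
filter_by_gene_name() {
    awk -F'\t' '
        # First file: load gene symbols, lower-cased, skipping blank lines
        NR == FNR { if ($1 != "") wanted[tolower($1)] = 1; next }
        # Second file: extract gene_name=... from the attributes column (10)
        {
            n = split($10, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^gene_name=/) {
                    # "gene_name=" is 10 characters, so the value starts at 11
                    name = tolower(substr(attrs[i], 11))
                    if (name in wanted) print
                    break
                }
            }
        }
    ' "$1" "$2"
}
```

For example, `filter_by_gene_name genes.txt gencode.vM18.promoters.bed > gencode.vM18.promoters.filtered.bed` would stand in for the grep step. The comparison is case-insensitive, like `grep -i`, but only against the gene name itself.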
Modify both start and pseudo-stop?
$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
| gunzip --stdout - \
| awk '$3 == "gene"' - \
| convert2bed -i gff - \
| awk -vwindow=2000 -vOFS="\t" '($6=="+"){ print $1, ($2 - window), ($2 + window), $4, ".", $6, $7, $8, $9, $10 }($6=="-"){ print $1, ($3 - window), ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
| awk -vOFS="\t" '{ if ($2 < 0) { $2 = 0; } print $0; }' \
> gencode.vM18.promoters.bed
I added a test that sets the start coordinate to zero if it is less than zero. You might instead filter these elements out if you require all elements to be 4 kb windows centered on their TSSs.
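If you go the filtering route, a minimal sketch could look like the following. It assumes BED input with the start in column 2 and the end in column 3, and it works whether or not the clamping step has already run, because a clamped window is narrower than twice the window size. The function name `keep_full_windows` is only illustrative, and windows running past the end of a chromosome are not detected here (that would need a chromosome sizes file).

```shell
# Keep only full-width windows: the start must be non-negative and the
# interval must span exactly 2 * window bases.
# Usage: keep_full_windows 2000 < promoters.bed
keep_full_windows() {
    awk -F'\t' -v window="$1" '$2 >= 0 && ($3 - $2) == 2 * window'
}
```

For example, `keep_full_windows 2000 < gencode.vM18.promoters.bed > gencode.vM18.promoters.full.bed` would keep only the complete 4 kb windows (the output filename is arbitrary).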
Are the answers here not applicable to the current question: Table browser +/- 2Kb of TSS export? You asked that question a while back.
Yes, it's helpful, but I was looking for the best method of converting common gene names to RefSeq or UCSC identifiers, etc., outside of the Table Browser, which appears to be lacking in this respect.
That is a different question, then. I would suggest taking a look at this file.