Does anyone have tab-delimited or CSV files of transcription start sites, labeled by chromosome and with the gene names on each chromosome? I'm writing a program and have found promoter databases online, but a file would be much more helpful.
My Biostars answer to "Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?" gives txStart and txEnd values for UCSC genes for hg18 and the specified range. You could modify this query for your build/organism of interest (e.g., hg19), your range of interest, or to look for TSSs for RefSeq or other gene tables.
Instead of running the command and looking at the results on standard output as my example showed, just redirect the output to a file:
$ mysql --user=genome ... | sort-bed - > result.bed
The file result.bed is a BED-formatted, tab-delimited text file, because I put the fields into that order in my query. Coordinates are not guaranteed to be sorted, so I pass the output through BEDOPS sort-bed to be sure.
A BED file is a tab-delimited text file that is described in more detail on UCSC's web site.
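Since you are writing a program, here is a rough Python sketch of how such a BED file could be turned into per-chromosome TSS lists. It rests on assumptions that are not part of the query above: that column 4 holds the gene name and that a strand sits in column 6 when present. The function name and the example rows are made up for illustration. In UCSC gene tables txStart is always the lower coordinate, so for a minus-strand gene the TSS is at txEnd, not txStart. If your file has no strand column, this sketch falls back to the start coordinate, which is only correct for plus-strand genes.

```python
import csv
from collections import defaultdict


def tss_by_chrom(bed_lines):
    """Group (gene_name, tss) pairs by chromosome from BED-like rows.

    Assumes columns: chrom, start, end, name, and optionally score, strand.
    Coordinates are treated as 0-based, half-open, as in BED.
    """
    result = defaultdict(list)
    for row in csv.reader(bed_lines, delimiter="\t"):
        # Skip blank lines and BED header/comment lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        # Assumption: a missing strand column is treated as plus strand.
        strand = row[5] if len(row) > 5 else "+"
        # Minus-strand genes are transcribed from the higher coordinate.
        tss = end - 1 if strand == "-" else start
        result[chrom].append((name, tss))
    return dict(result)


# Made-up rows, for illustration only.
example = [
    "chr1\t1000\t5000\tGENE_A\t0\t+",
    "chr1\t7000\t9000\tGENE_B\t0\t-",
    "chr2\t200\t800\tGENE_C",
]
print(tss_by_chrom(example))
```

To run it on the real output, pass an open file handle instead of the list, e.g. open("result.bed").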
That information should be available in a UCSC Table:
http://genome.ucsc.edu/cgi-bin/hgTables?command=start
For example, I think the BED files for RefSeq, etc. include coding coordinates as well as transcription start and stop sites.
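If you export a gene table from the Table Browser as plain tab-separated text instead of BED, something like the following Python sketch could pull the TSS out of each row. I am assuming the export has a header line beginning with "#" and columns named name, chrom, strand, txStart, txEnd and name2 (the gene symbol), which is how I remember the RefSeq table being laid out; check the header of your own download. The function name and the sample rows are invented for illustration.

```python
import csv
import io


def read_table_tss(handle):
    """Yield (chrom, tss, gene_symbol, transcript_id) from a tab-separated
    gene table export. Column names are an assumption; verify them against
    the header line of the actual file."""
    reader = csv.reader(handle, delimiter="\t")
    # The header line is assumed to start with '#', e.g. '#bin'.
    header = [field.lstrip("#") for field in next(reader)]
    for row in reader:
        rec = dict(zip(header, row))
        if rec["strand"] == "-":
            # Minus strand: transcription starts at the higher coordinate.
            tss = int(rec["txEnd"]) - 1
        else:
            tss = int(rec["txStart"])
        yield rec["chrom"], tss, rec["name2"], rec["name"]


# Invented sample data with a reduced set of columns.
SAMPLE = (
    "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tname2\n"
    "585\tTX_1\tchr1\t+\t100\t900\tGENE_A\n"
    "586\tTX_2\tchr1\t-\t2000\t3000\tGENE_B\n"
)

for record in read_table_tss(io.StringIO(SAMPLE)):
    print("\t".join(str(field) for field in record))
```

The printed positions are 0-based; add 1 if your program expects 1-based coordinates.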
I'm not familiar with the UCSC system. Would it be possible to post the specific table I should look in and the command for finding TSSs? Thanks very much.
It is in the answer I linked to (take a look at variables like kg.txStart, etc. to see what table those refer to). Start there.