I have ChIP-seq data, and I want to exclude the regions near TSSs. Can anyone tell me how to get a TSS file? I went to UCSC but didn't find one.
Thanks a lot in advance for any advice.
Within UCSC you can get the data you want.
First make sure you are viewing the right genome assembly, e.g. dm3.
Select 'Tools' (along the top of the screen) > 'Table Browser' to access the tables of data used by UCSC.
Choose: 'group' = 'Genes and Gene Predictions', 'track' (depending on your preference) = 'RefSeq Genes' or 'FlyBase Genes'.
If you select 'output format' = 'BED', then when you press 'get output' you will be given the option to 'Create one BED record per' > 'Upstream by N bases'.
The resulting output file (to screen if you did not give a file name in the previous screen) will contain the coordinates of the promoter region for your analysis. Bear in mind that the coordinates are for transcripts (i.e. more than one transcript per gene).
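If you would rather compute the upstream window yourself from transcript coordinates, here is a minimal Python sketch of the idea. It assumes BED-style 0-based, half-open coordinates; the function name `upstream_window` and the optional `chrom_size` argument are made up for illustration and are not part of any UCSC tool.

```python
def upstream_window(chrom, tx_start, tx_end, strand, n, chrom_size=None):
    """Return (chrom, start, end) for the n bases upstream of a transcript's TSS.

    Coordinates are assumed to be 0-based and half-open, as in BED.
    """
    if strand == "+":
        # On the + strand the TSS is at tx_start, so upstream lies to the left.
        start, end = tx_start - n, tx_start
    elif strand == "-":
        # On the - strand the TSS is at tx_end, so upstream lies to the right.
        start, end = tx_end, tx_end + n
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Clip to the chromosome so the window never runs off either end.
    start = max(0, start)
    if chrom_size is not None:
        end = min(end, chrom_size)
    return chrom, start, end


if __name__ == "__main__":
    print(upstream_window("chr2L", 1000, 5000, "+", 200))
    print(upstream_window("chr2L", 1000, 5000, "-", 200))
```

Because the input is per transcript, genes with several transcripts will yield several (possibly identical) windows, so you may want to deduplicate afterwards.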
Hope this helps.
For anyone who might still find this: the proposed solution of using an SQL query at UCSC will not give you an accurate number of TSSs. UCSC's annotated TSS data has only about 6100 TSSs, which is far fewer than the number of known TSSs. I haven't found a more complete solution, but I'll update when I do.
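You can also query the UCSC Genome Browser's public MySQL server directly. Note that txStart and txEnd in refGene span the whole transcript; the TSS itself is txStart for + strand records and txEnd for - strand records. The following outputs a sorted six-column BED file containing unique RefSeq records: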
$ mysql -h genome-mysql.cse.ucsc.edu -u genome -D dm3 -N -A -e 'select chrom, txStart, txEnd, name2, score, strand from refGene' \
| sort-bed - \
| awk '!elements[$0]++' - \
> refseq_tss.bed
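Since the query above returns whole-transcript intervals, one way to reduce them to single-base TSS records is sketched below in Python. This is only an illustration under the assumption of tab-separated, six-column, 0-based half-open BED input; the function name `transcripts_to_tss` is hypothetical.

```python
def transcripts_to_tss(bed_lines):
    """Collapse six-column transcript BED lines to unique, sorted 1-bp TSS records."""
    seen = set()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, start, end, name, score, strand = fields[:6]
        start, end = int(start), int(end)
        if strand == "+":
            # TSS is the first base of the transcript.
            record = (chrom, start, start + 1, name, score, strand)
        else:
            # On the - strand the TSS is the last base of the interval.
            record = (chrom, end - 1, end, name, score, strand)
        seen.add(record)  # the set drops transcripts that share a TSS
    # Sort by chromosome name, then numerically by start and end.
    return sorted(seen)


if __name__ == "__main__":
    example = [
        "chr2L\t100\t500\tgeneA\t0\t+",
        "chr2L\t100\t900\tgeneA\t0\t+",
        "chr2L\t2000\t3000\tgeneB\t0\t-",
    ]
    for row in transcripts_to_tss(example):
        print("\t".join(str(x) for x in row))
```

Transcripts of the same gene that start at the same position collapse to one record, which also addresses the "more than one transcript per gene" issue mentioned earlier in the thread.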
Once you have both the RefSeq TSSs and your ChIP-seq data in sorted BED format, you can use bedops --range --not-element-of on these two datasets to exclude any ChIP-seq peaks that fall in a window around each TSS.
See the following docs for more information on these and other bedops operations. Also, the table schema for Drosophila RefSeq is available here, so you can see where those field names come from and what they map to.
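To make the filtering step concrete, here is a rough Python sketch of the same idea: drop every peak that overlaps a window around any TSS. It is not a replacement for bedops and makes a few assumptions: coordinates are 0-based and half-open, each TSS is a single base, and any overlap with the padded TSS is enough to exclude a peak. The name `exclude_near_tss` and the tuple layouts are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def exclude_near_tss(peaks, tss_sites, window):
    """Return the peaks that do not overlap any TSS padded by `window` bases.

    peaks:     iterable of (chrom, start, end), 0-based half-open
    tss_sites: iterable of (chrom, pos), where pos is the 0-based TSS base
    """
    # Index TSS positions per chromosome so each peak needs one binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in tss_sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    kept = []
    for chrom, start, end in peaks:
        positions = by_chrom.get(chrom, [])
        # The padded TSS [pos - window, pos + 1 + window) overlaps [start, end)
        # exactly when start - window <= pos < end + window.
        i = bisect_left(positions, start - window)
        near_tss = i < len(positions) and positions[i] < end + window
        if not near_tss:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    tss = [("chr2L", 1000)]
    peaks = [("chr2L", 1400, 1600), ("chr2L", 1501, 1700), ("chr3R", 1000, 1100)]
    print(exclude_near_tss(peaks, tss, window=500))
```

The window size is your call; a few hundred bases to a few kilobases around the TSS is a choice that depends on what you count as "near" for your analysis.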
Thanks a lot, I didn't look at the Table Browser in the first place! Your answer is very specific and helpful! Thank you!