TSS file for D. melanogaster
10.4 years ago
catherine ▴ 250

I have ChIP-seq data and want to exclude the regions near transcription start sites (TSSs). Can anyone tell me how to get a TSS file? I looked at UCSC but didn't find one.

Thanks a lot in advance for any advice.

tss drosophila • 4.6k views
10.4 years ago
Ian 6.1k

Within UCSC you can get the data you want.

First, make sure you are viewing the right genome assembly, e.g. dm3.

Select 'Tools' (along the top of the screen) > 'Table Browser' to access the tables of data used by UCSC.

Choose: 'group' = 'Genes and Gene Predictions', 'track' (depending on your preference) = 'RefSeq Genes' or 'FlyBase Genes'.

If you select 'output format' = 'BED', then when you press 'get output' you will be given the option 'Create one BED record per' > 'Upstream by N bases'.

The resulting output (printed to the screen if you did not give a file name on the previous screen) will contain the coordinates of the promoter regions for your analysis. Bear in mind that the coordinates are per transcript, so a gene with several transcripts yields several records.
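
Because the output has one record per transcript, transcripts that share a start site can produce identical promoter intervals. A minimal Python sketch of collapsing them is below. It assumes tab-separated BED6 lines; the function name `unique_intervals` and the record names in the example are hypothetical, and the header-skipping rule is an assumption about what the downloaded file may contain.

```python
def unique_intervals(bed_lines):
    """Keep the first BED record seen for each (chrom, start, end, strand)."""
    seen = set()
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and track/browser/comment headers can be ignored.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        # BED6 layout: chrom, start, end, name, score, strand
        strand = fields[5] if len(fields) > 5 else "."
        key = (fields[0], int(fields[1]), int(fields[2]), strand)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical records: two transcripts sharing one promoter interval.
    example = [
        "chr2L\t100\t600\ttxA_up_500\t0\t+",
        "chr2L\t100\t600\ttxB_up_500\t0\t+",
        "chr2L\t900\t1400\ttxC_up_500\t0\t-",
    ]
    for record in unique_intervals(example):
        print(record)
```

Keying on coordinates and strand rather than on the name field is a design choice: the names differ per transcript even when the interval is the same.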

Hope this helps.


Thanks a lot, I didn't look at the Table Browser in the first place! Your answer is very specific and helpful! Thank you!

8.4 years ago

For anyone who might still find this: the proposed solution of using an SQL query at UCSC will not give you an accurate number of TSSs. UCSC's annotated TSS data has only about 6100 TSSs, far fewer than the number of known TSSs. I haven't found a more complete solution, but I'll update when I do.

10.4 years ago

You can run a MySQL query against the UCSC Genome Browser to output a sorted six-column BED file containing unique RefSeq records (one per transcript, spanning txStart to txEnd):

$ mysql -h genome-mysql.cse.ucsc.edu -u genome -D dm3 -N -A -e 'select chrom, txStart, txEnd, name2, score, strand from refGene' \
    | sort-bed - \
    | awk '!elements[$0]++' - \
    > refseq_tss.bed
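
Since refGene stores transcript spans, the TSS itself has to be taken from txStart or txEnd depending on strand. The Python sketch below shows that rule under the usual BED convention of 0-based, half-open coordinates. The function names and the example records are hypothetical and not part of any UCSC or bedops tooling.

```python
def tss_from_transcript(chrom, tx_start, tx_end, name, score, strand):
    """Return a 1-bp BED6 record at the transcription start site."""
    if strand == "+":
        start = tx_start       # first base of a plus-strand transcript
    elif strand == "-":
        start = tx_end - 1     # last base of the span on the minus strand
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return (chrom, start, start + 1, name, score, strand)


def unique_sorted_tss(transcripts):
    """Collapse transcripts sharing a TSS, then sort by chromosome and position."""
    sites = {tss_from_transcript(*t) for t in transcripts}
    return sorted(sites, key=lambda r: (r[0], r[1], r[2]))


if __name__ == "__main__":
    # Hypothetical transcripts: two isoforms of geneA share a start site.
    transcripts = [
        ("chr2L", 100, 500, "geneA", 0, "+"),
        ("chr2L", 100, 800, "geneA", 0, "+"),
        ("chr2L", 100, 500, "geneB", 0, "-"),
    ]
    for site in unique_sorted_tss(transcripts):
        print("\t".join(str(field) for field in site))
```

Deduplicating after the reduction to single bases, rather than before, merges isoforms that differ only at their 3' ends.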

Once you have both the RefSeq TSSs and your ChIP-seq peaks in sorted BED format, you can use the bedops --range and --not-element-of options on these two datasets to exclude any ChIP-seq peaks that fall in a window around each TSS.
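
To make the filtering logic concrete, here is a small Python sketch of the idea: pad each TSS by a window, then keep only the peaks that touch no padded TSS. It is not a reimplementation of bedops. The function names, the 1000-base window, and the rule that any overlap of at least one base removes a peak are all assumptions chosen for illustration, so check the bedops documentation for the overlap criterion it actually applies.

```python
def pad(interval, window):
    """Extend a (chrom, start, end) interval by `window` bases on each side."""
    chrom, start, end = interval
    return (chrom, max(0, start - window), end + window)


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_outside_tss_windows(peaks, tss_sites, window):
    """Keep peaks that do not overlap any TSS padded by `window` bases."""
    padded = [pad(site, window) for site in tss_sites]
    return [p for p in peaks if not any(overlaps(p, w) for w in padded)]


if __name__ == "__main__":
    # Hypothetical data: one TSS and three peaks.
    tss_sites = [("chr2L", 100, 101)]
    peaks = [("chr2L", 50, 150), ("chr2L", 5000, 5200), ("chr3R", 90, 110)]
    for peak in peaks_outside_tss_windows(peaks, tss_sites, window=1000):
        print(peak)
```

This compares every peak against every window, which is fine for a toy example but slow for genome-scale data; tools that work on sorted input, such as bedops, avoid that cost.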

See the following docs for more information on these and other bedops operations. Also, the table schema for Drosophila RefSeq is available here, so you can see where those field names come from and what they map to.


Thank you and I got it!
