How Can I Extract All Bidirectional Promoters In The Human Genome From The UCSC Genome Browser?
13.4 years ago
Farhat ★ 2.9k

What I would like is a BED file of all regions that contain two genes on opposite strands whose TSSs are less than 1000 bp apart. I can do this using the entire gene table and some Python coding, but I wonder if there is a way to do it with just an SQL query.
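For comparison, the "entire gene table plus Python" route can be sketched as below. It is only a sketch under assumptions: transcripts are assumed to be already loaded as hypothetical `(name, chrom, strand, txStart, txEnd)` tuples in UCSC convention (txStart < txEnd on both strands), and the helper names `tss`, `bidirectional_pairs` and `to_bed` are made up for illustration. Instead of comparing every pair of genes, it sorts the '-' strand TSSs per chromosome and binary-searches a window of 1000 bp on either side of each '+' strand TSS.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def tss(strand, tx_start, tx_end):
    """5' end of a transcript: txStart on '+', txEnd on '-' (UCSC convention)."""
    return tx_start if strand == "+" else tx_end


def bidirectional_pairs(transcripts, max_dist=1000):
    """Yield (chrom, left, right, plus_name, minus_name) for every '+'/'-'
    transcript pair whose TSSs are less than max_dist bp apart."""
    # Sorted '-' strand TSSs per chromosome.
    minus = defaultdict(list)
    for name, chrom, strand, start, end in transcripts:
        if strand == "-":
            minus[chrom].append((tss(strand, start, end), name))
    for hits in minus.values():
        hits.sort()
    positions = {chrom: [pos for pos, _ in hits] for chrom, hits in minus.items()}

    for name, chrom, strand, start, end in transcripts:
        if strand != "+" or chrom not in minus:
            continue
        t = tss(strand, start, end)
        # '-' TSSs strictly inside the open window (t - max_dist, t + max_dist).
        lo = bisect_right(positions[chrom], t - max_dist)
        hi = bisect_left(positions[chrom], t + max_dist)
        for m_tss, m_name in minus[chrom][lo:hi]:
            yield chrom, min(t, m_tss), max(t, m_tss), name, m_name


def to_bed(pairs):
    """Format pairs as 4-column BED lines: chrom, start, end, 'plus|minus'."""
    return ["\t".join((chrom, str(left), str(right), f"{p}|{m}"))
            for chrom, left, right, p, m in pairs]


if __name__ == "__main__":
    demo = [  # made-up coordinates, for illustration only
        ("txA", "chr1", "+", 5000, 9000),
        ("txB", "chr1", "-", 1000, 4500),    # TSS 500 bp from txA's TSS
        ("txC", "chr1", "-", 20000, 30000),  # TSS too far away
    ]
    print("\n".join(to_bed(bidirectional_pairs(demo))))
```

Running it prints one tab-separated line for the `txA`/`txB` pair (`chr1 4500 5000 txA|txB`). Here each region spans only the two TSS coordinates; the SQL answer reports instead the span of both whole transcripts. The cost is one sort per chromosome plus one binary search per '+' transcript, rather than a comparison of all pairs.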

genome ucsc • 4.8k views
13.4 years ago

Using the UCSC public MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19

mysql> select K1.chrom, K1.name, K2.name, K1.strand, K2.strand,
  LEAST(K1.txStart, K1.txEnd, K2.txStart, K2.txEnd) as L,
  GREATEST(K1.txStart, K1.txEnd, K2.txStart, K2.txEnd) as R
  from
     knownGene as K1,
     knownGene as K2
  where K1.chrom = K2.chrom
    -- K1 is the '+' transcript (TSS = txStart), K2 the '-' one (TSS = txEnd);
    -- fixing this order reports each pair once instead of twice.
    and K1.strand = '+' and K2.strand = '-'
    -- txStart/txEnd are unsigned, so cast before subtracting.
    and ABS(CAST(K1.txStart AS SIGNED) - CAST(K2.txEnd AS SIGNED)) < 1000
 ;

+-------+------------+------------+--------+--------+---------+---------+
| chrom | name       | name       | strand | strand | L       | R       |
+-------+------------+------------+--------+--------+---------+---------+
| chr1  | uc009vjn.1 | uc010nxx.1 | +      | -      |  761586 |  788902 | 
| chr1  | uc001abp.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abq.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc009vjo.1 | uc010nxx.1 | +      | -      |  761586 |  788997 | 
| chr1  | uc001abr.1 | uc010nxx.1 | +      | -      |  761586 |  789740 | 
| chr1  | uc001acz.1 | uc001acx.1 | +      | -      | 1108435 | 1121241 | 
| chr1  | uc001adk.2 | uc001adh.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc001adi.3 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjv.2 | +      | -      | 1152288 | 1170418 | 
| chr1  | uc001adk.2 | uc009vjw.2 | +      | -      | 1152288 | 1170418 | 
(...)
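To get from rows like these to the BED file asked for in the question, a small post-processing step is enough. The sketch below assumes the result was saved as tab-separated text without a header (for example with the mysql client's `--batch` and `--skip-column-names` options), in the column order of the query; `rows_to_bed` is a hypothetical helper name. Because knownGene coordinates are 0-based with an exclusive end, L and R can be written directly as BED start and end. It also keeps each unordered transcript pair only once, so it works whether the query lists a pair once or in both orders.

```python
import csv
import sys


def rows_to_bed(rows):
    """Convert (chrom, name, name, strand, strand, L, R) rows into sorted
    (chrom, start, end, 'name1|name2') BED records, one per transcript pair."""
    seen = set()
    bed = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        chrom, name1, name2, _strand1, _strand2, left, right = row
        key = (chrom, frozenset((name1, name2)))
        if key in seen:  # same pair already seen, possibly in the other order
            continue
        seen.add(key)
        # 0-based start, exclusive end: usable as BED columns 2 and 3 as-is.
        bed.append((chrom, int(left), int(right), f"{name1}|{name2}"))
    bed.sort(key=lambda rec: (rec[0], rec[1], rec[2]))
    return bed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): python rows_to_bed.py pairs.tsv pairs.bed
    with open(sys.argv[1], newline="") as src, open(sys.argv[2], "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(rows_to_bed(csv.reader(src, delimiter="\t")))
```

The name column joins both transcript IDs so the pair can be traced back to knownGene; sorting by chromosome and start keeps the file ready for tools that expect sorted BED input.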

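Edit: I fixed a problem with my previous answer. In UCSC tables, txStart is always the leftmost (lowest) coordinate, whatever the value of 'strand', so it is the 5' end only for genes on the '+' strand; for genes on the '-' strand the TSS is txEnd. You therefore have to take into account whether each gene is on the '+' or the '-' strand.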
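The strand rule in the edit can be made concrete with a small Python helper (the names `tss` and `head_to_head` are made up for illustration, and transcripts are hypothetical `(strand, txStart, txEnd)` tuples). It applies the same test as the WHERE clause: opposite strands and TSSs less than 1000 bp apart, with the TSS taken from txStart on '+' and from txEnd on '-'.

```python
def tss(strand, tx_start, tx_end):
    """TSS coordinate of a UCSC transcript (txStart < txEnd on both strands)."""
    if strand == "+":
        return tx_start
    if strand == "-":
        return tx_end
    raise ValueError(f"unexpected strand: {strand!r}")


def head_to_head(tx1, tx2, max_dist=1000):
    """True if the two (strand, txStart, txEnd) transcripts are on opposite
    strands and their TSSs are less than max_dist bp apart."""
    (s1, a1, b1), (s2, a2, b2) = tx1, tx2
    if {s1, s2} != {"+", "-"}:
        return False
    return abs(tss(s1, a1, b1) - tss(s2, a2, b2)) < max_dist


plus_tx = ("+", 5000, 9000)   # made-up coordinates: TSS at 5000
minus_tx = ("-", 1000, 4500)  # made-up coordinates: TSS at 4500
print(head_to_head(plus_tx, minus_tx))  # True: TSSs are 500 bp apart
print(abs(plus_tx[1] - minus_tx[1]))    # 4000: comparing raw txStarts would miss the pair
```

Comparing txStart with txStart, as a strand-blind query would, puts these two transcripts 4000 bp apart and misses a pair whose TSSs are only 500 bp apart.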

Beauty! (just like that)

Wow! That was quick.

There is a problem with that query. Give me 5 minutes...

Fixed. See my comment.

Thanks for the edit. It is clear now.
