Extracting promoters from UCSC
3.0 years ago
peter ▴ 20

I downloaded a promoter BED file for human from UCSC. I basically just took the 1000 bases upstream and downloaded the BED file. However, when I intersected my BED file with ClinVar variants, a lot of the positions in my BED file overlapped variants in coding and intronic regions. Promoters should not be in either of these. How can I ensure that the promoters file I have contains only promoters and does not extend into intronic or coding regions? Any insights would be appreciated.

Edit: Link to bed file: promoters bed file

So I got promoter regions from CAGE and UCSC. For the CAGE promoters I also took 1000 bases upstream or downstream, depending on the strand. The code I have:

cat nonOverlapping_ucscPromoters.bed nonOverlapping_cagePromoters.bed > concat_ucsc_cage_promoters.bed

Next I sorted the file:

sort -k1,1 -k2,2n -k3,3n concat_ucsc_cage_promoters.bed > sorted_ucsc_cage_promoters.bed

Next I used bedtools merge to join overlapping regions:

bedtools merge -i sorted_ucsc_cage_promoters.bed > nonOverlapping_ucsc_cage_promoters.bed

This is the BED file in the link I've shared. Then, to intersect it with the ClinVar file, I do:

bedtools intersect -a clinvar.bed -b nonOverlapping_ucsc_cage_promoters.bed  -wa > clinVar_promoters.bed
promoters UCSC • 1.5k views

You should include the file you downloaded and the code you used.


Do you mean the ClinVar file or the promoters BED file?


Just a link to the promoter file and adding your code to the post would be fine for now.


I've added a link to the file and the code.


How can I ensure that the promoters file I have is only promoters and is not extending into intronic regions and coding regions.

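For genes with alternative transcription start sites (i.e. multiple isoforms or transcripts), isn't this problem unavoidable? The 1000 bases upstream of an internal TSS will fall inside the introns or exons of a longer isoform of the same gene. One way to reduce this would be to make sure that for any locus you always choose the 5'-most TSS.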


How can I do that? Is there a file I can intersect with?
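
A minimal sketch of the "5'-most TSS per gene" idea, using only awk and sort. It assumes a BED6 file with one line per transcript TSS and the gene name in column 4; such a file would have to be derived from a gene annotation, which is not shown here. The file names, the toy coordinates, and the FLANK variable are all hypothetical. For "+" genes the 5'-most TSS is the smallest start and the window is the 1000 bases before it; for "-" genes it is the largest end and the window is the 1000 bases after it.

```shell
#!/bin/sh
# Toy TSS file (made-up data): chrom, start, end, gene, score, strand.
# Each line is a 1-bp interval marking the TSS of one transcript.
cat > tss_example.bed <<'EOF'
chr1	5000	5001	GENE_A	0	+
chr1	7500	7501	GENE_A	0	+
chr1	20000	20001	GENE_B	0	-
chr1	26000	26001	GENE_B	0	-
chr2	300	301	GENE_C	0	+
EOF

# Keep the 5'-most TSS per (chrom, gene, strand), then emit the window
# of FLANK bases upstream of it, in 0-based half-open BED coordinates.
awk -v FLANK=1000 'BEGIN { FS = OFS = "\t" }
{
    key = $1 FS $4 FS $6
    pos = ($6 == "+") ? $2 + 0 : $3 + 0          # "+": start, "-": end
    if (!(key in tss))                    tss[key] = pos
    else if ($6 == "+" && pos < tss[key]) tss[key] = pos   # leftmost on "+"
    else if ($6 == "-" && pos > tss[key]) tss[key] = pos   # rightmost on "-"
}
END {
    for (key in tss) {
        split(key, f, FS)
        if (f[3] == "+") {
            s = tss[key] - FLANK; if (s < 0) s = 0   # clip at chromosome start
            e = tss[key]
        } else {
            s = tss[key]
            e = tss[key] + FLANK                     # not clipped at chromosome end
        }
        print f[1], s, e, f[2], 0, f[3]
    }
}' tss_example.bed | sort -k1,1 -k2,2n > promoters_5prime.bed

cat promoters_5prime.bed
```

This only prevents a promoter window from landing inside a longer isoform of the same gene. Windows can still overlap exons or introns of a neighbouring or overlapping gene, so removing annotated exons afterwards (for example with bedtools subtract against an exon BED file) may still be needed. Note also that the "-" strand window is not clipped at the chromosome end, which would require a chromosome sizes file.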
