Database containing the importance of a DNA sequence
6.4 years ago
Gene_MMP8 ▴ 240

Hi,
I have around 1 million sequences, each 20 bp long. They are the sequences flanking somatic mutations. Is there a database that can tell me the relative importance of these sequences within the human genome? By importance I mean positional importance, i.e., whether a sequence lies in a promoter region or similar. For example, given a 20 bp sequence, I want to know whether it falls within some important genomic region.

alignment sequencing • 895 views
6.4 years ago

If you have the sequences in UCSC BED format, you can use the BEDOPS toolkit: convert2bed to convert gene annotations to BED, bedops to make a file of gene promoters (say, the region 500 nt upstream of each gene's TSS), and bedmap to associate the sequences with those promoters.

For example, to get some gene annotations and write them to a BED-formatted file:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.basic.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' \
    | convert2bed -i gff - \
    > genes.bed

Or replace this step with whatever annotation source you prefer, for your reference genome.

Make a file of promoter regions from the genes:

$ awk -v OFS="\t" '($6 == "+") { print $1, $2, ($2+1), $4; }' genes.bed | bedops --range -500:0 --everything - > promoters.for.bed
$ awk -v OFS="\t" '($6 == "-") { print $1, ($3 - 1), $3, $4; }' genes.bed | sort-bed - | bedops --range 0:500 --everything - > promoters.rev.bed
$ bedops --everything promoters.for.bed promoters.rev.bed > promoters.bed
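
As a sanity check on the coordinate arithmetic, the promoter windows can be mimicked in plain awk on a couple of toy genes. This is only a sketch: the gene names and coordinates are made up, and clamping the start at zero is an addition for the sketch, not something taken from the commands above.

```shell
# Toy genes in six-column BED layout: chrom, start, end, id, score, strand.
# GENE_A and GENE_B are hypothetical.
genes=$(printf 'chr1\t1000\t5000\tGENE_A\t.\t+\nchr1\t8000\t9000\tGENE_B\t.\t-\n')

# Forward strand: the TSS is the start, so the window is [start-500, start+1).
# Reverse strand: the TSS is the end, so the window is [end-1, end+500).
promoters=$(printf '%s\n' "$genes" | awk -v OFS="\t" -v up=500 '
    $6 == "+" { s = $2 - up; if (s < 0) s = 0; print $1, s, $2 + 1, $4 }
    $6 == "-" { print $1, $3 - 1, $3 + up, $4 }
')
printf '%s\n' "$promoters"
```

For these toy genes the windows come out as chr1:500-1001 for GENE_A and chr1:8999-9500 for GENE_B, so each window sits on the upstream side of its gene with respect to strand.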

Sort your BED-formatted sequences:

$ sort-bed sequences.unsorted.bed > sequences.bed
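
If you only have the mutation coordinates and not a BED file yet, something like the following could produce one. It is a sketch under assumptions not stated in the question: that each 20 bp sequence covers 10 bases upstream of the mutated base, the base itself, and 9 bases downstream, and that the positions are 1-based and stored in a tab-delimited file of chromosome, position and ID. The file name mutations.tsv and the IDs are hypothetical.

```shell
# Hypothetical input: chrom, 1-based mutation position, ID.
printf 'chr2\t100\tmut2\nchr1\t5\tmut1\n' > mutations.tsv

# BED is 0-based and half-open, so 1-based position p is the interval [p-1, p).
# Assumed window: [p-11, p+9), which is 20 bp; starts below zero are clamped.
awk -v OFS="\t" '{
    start = $2 - 11; if (start < 0) start = 0
    print $1, start, $2 + 9, $3
}' mutations.tsv > sequences.unsorted.bed

cat sequences.unsorted.bed
```

The output is still unsorted, so it needs the sort-bed step before it is passed to bedmap.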

Map sequences to gene promoters:

$ bedmap --echo --echo-map --delim '\t' promoters.bed sequences.bed > answer.bed

Each line of the file answer.bed contains a promoter region, its associated gene ID, and any sequences that overlap the gene's promoter. Where no sequence overlaps a promoter, the part of the line after the gene ID is empty.
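
To keep only the promoters that were hit and count the hits, the mapped part of each line can be inspected with awk. This sketch assumes the four-column promoter layout used above, so mapped sequences begin at field 5, and that multiple mapped elements are joined with bedmap's default `;` separator (so it would miscount if your IDs contain semicolons). The sample lines and the file name answer.example.bed are invented.

```shell
# Invented sample in the layout of answer.bed: four promoter columns, then
# zero or more mapped sequences joined by ";".
printf 'chr1\t500\t1001\tGENE_A\tchr1\t600\t620\tseq1;chr1\t700\t720\tseq2\n' > answer.example.bed
printf 'chr1\t8999\t9500\tGENE_B\t\n' >> answer.example.bed

# Keep promoters with at least one overlap; print the gene ID and hit count.
hits=$(awk -F'\t' -v OFS="\t" '$5 != "" {
    n = gsub(/;/, ";")      # counts the ";" separators on the line
    print $4, n + 1
}' answer.example.bed)
printf '%s\n' "$hits"
```

Here GENE_A is reported with two overlapping sequences and GENE_B is dropped because nothing mapped to it.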
