What Does Thickstart (Col 7) Or Thickend (Col 8) Mean In A Bed File?
11.5 years ago
Jordan ★ 1.3k

Hi,

I downloaded a list of RefSeq genes from the UCSC Table Browser in BED format. According to the BED format description given by UCSC, thickStart and thickEnd mean:

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).

This has me a bit confused. To explain my confusion, look at the following sample BED lines from UCSC.

chrI    7741935    8394405    NR_070240    0    +    8394405    8394405    0    8    18,12,13,9,11,11,8,18,    0,209004,270977,272247,461655,519425,544710,652452,
chrI    8378298    8390022    NM_001129046    0    -    8378298    8390022    0    8    123,103,110,116,65,69,124,113,    0,832,1401,2025,9723,9836,10481,11611,

So, what do columns 2 (start) and 3 (end) mean? And how are they different from columns 7 (thickStart) and 8 (thickEnd)? They seem to be different in most cases! I thought columns 2 and 3 were the start and end positions of the gene, but the definition of thickStart and thickEnd has confused me.

Here is the link to bed file description given by UCSC.

11.5 years ago

Things are clearer using MySQL:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A   -e 'select * from ce10.refGene where name="NR_070240"\G'
*************************** 1. row ***************************
         bin: 1
        name: NR_070240
       chrom: chrI
      strand: +
     txStart: 7741935
       txEnd: 8394405
    cdsStart: 8394405
      cdsEnd: 8394405
   exonCount: 8
  exonStarts: 7741935,7950939,8012912,8014182,8203590,8261360,8286645,8394387,
    exonEnds: 7741953,7950951,8012925,8014191,8203601,8261371,8286653,8394405,
       score: 0
       name2: Y43F8B.27
cdsStartStat: unk
  cdsEndStat: unk
  exonFrames: -1,-1,-1,-1,-1,-1,-1,-1,
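
To connect this table row to the BED line in the question, here is a minimal Python sketch of how the refGene fields appear to map onto the 12 BED columns. The function name `refgene_to_bed12` and the dict layout are hypothetical, chosen for illustration; the mapping itself (txStart/txEnd to columns 2 and 3, cdsStart/cdsEnd to columns 7 and 8, exon coordinates to block sizes and relative block starts) is inferred from comparing the two records above, and itemRgb is simply set to 0 as in the sample.

```python
def refgene_to_bed12(row):
    """Convert one refGene-style record (dict of strings) to BED12 fields."""
    tx_start = int(row["txStart"])
    # exonStarts/exonEnds are comma-separated with a trailing comma
    exon_starts = [int(x) for x in row["exonStarts"].split(",") if x]
    exon_ends = [int(x) for x in row["exonEnds"].split(",") if x]
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    # BED block starts are relative to column 2 (chromStart)
    block_starts = [start - tx_start for start in exon_starts]
    return [
        row["chrom"], str(tx_start), row["txEnd"], row["name"],
        row["score"], row["strand"],
        row["cdsStart"],   # column 7: thickStart
        row["cdsEnd"],     # column 8: thickEnd
        "0",               # column 9: itemRgb (assumed 0, as in the sample)
        row["exonCount"],
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
    ]


# Values copied from the query output above
row = {
    "name": "NR_070240", "chrom": "chrI", "strand": "+",
    "txStart": "7741935", "txEnd": "8394405",
    "cdsStart": "8394405", "cdsEnd": "8394405",
    "exonCount": "8", "score": "0",
    "exonStarts": "7741935,7950939,8012912,8014182,8203590,8261360,8286645,8394387,",
    "exonEnds": "7741953,7950951,8012925,8014191,8203601,8261371,8286653,8394405,",
}
bed_fields = refgene_to_bed12(row)
print("\t".join(bed_fields))
```

The printed line should match the first BED record in the question, which suggests that columns 7 and 8 are just cdsStart and cdsEnd under different names.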

It is surprising that none of the coordinates from my example are present in the output. The genome I used is ce10. Perhaps that's the reason?


Oops, updated for ce10...


I didn't know about \G, thanks. You still have '-D hg19'. Also good to note that when cdsStart == cdsEnd, the gene is non-coding.
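
As a rough illustration of that last point, here is a small Python sketch that flags BED records whose thickStart equals thickEnd. The helper name `is_noncoding` is made up for this example, and it assumes whitespace-separated BED lines with at least 8 columns, following the convention described in this thread.

```python
def is_noncoding(bed_line):
    """Return True when thickStart (col 7) equals thickEnd (col 8)."""
    fields = bed_line.split()
    thick_start, thick_end = int(fields[6]), int(fields[7])
    return thick_start == thick_end


# The two sample records from the question
nr_line = ("chrI 7741935 8394405 NR_070240 0 + 8394405 8394405 0 8 "
           "18,12,13,9,11,11,8,18, "
           "0,209004,270977,272247,461655,519425,544710,652452,")
nm_line = ("chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 "
           "123,103,110,116,65,69,124,113, "
           "0,832,1401,2025,9723,9836,10481,11611,")

for line in (nr_line, nm_line):
    name = line.split()[3]
    print(name, "non-coding" if is_noncoding(line) else "coding")
```

This agrees with the RefSeq accession prefixes in the sample: the NR_ record (non-coding RNA) has an empty thick region, while the NM_ record (mRNA) does not.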

11.5 years ago
Ido Tamir 5.2k

thickStart and thickEnd are the left and right boundaries of the coding sequence. Columns 2 and 3 (chromStart and chromEnd) are the left and right boundaries of the transcript. In the UCSC Genome Browser, the CDS is displayed "thicker" than the UTRs.
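
To make the relationship between the four columns concrete, here is a minimal Python sketch that splits a BED record into the "thin" and "thick" genomic spans. The function name `thin_and_thick` and the dictionary keys are hypothetical. The spans are plain half-open coordinate ranges that still include introns, and "left"/"right" refer to genomic position, so on the minus strand the left thin span would correspond to the 3' UTR side.

```python
def thin_and_thick(bed_line):
    """Split a BED record into left-thin, thick, and right-thin spans."""
    f = bed_line.split()
    chrom_start, chrom_end = int(f[1]), int(f[2])   # transcript boundaries
    thick_start, thick_end = int(f[6]), int(f[7])   # CDS boundaries
    return {
        "left_thin": (chrom_start, thick_start),
        "thick": (thick_start, thick_end),
        "right_thin": (thick_end, chrom_end),
    }


def span_length(span):
    """Length of a half-open (start, end) span."""
    return span[1] - span[0]


# Second sample record from the question (NM_001129046, minus strand)
sample = ("chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 "
          "123,103,110,116,65,69,124,113, "
          "0,832,1401,2025,9723,9836,10481,11611,")
regions = thin_and_thick(sample)
for label, span in regions.items():
    print(label, span, span_length(span))
```

For this record both thin spans have length zero, because thickStart and thickEnd coincide with columns 2 and 3: the whole transcript is drawn thick, with no UTR annotated on either side.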
