I am having trouble understanding the annotation tag. For example, NM_173900_utr3_0_0_chr5_104123604_r is one of the annotation tags. The annotation file was downloaded as a BED file from the UCSC Table Browser.
I understand that NM_173900 is the NCBI accession ID of the gene, and that _utr3_ means the feature is in the 3' UTR.
I also understand that the later part, chr5_104123604_r, means it is on chr5 at the given position, and that _r means it is on the reverse (-) strand.
What has been bothering me are the two zeros in the middle. I am not able to figure out what they mean.
Here are several more examples:
NM_001075941_utr3_3_0_chr27_1251478_f
NM_001193172_up_2000_chr7_57415651_f
NM_001192104_cds_13_0_chr2_44344566_f
Please help me understand the numbers in the middle.
Thank you, Suraj
What is the "UCSC annotation tag"? How did you get those identifiers?
I got the annotation files from the UCSC Table Browser. The annotation file looks like this.
Annotation of what? What did you query?
I don't understand what information you want. Could you please explain?
Table browser is not just a tool where you click once and it magically gives you data. You have to choose what you want to download. Without knowing what you downloaded, we have no idea what it means. So what did you download?
Thank you for clarifying. I downloaded the annotation for 3' UTR exons from the UCSC Table Browser by making the following selections:
Clade - Mammal; genome - Cow; assembly - Apr. 2018; group - Genes and Gene Predictions; track - NCBI RefSeq; region - genome; output format - BED; output file - 3' UTR
Then, on the get output page, I selected "3' UTR Exons" and pressed "get BED". This downloaded a file that I saved on my computer. A few lines of this file look like this:
chr1 1000624 1002224 NM_001034679_utr3_3_0_chr1_1000625_f 0 +
chr1 1046829 1047018 NM_001077977_utr3_2_0_chr1_1046830_f 0 +
chr1 1099124 1099325 NM_001077124_utr3_0_0_chr1_1099125_r 0 -
chr1 1102878 1103061 NM_001114516_utr3_0_0_chr1_1102879_r 0 -
I want to understand what the numbers in the middle of the annotation tag mean.
I think the first piece of information indicates the region: utr3_3 indicates the 4th exon in the 3' UTR (the index counts from 0), up indicates upstream, and cds_13 indicates the 14th exon, which is a coding exon. The second piece of information indicates the relative position of the indicated region.
NM_001193172_up_2000_chr7_57415651_f means chr7:57415651:+ is the location 2000 bp upstream of NM_001193172.
NM_001192104_cds_13_0_chr2_44344566_f means chr2:44344566:+ is the first position of the 14th exon in NM_001192104, and this exon is a coding exon.
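To make this reading concrete, here is a minimal Python sketch that splits a tag into its parts. It follows my interpretation above, which is a guess rather than something taken from UCSC documentation: a feature label followed by two numbers is read as (0-based index, offset), and a label followed by one number (as in up_2000) is read as an offset only. The names parse_tag, ParsedTag and check_bed_line are hypothetical, and the pattern assumes RefSeq-style accessions (two letters, underscore, digits) and chromosome names starting with "chr". The check against the BED line relies on what the lines in the question show: the position inside the tag is the BED start plus 1, since BED start coordinates are 0-based.

```python
import re
from typing import NamedTuple, Optional

# Assumed tag layout: accession_feature_numbers_chrom_position_strand
TAG_RE = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)"   # e.g. NM_001192104 (assumed RefSeq style)
    r"_(?P<feature>[a-z0-9]+)"        # e.g. utr3, cds, up
    r"_(?P<numbers>\d+(?:_\d+)?)"     # one or two numbers, e.g. 13_0 or 2000
    r"_(?P<chrom>chr\w+?)"            # e.g. chr2, chr27
    r"_(?P<position>\d+)"             # 1-based position
    r"_(?P<strand>[fr])$"             # f = forward (+), r = reverse (-)
)


class ParsedTag(NamedTuple):
    accession: str
    feature: str
    index: Optional[int]  # 0-based index of the piece; None if absent
    offset: int           # second number, or the only number for "up"
    chrom: str
    position: int
    strand: str           # "+" or "-"


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag into fields, following the interpretation in this answer."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    numbers = [int(n) for n in m.group("numbers").split("_")]
    if len(numbers) == 2:
        index, offset = numbers
    else:
        index, offset = None, numbers[0]
    return ParsedTag(
        accession=m.group("accession"),
        feature=m.group("feature"),
        index=index,
        offset=offset,
        chrom=m.group("chrom"),
        position=int(m.group("position")),
        strand="+" if m.group("strand") == "f" else "-",
    )


def check_bed_line(line: str) -> bool:
    """Return True if the tag in column 4 agrees with columns 1, 2 and 6."""
    chrom, start, _end, name, _score, strand = line.split()[:6]
    tag = parse_tag(name)
    # BED starts are 0-based, the position in the tag is 1-based.
    return (tag.chrom == chrom
            and tag.position == int(start) + 1
            and tag.strand == strand)


if __name__ == "__main__":
    print(parse_tag("NM_001192104_cds_13_0_chr2_44344566_f"))
    print(check_bed_line(
        "chr1 1099124 1099325 NM_001077124_utr3_0_0_chr1_1099125_r 0 -"))
```

Running check_bed_line over the whole downloaded file is a quick way to see whether every tag is consistent with its own BED columns before relying on this interpretation.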