Hi,
I would like to get the exon positions for several non-human species. For this, the best option would seem to be the knownGene.txt file, as mentioned by Pierre Lindenbaum. The problem is that this file is available for human and mouse, but not for other species.
The other file that is available for non-human species is geneid.txt, but I am a bit confused about its content; it doesn't seem to correspond to genes. Here is an example from hg38, running head on knownGene.txt:
uc031tla.1 chr1 - 17368 17436 17368 17368 1 17368, 17436, ENST00000619216.1
uc057aty.1 chr1 + 29553 31097 29553 29553 3 29553,30563,30975, 30039,30667,31097, ENST00000473358.1
uc057atz.1 chr1 + 30266 31109 30266 30266 2 30266,30975, 30667,31109, ENST00000469289.1
uc031tlb.1 chr1 + 30365 30503 30365 30365 1 30365, 30503, ENST00000607096.1
uc001aak.4 chr1 - 34553 36081 34553 34553 3 34553,35276,35720, 35174,35481,36081, ENST00000417324.1
and on geneid.txt:
585 chr1_1.1 chr1 - 16857 35736 16857 35736 7 16857,17232,17605,17914,18267,24737,35720, 17055,17257,17742,18061,18379,24891,35736, 0 chr1_1 incmpl cmp 0,2,0,0,2,1,0,
73 chr1_2.1 chr1 - 120816 195438 120816 195438 9 120816,129054,164765,185490,187375,187754,188129,188790,195262, 120932,129223,164791,185559,187577,187779,188266,188902,195438, 0 chr1_2 cmpl cmpl 1,0,1,1,0,2,0,2,0,
586 chr1_3.1 chr1 - 258540 258903 258540 258903 1 258540, 258903, 0 chr1_3 cmpl cmpl 0,
73 chr1_4.1 chr1 + 353849 393666 353849 393666 3 353849,368835,393552, 354030,370016,393666, 0 chr1_4 cmpl cmpl 0,1,0,
588 chr1_5.1 chr1 - 450739 485181 450739 485181 2 450739,485039, 451716,485181, 0 chr1_5 cmpl cmpl 1,0,
Does anyone know what geneid.txt contains?
And, more importantly for me, does anyone know how to retrieve the exon positions for non-human species?
Thanks
In geneid.txt (counting the leading number on each row as column 1):
Columns 5 and 6 show the transcript start and end positions.
Columns 7 and 8 show the coding region start and end positions.
Column 9 shows the number of exons in the transcript.
Column 10 lists the start position of each exon (column 9 tells you how many), as a comma-separated list.
Column 11 lists the end position of each exon, in the same order.
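If it helps, here is a minimal Python sketch of pulling the exon coordinates out of one row, assuming the whitespace-separated layout shown in the question (the downloaded files are tab-separated, which `split()` also handles). `parse_genepred` and its `has_bin` flag are names made up for this example; set `has_bin=False` for knownGene.txt, which in the rows above has no leading number, so every column sits one position to the left.

```python
def parse_genepred(line, has_bin=True):
    """Parse one geneid.txt / knownGene.txt style row (hypothetical helper).

    Assumed field order: (leading number), name, chrom, strand, txStart,
    txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
    """
    fields = line.split()
    if has_bin:
        fields = fields[1:]  # drop the leading number present in geneid.txt
    name, chrom, strand = fields[0], fields[1], fields[2]
    tx_start, tx_end, cds_start, cds_end, exon_count = map(int, fields[3:8])
    # exonStarts / exonEnds are comma-separated with a trailing comma
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError(f"exon count mismatch for {name}")
    return {
        "name": name,
        "chrom": chrom,
        "strand": strand,
        "tx": (tx_start, tx_end),
        "cds": (cds_start, cds_end),
        "exons": list(zip(starts, ends)),
    }


if __name__ == "__main__":
    row = ("73 chr1_4.1 chr1 + 353849 393666 353849 393666 3 "
           "353849,368835,393552, 354030,370016,393666, 0 chr1_4 cmpl cmpl 0,1,0,")
    rec = parse_genepred(row)
    # one line per exon: chrom, start, end, transcript name, strand
    for start, end in rec["exons"]:
        print(rec["chrom"], start, end, rec["name"], rec["strand"], sep="\t")
```

Looping this over every line of the file gives you one exon per output line.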
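Unlike in BED format, where block starts are given relative to the start of the feature, I don't believe these positions are relative: they look like absolute chromosome coordinates (in the first geneid.txt row, for example, the first exon starts at 16857, the same value as the transcript start). So I think you are good to go.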
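To make the difference concrete, here is a small sketch converting the absolute exon coordinates of the chr1_4.1 row into BED12-style blockSizes/blockStarts (offsets from chromStart) and back. The function names are hypothetical. Both formats use 0-based starts with half-open ends, so only the offset changes, with no ±1 shift.

```python
def exons_to_bed12_blocks(tx_start, exons):
    """Absolute (start, end) exon pairs -> BED12 blockSizes, blockStarts."""
    sizes = [end - start for start, end in exons]
    rel_starts = [start - tx_start for start, _ in exons]  # relative to chromStart
    return sizes, rel_starts


def bed12_blocks_to_exons(chrom_start, sizes, rel_starts):
    """Inverse: BED12 blocks -> absolute (start, end) exon pairs."""
    return [(chrom_start + rel, chrom_start + rel + size)
            for size, rel in zip(sizes, rel_starts)]


# exons of chr1_4.1 from the geneid.txt example (absolute positions)
exons = [(353849, 354030), (368835, 370016), (393552, 393666)]
sizes, rel = exons_to_bed12_blocks(353849, exons)
# sizes -> [181, 1181, 114]; rel -> [0, 14986, 39703]
```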
https://genome.ucsc.edu/cgi-bin/hgTables