How to filter genes by gene size
9.3 years ago
daviderix • 0

Hello,

Is there a way to get a list of all the genes in the human genome of a given size?

I'd like to narrow the list down to genes of the same length, so that I can compare them against some statistics I have for a set of candidate genes. Does anyone know how to do this?

Any help would be greatly appreciated.

Thanks

Best,
d

genome gene • 2.6k views

Gene? Do you mean cDNA?


I'm interested in retrieving a list of IDs for genes of a particular length (let's say 10 kb).


So for you, gene = genomic sequence?

9.3 years ago

It's still not clear whether you want the cDNA, mRNA, or genomic length. This example filters on the genomic span (txEnd - txStart):

~$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -e 'select geneSymbol,chrom,txStart,txEnd,txEnd-txStart as L from knownGene as K, kgXref as X  where K.name=X.kgId and txEnd-txStart between 900 and 1100
' | head -n 20
geneSymbol    chrom    txStart    txEnd    L
OR4F5    chr1    69090    70008    918
RP11-34P13.9    chr1    160445    161525    1080
OR4F29    chr1    450739    451678    939
OR4F16    chr1    685715    686654    939
RP11-206L10.4    chr1    760910    761989    1079
NOC2L    chr1    958245    959256    1011
KLHL17    chr1    961448    962478    1030
HES4    chr1    998969    999981    1012
AGRN    chr1    1045398    1046349    951
RP11-54O7.14    chr1    1055032    1056116    1084
RP11-54O7.18    chr1    1062207    1063288    1081
RP11-465B22.8    chr1    1169356    1170343    987
FAM132A    chr1    1242445    1243463    1018
UBE2J2    chr1    1255263    1256335    1072
PUSL1    chr1    1308787    1309840    1053
CPSF3L    chr1    1314135    1315141    1006
MXRA8    chr1    1353214    1354247    1033
AURKAIP1    chr1    1374243    1375144    901
RP4-758J18.2    chr1    1399551    1400608    1057
9.3 years ago
seidel 11k

Two easy ways to get this information would be:

  1. Ensembl BioMart (http://www.ensembl.org/biomart/). It is fairly self-explanatory: select the human gene dataset and export the genes with their coordinates.
  2. UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables

Thanks for your answer.

The problem with these two options is that, in order to extract your genes of interest, you have to know their coordinates. I don't know them (my search is based solely on a particular gene length, and I don't know how many genes I'm going to find). Do you have any other suggestions?

Thank you so much


Thanks to Pierre and alolex for specific examples. You don't have to know any coordinates for either of these options. I figured you'd simply extract coordinates for all genes into a table and then filter it for what you need - but Pierre shows a nice example of selecting a size range up front. You haven't made clear what you mean by "gene size". A gene can have one length in terms of coordinates on the genome, and another length in terms of the transcript produced from that gene.
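
The difference between the two lengths can be made concrete with a small Python sketch. It assumes a row shaped like UCSC's gene prediction tables (txStart, txEnd, and comma-separated exonStarts/exonEnds strings, possibly with a trailing comma); the function name and the example coordinates are made up for illustration.

```python
def gene_lengths(tx_start, tx_end, exon_starts, exon_ends):
    """Return (genomic_span, spliced_transcript_length) for one transcript.

    exon_starts / exon_ends are comma-separated strings, possibly with a
    trailing comma (assumed layout of the exonStarts/exonEnds columns).
    """
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exon start/end lists differ in length")
    # Length on the genome: introns included.
    genomic_span = tx_end - tx_start
    # Length of the spliced transcript: exons only.
    transcript_length = sum(e - s for s, e in zip(starts, ends))
    return genomic_span, transcript_length


# Hypothetical two-exon transcript: 10 kb on the genome, 1.5 kb once spliced.
span, spliced = gene_lengths(1000, 11000, "1000,10000,", "1500,11000,")
print(span, spliced)
```

Which of the two numbers you filter on depends on what "gene size" means for your comparison.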

9.3 years ago
alolex ▴ 960

The information you seek just requires a little legwork. Go to the UCSC Table Browser (see the link from seidel), set the group to "Genes and Gene Predictions", and pick the appropriate track for your project. Then set the region to "genome" and the output format to "selected fields from primary and related tables", and click "get output". On the next page, select the start and end coordinate fields and the "name2" field, which contains the gene name. Note that there are both transcription start/end and CDS start/end fields, so choose whichever is appropriate for you. You can also select the individual exons if you want. Click "get output" again to get a text file of results. Then you just need to calculate the difference between the start and end coordinates to get the length.
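
The last step, computing the lengths and keeping a size range, could look roughly like this in Python. This is only a sketch under assumptions: the exported file is taken to be tab-separated with the columns name2, txStart, txEnd in that order and a header line beginning with "#". The function name and the sample rows are hypothetical.

```python
import csv
import io


def genes_in_size_range(lines, min_len, max_len):
    """Map gene name -> longest transcript span within [min_len, max_len].

    `lines` is any iterable of tab-separated text lines with the columns
    name2, txStart, txEnd (assumed order). Lines starting with '#' are
    treated as headers and skipped.
    """
    hits = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        name, start, end = row[0], int(row[1]), int(row[2])
        length = end - start
        if min_len <= length <= max_len:
            # A gene can have several transcripts; keep the longest match.
            hits[name] = max(length, hits.get(name, 0))
    return hits


# Made-up rows standing in for a Table Browser export.
sample = io.StringIO(
    "#name2\ttxStart\ttxEnd\n"
    "GENE_A\t100\t10100\n"
    "GENE_A\t100\t9900\n"
    "GENE_B\t5000\t6000\n"
)
print(genes_in_size_range(sample, 9000, 11000))  # {'GENE_A': 10000}
```

For a real export, pass an open file handle instead of the `io.StringIO` sample. Note that a gene is reported as soon as any one of its transcripts falls in the range, which may or may not be what you want.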
