I am doing gene expression profiling on data obtained from whole blood (platform: Illumina HumanHT-12 V3.0 expression beadchip). Some genes in my dataset have "LOC" names, and, as I understand it, these are genes of unknown function. Some of them have aliases, for instance:
Gene: LOC253012 | Aliases: MIKI
Could such LOC genes be non-coding? What is the best way to analyse and describe differentially expressed LOC genes when little information is available about them? Or would it be better to exclude them from the analysis? Any help is appreciated!
The NCBI Gene documentation explains where LOC symbols come from:
When a published symbol is not available, and orthologs have not yet been determined, Gene will provide a symbol that is constructed as 'LOC' + the GeneID. This is not retained when a replacement symbol has been identified, although queries by the LOC term are still supported. In other words, a record with the symbol LOC12345 is equivalent to GeneID = 12345. So if the symbol changes, the record can still be retrieved on the web using LOC12345 as a query, or from any file using GeneID = 12345.
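Because a LOC symbol is just "LOC" + GeneID, the numeric ID can be recovered straight from the symbol in the array annotation and used for lookups even after the symbol has been replaced. A minimal bash sketch, assuming the symbols have the form LOC followed only by digits; the function name loc_to_geneid is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: recover the NCBI GeneID from a "LOC" + GeneID symbol.
# Prints the numeric ID, or prints nothing and returns 1 if the symbol
# does not look like LOC<digits>.
loc_to_geneid() {
  local symbol="$1"
  if [[ "$symbol" =~ ^LOC([0-9]+)$ ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Example with the symbol from the question: prints 253012
loc_to_geneid LOC253012
```

The resulting ID is what stays stable in NCBI Gene, so it is the safer key to join against when the array annotation is older than the current gene records.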
Since the annotation for the array may not have been updated, you should check that the gene still exists in the latest genome build. See whether you can pull up a new name using the GeneID.
The example you used above now appears to have the official symbol HEPACAM2. Using Entrez Direct (EDirect), you can find that out:
$ esearch -db gene -query LOC253012 | efetch -format ft
1. HEPACAM2
Official Symbol: HEPACAM2 and Name: HEPACAM family member 2 [Homo sapiens (human)]
Other Aliases: MIKI
Other Designations: HEPACAM family member 2; mitotic kinetics regulator
Chromosome: 7; Location: 7q21.2
Annotation: Chromosome 7 NC_000007.14 (93188534..93232293, complement)
MIM: 614133
ID: 253012
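To screen a whole list of LOC probes, the same pipeline can be run in a loop and the current symbol taken from the "Official Symbol:" line of the output shown above. A rough bash sketch, assuming EDirect is installed and on the PATH and that the output keeps the "Official Symbol: X and Name: ..." layout shown above; the function names official_symbol and lookup_symbols are hypothetical. If a LOC term matches several records, only the first symbol is kept, so such cases would still need a manual check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the gene text output on stdin and print the
# symbol from the first "Official Symbol:" line (nothing if there is none).
official_symbol() {
  sed -n 's/^[[:space:]]*Official Symbol: \([^[:space:]]*\).*/\1/p' | head -n 1
}

# Hypothetical batch lookup: for each symbol given as an argument, run the
# same esearch/efetch query as above and print "old<TAB>current".
# Needs network access and Entrez Direct; it is only defined here, not run.
lookup_symbols() {
  local sym current
  for sym in "$@"; do
    current=$(esearch -db gene -query "$sym" | efetch -format ft | official_symbol)
    printf '%s\t%s\n' "$sym" "${current:-NOT_FOUND}"
  done
}
```

Calling lookup_symbols LOC253012 should then report HEPACAM2, matching the record above; symbols reported as NOT_FOUND are the ones worth checking by hand before deciding whether to keep them in the analysis.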