(1) Why was HSPA7 omitted from the Affymetrix annotations? Is it due to the biotype?
That would be a question for Affymetrix. However, the primary target of the probe is HSPA6, which is presumably why only that gene is listed. biomaRt, by contrast, lists all of them:
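```r
library(biomaRt)

# Connect to the current Ensembl genes mart and select the human dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Retrieve every Ensembl gene that the HG-U133 Plus 2.0 probe set 117_at maps to
getBM(
  mart = mart,
  attributes = c(
    "affy_hg_u133_plus_2",
    "ensembl_gene_id",
    "gene_biotype",
    "external_gene_name"),
  filters = "affy_hg_u133_plus_2",
  values = "117_at",
  uniqueRows = TRUE)
```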
```
  affy_hg_u133_plus_2 ensembl_gene_id           gene_biotype external_gene_name
1              117_at ENSG00000225217 unprocessed_pseudogene              HSPA7
2              117_at ENSG00000173110         protein_coding              HSPA6
3              117_at ENSG00000273112                 lncRNA         AL590385.2
4              117_at ENSG00000244682 polymorphic_pseudogene             FCGR2C
5              117_at ENSG00000143226         protein_coding             FCGR2A
```
If you look at the target region in the UCSC Genome Browser, you can begin to see what's happening. In short:
- HSPA6 is the target.
- HSPA7 is included because it is a pseudogene whose sequence closely
resembles HSPA6, so the probe may well hybridise to it, too. However,
as HSPA7 is an unprocessed pseudogene, it may not even be expressed.
- The FCGR2A and FCGR2C genes are included because a 'rogue'
non-coding RNA, AL590385.2, is transcribed across all of the genes in
this region.
(2) I assume that it is better to work with the latest version of
Ensembl (please correct me if my assumption is wrong) rather than with
the Affymetrix annotations, which are based on Ensembl 82. Should I
select only the Affymetrix probes that correspond to specific gene
biotypes?
Yes and no: these are design choices that you, as the analyst, must make. When annotating, you could write your code so that the protein-coding target, if present, is used in preference to other biotypes. Irrespective of that, in this case HSPA6 is the target.
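As a rough illustration of that biotype preference, here is a minimal R sketch; it is not a built-in biomaRt feature. It assumes a data frame shaped like the getBM() output for 117_at, with those five rows typed in by hand so that the example runs offline. The helper name prefer_protein_coding() is hypothetical. For each probe set, it keeps the protein-coding rows if any exist and otherwise keeps every row.

```r
# The five getBM() rows for probe set 117_at, entered by hand for an offline example
annot <- data.frame(
  affy_hg_u133_plus_2 = rep("117_at", 5),
  ensembl_gene_id = c("ENSG00000225217", "ENSG00000173110", "ENSG00000273112",
                      "ENSG00000244682", "ENSG00000143226"),
  gene_biotype = c("unprocessed_pseudogene", "protein_coding", "lncRNA",
                   "polymorphic_pseudogene", "protein_coding"),
  external_gene_name = c("HSPA7", "HSPA6", "AL590385.2", "FCGR2C", "FCGR2A"),
  stringsAsFactors = FALSE)

# Hypothetical helper: for each probe set, keep its protein-coding rows when
# there are any; otherwise fall back to all rows for that probe set
prefer_protein_coding <- function(df) {
  per_probe <- split(df, df$affy_hg_u133_plus_2)
  kept <- lapply(per_probe, function(x) {
    pc <- x[x$gene_biotype == "protein_coding", , drop = FALSE]
    if (nrow(pc) > 0) pc else x
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

filtered <- prefer_protein_coding(annot)
print(filtered)
```

For 117_at, this still returns two rows, HSPA6 and FCGR2A, because both are protein coding. A biotype rule alone therefore does not single out HSPA6, and a judgement based on the probe's target region is still needed.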
Kevin