Genomic Location Of Micro Array Probe Sequences
4
5
13.6 years ago
Georg Summer ▴ 140

I have the individual probe sequences of a microarray. Now I would like to know where in the genome of the associated organism each probe sequence maps. I am not interested in the genes in these regions, only in the location of each match (chrA, bp x - y). I expect a single probe to match several positions in the complete genome.

The Microarray Probe Mapping of Ensembl provides similar functionality, but for now I would like to avoid the hassle of setting Ensembl up.

Can anyone point me to a data source that has this information?

microarray genome • 4.6k views
ADD COMMENT
0

What array technology are you using? I am asking because some resources already map probes or probesets for specific technologies, and the details of your technology (especially probe length) might influence which tool is optimal.

ADD REPLY
0

initially it will be probe sequences of affy chips, but it might eventually evolve to sequences in general. that is why I do not want to rely too much on manufacturer-supplied data (for additional reasons, see my comment on the answer of Michael Dundrup)

ADD REPLY
0

If you want to generalize your pipeline to deal with any sequences then I'd recommend setting up blat.
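
A minimal sketch of what such a BLAT call could look like for short probes, written as a Python helper that only builds the command line (it does not run anything). The file names are hypothetical and the parameter values are illustrative assumptions, not tuned recommendations. As far as I know, BLAT's default `-minScore` is 30, so a perfect hit of a 25-mer probe (score 25) would be dropped unless that threshold is lowered.

```python
def blat_command(genome_fasta, probes_fasta, out_psl,
                 min_score=20, tile_size=11, step_size=5):
    """Build (but do not run) a BLAT command line for short probe sequences.

    All default values here are assumptions for 25-mer probes; adjust them
    to your own probe length and sensitivity needs.
    """
    return [
        "blat",
        f"-tileSize={tile_size}",   # k-mer size used for seeding
        f"-stepSize={step_size}",   # spacing between seeds; smaller = more sensitive
        f"-minScore={min_score}",   # must be below the probe length to keep full hits
        "-noHead",                  # write PSL rows without the header block
        genome_fasta,               # database (target) sequences
        probes_fasta,               # query sequences
        out_psl,                    # output file in PSL format
    ]


# Hypothetical file names, for illustration only.
cmd = blat_command("genome.fa", "probes.fa", "probes.psl")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once BLAT is installed and on the PATH.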

ADD REPLY
2
13.6 years ago
Michael 55k

You can use a short-read aligner to map the probes; which one works best depends a bit on the probe length. With the genome sequence and the probe sequences in FASTA files, you can use BLAT or LASTZ right away. Mosaik, SHRiMP, or SSAHA2 could also be used. There are also a lot of array annotation packages in Bioconductor. If it is a custom array, you can also do the matching in R using the Biostrings package and, if one is available for your organism, the corresponding BSgenome package.

I assume your array manufacturer provides technical support files that contain these mappings, but it is always good to check those, e.g. for probes hitting multiple locations.
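
To make the idea concrete, here is a toy sketch of exact probe matching on both strands, in plain Python rather than Biostrings. It is not what any of the aligners above do internally, and it will not scale to a real genome. The function names and the tiny example sequences are made up for illustration. Reported positions are 1-based and inclusive.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(haystack, needle):
    """Yield 0-based start positions of all occurrences, including overlapping ones."""
    pos = haystack.find(needle)
    while pos != -1:
        yield pos
        pos = haystack.find(needle, pos + 1)


def map_probes(genome, probes):
    """Exact-match every probe against every chromosome on both strands.

    genome: dict of chromosome name -> sequence
    probes: dict of probe id -> sequence
    Returns a list of (probe_id, chrom, start, end, strand) tuples.
    """
    genome = {chrom: seq.upper() for chrom, seq in genome.items()}
    hits = []
    for probe_id, probe_seq in probes.items():
        forward = probe_seq.upper()
        reverse = reverse_complement(forward)
        for chrom, chrom_seq in genome.items():
            for pos in find_all(chrom_seq, forward):
                hits.append((probe_id, chrom, pos + 1, pos + len(forward), "+"))
            if reverse != forward:  # skip palindromic probes, already reported
                for pos in find_all(chrom_seq, reverse):
                    hits.append((probe_id, chrom, pos + 1, pos + len(forward), "-"))
    return hits


# Made-up toy data: one probe that matches three times.
toy_genome = {"chrA": "TTACGTAGGCATACGTAGG", "chrB": "GGCCTACGTAA"}
toy_probes = {"probe1": "ACGTAGG"}
toy_hits = map_probes(toy_genome, toy_probes)
for hit in toy_hits:
    print(*hit, sep="\t")
```

Because every occurrence is reported, probes hitting multiple locations show up as several rows with the same probe id.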

ADD COMMENT
0

well i am actually interested in multiple hit locations. call it paranoia, but i have a general distrust of micro arrays and the manufacturer-supplied support material. up-to-date-ness is not always their strength, so i prefer pipelines that are decoupled from the manufacturer

ADD REPLY
0

The custom cdfs (cf my answer to this question) are not manufacturer produced. But Affymetrix is actually quite open about their software approaches and a lot of developmental libraries are available from them as open source.

ADD REPLY
2
13.6 years ago

For Affymetrix arrays like the ones you are using now, the problem of probes with multiple hits should already be covered by the so-called custom CDFs. That is in fact why they were created; see this publication. The custom probesets are newly selected combinations of individual probes, each chosen because it hits its target uniquely.

Since you are in Maastricht, you might want to know that we already have experience with running BLAT against complete sets of ENSEMBL gene sequences and selecting the unique hits. That is how we selected the probesets used on the NuGO arrays. Part of the procedure is the production of a table that lists, for every probe, what it hits. So if you really want to use individual probes (there are quite a few thermodynamic and statistical reasons why that is not necessarily a good idea), that table could be a good start.
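
A per-probe hit table of the kind described above could be sketched roughly like this. This is only an illustration under my own assumptions, not the actual NuGO procedure: hits are taken to be simple (probe, chromosome, start, end, strand) tuples, and all ids and coordinates below are invented.

```python
from collections import defaultdict


def build_hit_table(hits):
    """Group hits by probe id: probe_id -> list of (chrom, start, end, strand)."""
    table = defaultdict(list)
    for probe_id, chrom, start, end, strand in hits:
        table[probe_id].append((chrom, start, end, strand))
    return dict(table)


def split_by_uniqueness(table):
    """Return (unique, multi): sorted probe ids with exactly one hit / several hits."""
    unique = sorted(p for p, locations in table.items() if len(locations) == 1)
    multi = sorted(p for p, locations in table.items() if len(locations) > 1)
    return unique, multi


# Invented example hits.
example_hits = [
    ("probe_a", "chr1", 100, 124, "+"),
    ("probe_b", "chr1", 500, 524, "-"),
    ("probe_b", "chr7", 900, 924, "+"),
    ("probe_c", "chrX", 42, 66, "+"),
]
table = build_hit_table(example_hits)
unique_probes, multi_probes = split_by_uniqueness(table)
print("unique:", unique_probes)
print("multiple hits:", multi_probes)
```

Probes without any hit never enter the table, so they would have to be found by comparing against the full probe list.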

ADD COMMENT
0

in my case, or let's better say in the ideas i am toying around with, i am not so much interested in the probesets and what they cover but in the actual individual probes. i am coming at this problem from quite a different angle than the "usual" micro array analysis.

ADD REPLY
0

I edited my answer to better address your specific interest in individual probes. But please be aware that individual probe sequences can give highly variable signals because of, for instance, GC content and the presence of hairpins. Statistical evaluation of probesets is key to Affymetrix analysis. – Chris Evelo

ADD REPLY
2
13.6 years ago

The UCSC genome table browser has several Affy tables mapped to various organisms:

affyU133Plus2 in hg19 for example

These tables are in PSL format. An example row, shown field by field:

field        value
bin          585
matches      530
misMatches   4
repMatches   0
nCount       23
qNumInsert   3
qBaseInsert  41
tNumInsert   3
tBaseInsert  898
strand       -
qName        225995_x_at
qSize        637
qStart       5
qEnd         603
tName        chr1
tSize        249250621
tStart       14361
tEnd         15816
blockCount   5
blockSizes   93,144,229,70,21,
qStarts      34,132,278,541,611,
tStarts      14361,14454,14599,14968,15795,
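
If you download such a table as tab-separated text, a row can be turned into genomic blocks with a few lines of Python. This is a minimal sketch that assumes the leading `bin` column is present, as in the table browser output above; the function names are my own. PSL coordinates are 0-based with an exclusive end, so add 1 to each start if you want 1-based positions.

```python
PSL_FIELDS = [
    "bin", "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert", "strand",
    "qName", "qSize", "qStart", "qEnd", "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl_row(line):
    """Parse one tab-separated PSL row (with leading bin column) into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(PSL_FIELDS):
        raise ValueError(f"expected {len(PSL_FIELDS)} columns, got {len(values)}")
    record = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in LIST_FIELDS:
            # Lists are comma-separated with a trailing comma.
            record[name] = [int(x) for x in value.split(",") if x]
        elif name in TEXT_FIELDS:
            record[name] = value
        else:
            record[name] = int(value)
    return record


def genomic_blocks(record):
    """Return (chrom, start, end) per aligned block, 0-based, end exclusive."""
    return [
        (record["tName"], start, start + size)
        for start, size in zip(record["tStarts"], record["blockSizes"])
    ]


# The example row from the affyU133Plus2 table shown above.
example_row = "\t".join([
    "585", "530", "4", "0", "23", "3", "41", "3", "898", "-",
    "225995_x_at", "637", "5", "603", "chr1", "249250621", "14361", "15816",
    "5", "93,144,229,70,21,", "34,132,278,541,611,",
    "14361,14454,14599,14968,15795,",
])
record = parse_psl_row(example_row)
blocks = genomic_blocks(record)
for chrom, start, end in blocks:
    print(chrom, start, end, record["strand"], record["qName"], sep="\t")
```

Note that these Affy tables contain alignments of the probeset target sequences (637 bases here), not of the individual 25-mer probes.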
ADD COMMENT
1
13.6 years ago
Gareth Palidwor ★ 1.6k

I suspect the simplest way to achieve your goal is to use the Ensembl API. It's really not that hard to set up...

I recently did something similar where I wanted to restrict my analysis to MOE430 probesets where all probes mapped to one gene and no other. I can probably dig up the code for you if you choose to use Ensembl.
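
Not the original code, but the filtering step could look roughly like the sketch below. It assumes you have already pulled, from Ensembl or anywhere else, a mapping from each probe to the set of genes it hits. The function name, probeset ids, and gene ids are all invented for illustration.

```python
def gene_specific_probesets(probe_genes, probesets):
    """Keep probesets whose probes all hit the same single gene and nothing else.

    probe_genes: dict of probe id -> set of gene ids the probe maps to
    probesets:   dict of probeset id -> list of probe ids
    Returns a dict of probeset id -> gene id for the probesets that pass.
    """
    kept = {}
    for probeset_id, probe_ids in probesets.items():
        gene_sets = [probe_genes.get(probe_id, set()) for probe_id in probe_ids]
        if not gene_sets:
            continue  # empty probeset, nothing to judge
        first = gene_sets[0]
        if len(first) == 1 and all(genes == first for genes in gene_sets):
            kept[probeset_id] = next(iter(first))
    return kept


# Invented example data.
example_probe_genes = {
    "p1": {"geneA"}, "p2": {"geneA"},           # clean probeset
    "p3": {"geneB"}, "p4": {"geneB", "geneC"},  # one probe cross-hybridizes
    "p5": {"geneD"}, "p6": set(),               # one probe maps nowhere
}
example_probesets = {
    "set1_at": ["p1", "p2"],
    "set2_at": ["p3", "p4"],
    "set3_at": ["p5", "p6"],
}
specific = gene_specific_probesets(example_probe_genes, example_probesets)
print(specific)
```

Whether a probe that maps nowhere should disqualify the whole probeset is a design choice; this sketch treats it as a failure.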

ADD COMMENT
0

if it develops in that direction (Ensembl) i'll get back to you thanks.

ADD REPLY
