3.9 years ago
kk.mahsa
Hi everyone,
I am trying to annotate a set of Affymetrix probes for a plant species. I used blastn to find the matching genome region for each probe, but the results confused me: each probe has several hits that are equally good (same identity and E-value). What should I do now? How would you suggest resolving this?
blastn -task blastn-short -evalue 0.0001 -db plant_genes -query plant_affy_probes.fasta -out outputfile.csv -perc_identity 100 -outfmt 7
It's been a while, but if I remember correctly, Affymetrix arrays use probe sets (combinations of multiple probes) to target a gene. So you need to combine the probes of a set (or at least their hits on the genome), which will then, in most cases, point to a single gene. A single probe can thus have several hits, but the combination of probes should be more specific.
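To make the idea of combining probe hits concrete, here is a rough Python sketch, not a tested pipeline. It reads the tabular BLAST output (-outfmt 6 or 7, where the first two columns are the query and subject IDs), groups hits by probe set, and keeps the gene hit by the most probes of that set. Note the assumptions: the function names, the min_fraction threshold, and especially the probe ID layout ("<probeset>:<x>:<y>") are made up for illustration, so probeset_of() has to be adapted to whatever the FASTA headers really look like.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Yield (query_id, subject_id) pairs from BLAST -outfmt 6/7 lines."""
    for line in lines:
        line = line.strip()
        # -outfmt 7 adds comment lines starting with '#'; skip them.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], fields[1]


def probeset_of(probe_id):
    # ASSUMPTION: probe IDs look like "<probeset>:<x>:<y>".
    # Adapt this to the real header format of the probe FASTA file.
    return probe_id.split(":")[0]


def assign_genes(lines, min_fraction=0.5):
    """Map each probe set to the gene supported by most of its probes.

    Returns None for a probe set when the top genes are tied or when the
    best gene is hit by fewer than min_fraction of the probes that had
    at least one hit (probes without any hit are not seen here).
    """
    support = defaultdict(lambda: defaultdict(set))  # probeset -> gene -> probes
    probes = defaultdict(set)                        # probeset -> probes with hits
    for probe, gene in parse_blast_tabular(lines):
        ps = probeset_of(probe)
        support[ps][gene].add(probe)
        probes[ps].add(probe)

    result = {}
    for ps, genes in support.items():
        # Rank genes by number of supporting probes (ties broken by name).
        ranked = sorted(genes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        best_gene, best_probes = ranked[0]
        tied = len(ranked) > 1 and len(ranked[1][1]) == len(best_probes)
        enough = len(best_probes) / len(probes[ps]) >= min_fraction
        result[ps] = best_gene if (enough and not tied) else None
    return result
```

Called as assign_genes(open("outputfile.csv")), this returns a dictionary from probe set to gene (or None where the probe set stays ambiguous). Probe sets that come back as None are the ones worth inspecting by hand, for example members of a gene family.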
Why do you want or need to do this yourself, by the way? Normally the array design comes with a file denoting which gene the probes belong to (a CDF file?).
I am using publicly available microarray data for a meta-analysis. For the plant species I work on, the annotated genome is the only source for probe set annotation. The probe annotation on the Affymetrix site is very poor, so I have to use BLAST to annotate the probes. How can I combine the probes that belong to a gene when I have no information about which probes go together?
Perhaps you can find some useful information in this publication (by former colleagues of mine):
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-461
In any case, the array you use should come with a file from the manufacturer describing the probe set design.