Good day!
This morning I ran into a problem. I wanted to find paralogues of some human genes. Searching the web, I found the GeneDecks resource, but I did not find it very convenient.
Are there any alternative tools for this job? Is it possible to get this information from UCSC (unfortunately, I have not found the proper button there %))?
Thanks.
Would the genomicSuperDups track fulfill your needs?
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'desc genomicSuperDups'
+----------------+------------------+------+-----+---------+-------+
| Field          | Type             | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+-------+
| bin            | smallint(6)      | NO   |     | 0       |       |
| chrom          | varchar(255)     | NO   | MUL |         |       |
| chromStart     | int(10) unsigned | NO   |     | 0       |       |
| chromEnd       | int(10) unsigned | NO   |     | 0       |       |
| name           | varchar(255)     | NO   | MUL |         |       |
| score          | int(10) unsigned | NO   |     | 0       |       |
| strand         | char(1)          | NO   |     |         |       |
| otherChrom     | varchar(255)     | NO   |     |         |       |
| otherStart     | int(10) unsigned | NO   |     | 0       |       |
| otherEnd       | int(10) unsigned | NO   |     | 0       |       |
| otherSize      | int(10) unsigned | NO   |     | 0       |       |
|(...)
+----------------+------------------+------+-----+---------+-------+
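To show how the columns above could be used, here is a minimal Python sketch that looks up the duplication partners (otherChrom, otherStart, otherEnd) of every row overlapping a gene interval. It is only an illustration under stated assumptions: it runs against an in-memory SQLite stand-in rather than the UCSC server, the rows and coordinates are made up, and the helper names (make_demo_db, find_dup_partners) are hypothetical. Only columns listed in the desc output are used.

```python
import sqlite3

# A duplication row overlaps the query interval when it starts before the
# interval ends and ends after the interval starts (UCSC tables store
# 0-based, half-open coordinates).
OVERLAP_SQL = """
SELECT chrom, chromStart, chromEnd, otherChrom, otherStart, otherEnd
FROM genomicSuperDups
WHERE chrom = ? AND chromStart < ? AND chromEnd > ?
ORDER BY chromStart
"""


def make_demo_db():
    """Build an in-memory stand-in table with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genomicSuperDups ("
        "chrom TEXT, chromStart INTEGER, chromEnd INTEGER, name TEXT, "
        "otherChrom TEXT, otherStart INTEGER, otherEnd INTEGER)"
    )
    rows = [
        ("chr1", 1000, 5000, "chr7:20000", "chr7", 20000, 24000),
        ("chr1", 9000, 12000, "chr2:300", "chr2", 300, 3300),
        ("chr3", 1000, 5000, "chr1:700", "chr1", 700, 4700),
    ]
    conn.executemany(
        "INSERT INTO genomicSuperDups VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def find_dup_partners(conn, chrom, start, end):
    """Return all duplication rows overlapping chrom:start-end."""
    cur = conn.execute(OVERLAP_SQL, (chrom, end, start))
    return cur.fetchall()


if __name__ == "__main__":
    demo = make_demo_db()
    # Hypothetical gene interval on chr1.
    for row in find_dup_partners(demo, "chr1", 4000, 9500):
        print(row)
```

The same WHERE clause should carry over to the public MySQL server with a MySQL client library (most of those use %s rather than ? as the placeholder). Note that an overlap with a segmental duplication only points at a candidate region; you would still need to check which gene, if any, sits in the partner interval.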
This track shows regions detected as putative genomic duplications within the golden path. The following display conventions are used to distinguish levels of similarity:
Light to dark gray: 90 - 98% similarity
Light to dark yellow: 98 - 99% similarity
Light to dark orange: greater than 99% similarity
Red: duplications of greater than 98% similarity that lack sufficient Segmental Duplication Database evidence (most likely missed overlaps)
For a region to be included in the track, at least 1 Kb of the total sequence (containing at least 500 bp of non-RepeatMasked sequence) had to align and a sequence identity of at least 90% was required.
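The inclusion rule and colour conventions quoted above can be written down as two small Python functions. This is just a sketch of the description, not UCSC's actual code: the function names are hypothetical, "1 Kb" is taken to mean 1000 bp, and the handling of values exactly at 98% and 99% is my own guess, since the description does not say which side the boundaries fall on.

```python
def passes_inclusion(aligned_bp, unmasked_bp, identity_pct):
    """Inclusion rule as described: at least 1 Kb aligned (assumed 1000 bp),
    at least 500 bp of it non-RepeatMasked, and identity of at least 90%."""
    return aligned_bp >= 1000 and unmasked_bp >= 500 and identity_pct >= 90.0


def display_colour(identity_pct, has_sdd_evidence=True):
    """Map percent similarity to the colour family used by the track.

    Boundary handling at exactly 98% and 99% is an assumption.
    has_sdd_evidence is a hypothetical flag for "sufficient Segmental
    Duplication Database evidence".
    """
    if identity_pct < 90.0:
        return None  # below the track's inclusion threshold
    if identity_pct > 98.0 and not has_sdd_evidence:
        return "red"
    if identity_pct > 99.0:
        return "orange"
    if identity_pct >= 98.0:
        return "yellow"
    return "gray"


if __name__ == "__main__":
    for pct in (92.0, 98.5, 99.7):
        print(pct, display_colour(pct))
```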
The usual assumption is that orthologous genes have identical or highly related functions, and that this shared function is greater than for paralogs. But Nehrt, Hahn et al. challenge this, proposing that "the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act."
They combined experimentally derived function annotations with gene expression data on nearly 9000 proteins.
This is certainly a controversial statement, but a thought-provoking one. After all, it is the integration of diverse data that is driving a lot of genomics. One example is GWAS (genome-wide association studies) + gene expression = better identification of the likely causal variant. The same might be applied to the ortholog/paralog definition.
Thank you! This is definitely the track I've been looking for.