Why Can't I Find Any Gene Sequences in the Associated Clone/Contig?
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

I've returned to bioinformatics after a long absence. The landscape has changed a great deal while I've been away, and there is much I don't understand.

I've been trying to learn how to use the Ensembl genome browser, and I'm confused because when I looked for a gene sequence within its associated contig I couldn't find it. If anyone could spare 2 minutes to help me with this I would really appreciate it.

As an example, I looked in the human genome database in archive release 58 (the one used by the Ensembl course book), at this region on chromosome 3:

Chromosome 3: 129,247,483-129,254,012

I looked at gene GP9 (ENSG00000169704), which lies in the contig AC108673.11>. The gene is annotated on the forward strand, and the contig is also shown in the forward orientation (as indicated by the >, I presume). I picked a random stretch of the GP9 gene with no associated variations and searched for it within the contig, but couldn't find it. I took this sequence from the first exon:

CAGCTGTATCCCATAGAGTT

and searched the contig sequence for it and no match was found.

I also tried all sorts of variations of this sequence (the sequence in reverse [3' to 5'], the sequence complement in both directions) to no avail.
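For readers who want to run the same orientation check outside the browser, here is a minimal Python sketch. The function names `reverse_complement` and `orientations` are hypothetical helpers made up for this illustration, and the sketch assumes the sequences are plain A/C/G/T strings already loaded into memory.

```python
# Hypothetical helpers for illustration; not part of any Ensembl tool.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientations(seq: str) -> dict:
    """Return the four orientations a query could appear in."""
    return {
        "forward": seq,
        "reverse": seq[::-1],
        "complement": seq.translate(_COMPLEMENT),
        "reverse_complement": reverse_complement(seq),
    }


def find_any_orientation(query: str, target: str) -> dict:
    """Map each orientation name to its 0-based position in target (-1 if absent)."""
    target = target.upper()
    return {
        name: target.find(variant.upper())
        for name, variant in orientations(query).items()
    }
```

Calling `find_any_orientation(query, contig)` reports, per orientation, where the query occurs. Note that this is an exact string match, so any stray characters in the query or target will still cause a miss.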

I also tried this for several other genes on the chromosome and couldn't find the gene sequence within the contig. This seems extremely odd to me so I must be doing something very stupid!

Thanks in advance for your help

gene contigs • 3.0k views
14.1 years ago
Darked89 4.7k

The sequence CAGCTGTATCCCATAGAGTT can be found both in the original AC108673.11 at NCBI and in Ensembl 59, assuming you downloaded these as FASTA and are using a simple text search, e.g. Ctrl-F in a Mozilla browser.

If by "no match was found" you mean the result of some similarity search (BLAT?), then 20 nucleotides may be below the minimal match score threshold. You will need something longer, e.g.:

>ENSG00000169704:ENST00000307395 ENSE00001177133 exon:KNOWN_protein_coding
CAGCTGTATCCCATAGAGTTGCCACCCAGGCCTCAGCCAGGACCTTTCAGGCCAGACAGG

I could find it in the current version


I was using the archive version from May 2010, but I wouldn't have thought that made a difference. I simply copied the sequence verbatim (not in FASTA format) from the gene sequence and then did a simple find (Ctrl-F) in the contig sequence in the IE browser.


How blind am I? I didn't notice there were spaces in the sequence I was searching. Thanks a lot.
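Since the culprit here was whitespace in the copied sequence, a search that ignores spaces and line breaks avoids the problem. The following is only a sketch: `find_ignoring_whitespace` is a hypothetical helper name, and it assumes both sequences are ordinary text copied from the browser.

```python
def strip_whitespace(seq: str) -> str:
    """Remove spaces, tabs and line breaks, and normalise to upper case."""
    return "".join(seq.split()).upper()


def find_ignoring_whitespace(query: str, target: str) -> int:
    """Return the 0-based position of query in target after removing
    whitespace from both, or -1 if there is no exact match."""
    return strip_whitespace(target).find(strip_whitespace(query))


# The 20-mer from the question, as it might look when copied with a space
# in the middle, and a made-up target broken across two lines.
query = "CAGCTGTATC CCATAGAGTT"
target = "NNNNCAGCTGTATCCC\nATAGAGTTNNNN"
position = find_ignoring_whitespace(query, target)
```

The returned position refers to the whitespace-stripped target, not to the original text as displayed on the page.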


Actually, a sequence of that length BLATted quite happily at UCSC and mapped to the expected location.
