Annotating Sequences For GBrowse - Which Is The Database And Which Is The Query?
14.5 years ago

Let's say I have some small sequences that I wish to display in GBrowse. I want to create tracks from BLAST results to show where genic regions might be.

Do I create a BLAST database from the small sequences or from the known gene database?

If I use the gene database as the BLAST database, are BLAST-to-GFF conversion scripts designed to use query coordinates instead of reference coordinates?

blast gff • 3.2k views
14.5 years ago
Neilfws 49k

GBrowse uses GFF files, in which column 1 is described as "The ID of the landmark used to establish the coordinate system for the current feature." So, you want "reference" coordinates.

The best way to think about this is that both your known genes and your BLAST alignments are features which can be mapped to a chromosome. Your BLAST database should not be the small sequences, but I'm not sure that it should be the "known gene database" either. I would approach this by creating a known gene track with chromosome as the reference and a BLAST track by BLASTing the small sequence (query) against the chromosome (database).
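To make the "reference coordinates" point concrete, here is a minimal sketch of a BLAST-to-GFF3 conversion. It assumes the BLAST run used the standard 12-column tabular output (`-outfmt 6`) and that the database sequences (chromosomes or contigs) are the landmarks in GBrowse. The function name, the `blastn` source label, and the `match` feature type are illustrative choices, not something taken from an existing conversion script.

```python
def blast_tab_to_gff3(lines, source="blastn", feature_type="match"):
    """Convert BLAST tabular lines (-outfmt 6) into GFF3 lines.

    Column 1 of the GFF is the subject (database) sequence, so the
    feature coordinates are the subject coordinates, not the query ones.
    """
    gff = ["##gff-version 3"]
    hit_number = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        evalue = fields[10]
        hit_number += 1
        # GFF requires start <= end; a reversed subject range means minus strand.
        start, end = min(sstart, send), max(sstart, send)
        strand = "+" if sstart <= send else "-"
        attributes = (
            f"ID=hit{hit_number};Name={qseqid};"
            f"Target={qseqid} {qstart} {qend}"
        )
        gff.append("\t".join([
            sseqid, source, feature_type,
            str(start), str(end), evalue, strand, ".", attributes,
        ]))
    return gff


# Made-up example hit: a small sequence aligned to the minus strand of a contig.
example = ["small_seq_1\tcontig42\t98.5\t200\t3\t0\t1\t200\t5200\t5001\t1e-95\t350"]
for gff_line in blast_tab_to_gff3(example):
    print(gff_line)
```

The query coordinates are kept only in the `Target` attribute, so the alignment can still be traced back to the position on the small sequence.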

Alternatively, it may be that you just want to show the BLAST alignment compared with a gene, in which case the BLAST database is known genes and you'd be creating a large number of "overview" features (one for each gene), with the reference coordinate system going from gene start to gene end. This could get quite messy in GBrowse.

If you're interested in "gene-centric" visualisations, it may be better to use BioPerl's Bio::Graphics module to generate individual PNG plots per gene.


Hmm, there are no chromosomes available yet for this species, and my known genes are actually ESTs (I oversold that to reduce confusion). Someone must have a pipeline for this type of small-scale BAC visualization.


You can use contig IDs in the first column of the GFF file. Check the ESTs/genes for repeat sequences, then use them as queries against a BLAST database built from the contigs. Watch for long FASTA headers in both sets; create your own shorter, unique IDs if needed.
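For the FASTA header advice, something like the following sketch would do the renaming. It assumes the FASTA records are already read in as a list of lines; the function name and the `seq000001`-style ID scheme are made up for illustration. The returned mapping lets you recover the original headers later.

```python
def shorten_fasta_ids(lines, prefix="seq"):
    """Replace each FASTA header with a short, unique ID.

    Returns the renamed lines and a dict mapping short ID -> original header.
    """
    renamed = []
    id_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Number the records in order of appearance, zero-padded.
            short_id = f"{prefix}{len(id_map) + 1:06d}"
            id_map[short_id] = line[1:]
            renamed.append(">" + short_id)
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, id_map


# Made-up example records with long headers.
records = [
    ">gi|123|gb|ABC.1| some long EST description",
    "ACGTACGT",
    ">gi|456|gb|DEF.2| another long description",
    "GGCCGGCC",
]
renamed, id_map = shorten_fasta_ids(records, prefix="est")
print("\n".join(renamed))
```

Run this on both the query and the database files (with different prefixes) before building the BLAST database, so the short IDs are what end up in the BLAST output and in GFF column 1.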


OK. I wrote the answer late last night, so apologies for the lack of clarity. You definitely want to BLAST the small sequences (query) versus the ESTs (database). I'd still consider generating plots per EST, rather than using GBrowse, if the ESTs are not mapped to some kind of larger reference sequence.
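If you go the per-EST route, the first step is to collect the hits for each EST in that EST's own coordinate system. A rough sketch, again assuming 12-column tabular BLAST output (`-outfmt 6`) with the ESTs as the database; the function name is hypothetical, and the plotting itself is left to whatever graphics library you prefer.

```python
from collections import defaultdict


def group_hits_by_subject(lines):
    """Group BLAST tabular hits by subject (EST) ID.

    Returns a dict: subject ID -> list of (query ID, start, end) tuples,
    in subject coordinates, sorted by start position.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        # Normalise so start <= end regardless of strand.
        hits[fields[1]].append((fields[0], min(sstart, send), max(sstart, send)))
    for subject in hits:
        hits[subject].sort(key=lambda hit: hit[1])
    return dict(hits)
```

Each dictionary entry then corresponds to one plot: the EST is the reference, and the tuples are the features to draw on it.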
