Question

How do I view gene annotations for a non-standard (e.g., virus) genome in IGV?

12.4 years ago

Obi Griffith 20k

I have loaded a custom genome (HBV) and the corresponding RNAseq alignments into IGV to visualize transcription of a virus in a tumor sample. That works great. Now I am trying to load a gene annotation track for the virus genome. Does anyone have a recommendation for the easiest way to do this? Since this virus genome is not in Ensembl or UCSC, I don't think I can load it directly from the IGV servers. The GenBank record offers export as GenBank (.gb), XML, and ASN.1, but I don't think any of these can be imported directly into IGV. Do I need to convert to a BED file, or is there a better solution?

These are the features I would like to visualize as a gene track: http://www.ncbi.nlm.nih.gov/nuccore/21326584

igv genbank gene annotation • 16k views

score 11 · Answer 1 · 2013-05-16

Expanding on Pierre's answer: I already had the genome loaded in IGV directly from a fasta file. What I was failing to get was the corresponding gene annotations from the GenBank record. The IGV link that Pierre provides explains that when you create a .genome file you can optionally supply a gff file for gene annotations. That was the part I was missing. So, I did the following:

1) Download the fasta file for custom genome of interest. In this case it was a specific genome assembly for HBV (accession: HE974372).

First, choose the 'Fasta' display for that record: http://www.ncbi.nlm.nih.gov/nuccore/399923469?report=fasta
Save the fasta file for this record: 'Send' -> 'Complete Record' -> 'File' -> 'Format=FASTA' -> 'Create File' (e.g., save as 'HBV_D4_HE974372.fasta')

2) Download the corresponding genbank record for custom genome of interest.

First, choose the 'GenBank' display for that record: http://www.ncbi.nlm.nih.gov/nuccore/399923469?report=genbank
Save a .gb file for this record: 'Send' -> 'Complete Record' -> 'File' -> 'Format=GenBank' -> 'Create File' (e.g., save as 'HBV_D4_HE974372.gb')

3) Convert the genbank file to gff3 format as per instructions here: http://bcb.io/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/

If needed, install bioperl:

sudo apt install bioperl

Then run the tool on gb file as follows:

bp_genbank2gff3 -out stdout HBV_D4_HE974372.gb > HBV_D4_HE974372.gff

4) Create an alias file so that the sequence name in the fasta file (gi|399923469|emb|HE974372.1|) is correctly mapped to the sequence name in the gff file (HE974372). I wasn't sure which order IGV expects, so I put both mappings in a tab-delimited file called 'HBV_D4_alias.tab', which looked like:

HE974372    gi|399923469|emb|HE974372.1|
gi|399923469|emb|HE974372.1|    HE974372

You could also probably just edit the fasta file to use the shortened sequence name.
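Both options in step 4 (writing the alias file, or shortening the fasta header instead) can be scripted. The following is only a sketch: the function names are made up for illustration, and it assumes the old NCBI header layout 'gi|number|database|accession.version|' and that the sequence name is the header text up to the first whitespace. Note that if the bam file was aligned against the original long names, renaming the fasta alone would move the mismatch to the bam file, so the alias file may still be needed in that case.

```python
def short_name(fasta_name, keep_version=False):
    """Pull the accession out of an old-style NCBI name such as
    'gi|399923469|emb|HE974372.1|'. Names without the 'gi|' layout are
    used as-is. The '.1' version suffix is dropped unless keep_version=True."""
    fields = [f for f in fasta_name.split("|") if f]
    if len(fields) >= 4 and fields[0] == "gi":
        accession = fields[3]
    else:
        accession = fasta_name
    return accession if keep_version else accession.split(".")[0]


def alias_lines(fasta_name):
    """Return both mapping directions, tab-separated, as in HBV_D4_alias.tab."""
    short = short_name(fasta_name)
    return [f"{short}\t{fasta_name}", f"{fasta_name}\t{short}"]


def rename_headers(fasta_text):
    """Alternative to an alias file: rewrite each '>' header to the short name.
    Any description after the first whitespace is discarded."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            line = ">" + short_name(parts[0] if parts else "")
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Print the two alias lines for the HBV record used in this answer.
    print("\n".join(alias_lines("gi|399923469|emb|HE974372.1|")))
```

Pass keep_version=True if your gff file uses versioned names (e.g., NC_001522.1) in its first column.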

5) Create the .genome file in IGV. From the menu, 'Genomes' -> 'Create .genome file'.

Unique identifier = 'HBV_D4'
Descriptive name = 'HBV genotype D4 complete genome, isolate Mart-B36'
Fasta file (browse to HBV_D4_HE974372.fasta)
Gene file (browse to HBV_D4_HE974372.gff)
Alias file (browse to HBV_D4_alias.tab)

Then, hit 'Ok' and save as HBV_D4.genome. With my rnaseq bam file loaded I now see reads in the context of annotated genes for this custom reference genome. Nice!

[Screenshot: custom gene annotations in IGV]

NOTE: With the passage of time this procedure has (at least in some cases) become a lot simpler. I just repeated the exercise with Bovine papillomavirus 1 (NC_001522.1). I downloaded the fasta file as in step 1. In step 2 I was able to download a GFF3 file directly (instead of a GB file), which allowed me to skip step 3. The sequence names in my bam file, fasta file, and gff3 file also happened to be consistent, so no alias file was needed. Thus I skipped straight to step 5, created a .genome file with just the unique identifier, descriptive name, fasta file, and gene file, and saw the intended result. This should be the case if you used the same fasta record when creating your indexed reference for alignment. I suspect the move by NCBI to drop GIs in favor of just accessions may have helped here.
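Whether the alias file can be skipped depends on the sequence names agreeing, and that is easy to check before building the .genome file. A minimal sketch, with hypothetical function names, assuming the fasta name is the header text up to the first whitespace and the gff name is the first tab-separated column (the bam header would need a separate check, e.g. with samtools):

```python
def fasta_names(fasta_text):
    """Collect sequence names: header text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gff_names(gff_text):
    """Collect the first column of every feature line in a GFF3 file."""
    names = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        names.add(line.split("\t")[0])
    return names


def unmatched(fasta_text, gff_text):
    """Names used in the gff that have no identically named fasta record."""
    return gff_names(gff_text) - fasta_names(fasta_text)


if __name__ == "__main__":
    # Toy records for illustration only, not real annotation data.
    fasta = ">NC_001522.1 example description\nACGT\n"
    gff = "##gff-version 3\nNC_001522.1\tRefSeq\tgene\t1\t4\t.\t+\t.\tID=gene0\n"
    print(unmatched(fasta, gff) or "names are consistent; no alias file needed")
```

If unmatched() returns any names, either add them to an alias file as in step 4 or rename the sequences so the files agree.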

score 2 · Answer 2 · 2013-05-16

See "Loading a genome" in http://www.broadinstitute.org/igv/LoadGenome

This option supports defining a reference genome by loading either an IGV .genome file or a FASTA file. The .genome file is created as described below. FASTA files must be plain text (not gzipped), and must be indexed with a .fai as defined by the Samtools suite (http://sourceforge.net/projects/samtools/). If the file is not indexed, IGV will attempt to index it