Dear all,
I am new to NGS data analysis and am currently working my way through the Biostar Handbook. Although it appears to be a rather simple task, I am struggling to generate a custom genome for Ebola in IGV. I followed the instructions (code) given in chapter 16 of the handbook to generate the annotation (GFF) file from the Ebola GenBank file. However, although IGV displays the genome sequence, it does not show the annotation, even though the file is provided during the create .genome process. Manually loading the GFF file into the genome does not change anything either.
My fasta genome file looks like so:
$ cat 1976.fa | head
>AF086833
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
TTTTCCTCTCATTGAAATTTATATCGGAATTTAAATTGAAATTGTTACTGTAATCACACCTGGTTTGTTT
CAGAGCCACATCACAAAGATAGAGAACAACCTAGGTCTCCGAAGGGAGCAAGGGCATCAGTGTGCTCAGT
TGAAAATCCCTTGTCAACACCTAGGTCTTATCACATCACAAGTTCCACCTCAGACTCTGCAGGGTGATCC
AACAACCTTAATAGAAACATTATTGTTAAAGGACAGCATTAGTTCACAGTCAAACAAGCAAGATTGAGAA
TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC
ATTGGAAATAGTTAAAAGACAAATTGCTCGGAATCACAAAATTCCGAGTATGGATTCTCGTCCTCAGAAA
ATCTGGATGGCGCCGAGTCTCACTGAATCTGACATGGATTACCACAAGATCTTGACAGCAGGTCTGTCCG
TTCAACAGGGGATTGTTCGGCAAAGAGTCATCCCAGTGTATCAAGTAAACAATCTTGAAGAAATTTGCC
and my gff file like so:
##gff-version 3
AF086833 EMBL gene 56 3026 . + . ID=AF086833.3;gene=NP
AF086833 EMBL gene 3032 4407 . + . ID=AF086833.9;gene=VP35
AF086833 EMBL gene 4390 5894 . + . ID=AF086833.13;gene=VP40
AF086833 EMBL gene 5900 8305 . + . ID=AF086833.20;gene=GP
AF086833 EMBL gene 8288 9740 . + . ID=AF086833.34;gene=VP30
AF086833 EMBL gene 9885 11518 . + . ID=AF086833.41;gene=VP24;note=putative
AF086833 EMBL gene 11501 18282 . + . ID=AF086833.47;gene=L
Could anybody point me towards a solution? Cheers Ricky
Did you try "Open File" or "Import Regions"?
Following the instructions, I added the GFF file under Create Genome/Optional/Gene file when I set the FASTA file for the genome. I also tried adding the GFF file manually after opening the genome in IGV using "Open File" and, following your suggestion, have now tried "Import Regions" as well. No success, though.
There is not much to see in the genome unless you load an alignment file. Have you done that after creating the "custom" genome? Remember to zoom in significantly before you start seeing features on the GTF track or reads in the alignment window.
The genome is rather small (ca. 19 kb) and relatively gene-rich, so the annotation should be easily visible. Zooming doesn't help at all. I can load a BAM file and the read alignments are displayed nicely, so that works, but still no genes.
It is possible that IGV has started checking FASTA sequence identifiers strictly since these directions were written. There is a mismatch between the chromosome name in the 1976.fa file and the 1976-genes.gff file. Try the following:
1. Genomes --> Manage Genome List. Delete the custom Ebola genome you made.
2. Open the 1976.fa file in an editor and remove the version number from the accession number. Change >AF086833.2 to >AF086833. Save the file.
3. If you have a previous 1976.fa.fai file in the directory where you saved the genome, delete it so IGV will be forced to recreate the index.
I just confirmed that this works.
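To check for this kind of name mismatch before rebuilding the genome, a small script can compare the sequence names in the FASTA headers against the first column of the GFF. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the GFF is tab-delimited and that the sequence name is the first word after ">".

```python
def fasta_names(lines):
    """Collect sequence names: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(lines):
    """Collect the seqid (column 1) of every feature line in a tab-delimited GFF."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        ids.add(line.split("\t")[0])
    return ids


def name_mismatches(fasta_lines, gff_lines):
    """Return GFF seqids that have no matching sequence name in the FASTA."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))


# Possible usage with the files from this thread (file names as given above):
# with open("1976.fa") as fa, open("1976-genes.gff") as gff:
#     print(name_mismatches(fa, gff))
```

An empty list means every seqid in the GFF has a matching FASTA header; anything listed is a name that a strict match would not be able to place on the genome.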
Thanks a lot for the suggestion, changed the file to
This solved the problem partially: IGV now displays at least the last entry of the list (gene L) in its correct position. Strangely enough, the other ones are not shown, although I cannot see any difference in formatting between the lines now.
I asked you to edit the sequence file header (fasta) not the GFF file. Use the original GFF file as is. Can you try it again?
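If editing the header by hand is error-prone, the same change can be scripted so that only the FASTA header lines are touched. This is a rough sketch, not the handbook's method: the function name is hypothetical, and it assumes the version is a trailing ".N" on the first word of the header.

```python
import re


def strip_accession_version(fasta_text):
    """Remove a trailing '.N' version from the first word of each FASTA header.

    Sequence lines are passed through unchanged.
    """
    out = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            # '>AF086833.2' becomes '>AF086833'; any description text is kept
            line = re.sub(r"^(>\S+?)\.\d+(?=\s|$)", r"\1", line)
        out.append(line)
    return "".join(out)


# Possible usage (overwrites the file, so keep a backup copy first):
# with open("1976.fa") as handle:
#     fixed = strip_accession_version(handle.read())
# with open("1976.fa", "w") as handle:
#     handle.write(fixed)
```

As noted earlier in the thread, an existing 1976.fa.fai index should be deleted afterwards so that IGV recreates it.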
Sorry for that, was a bit confused. The header of my genome sequence fasta file is:
so no version number in AF086833 and I changed back the gff file to:
I created the genome again in IGV and indeed, now it displays the genes. However, overlapping genes are represented as a single bar. Is there any way to change that so overlapping genes are drawn as separate bars stacked on top of each other?
In the meantime I found the answer: change the track's display mode from collapsed to expanded.
I think that is the way IGV handles annotation files. You could split the genes into multiple GFF files and then load them to see if that sort of does what you want.
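For anyone who wants to try the split-file route, the grouping could look roughly like this. It is an illustrative sketch only: the function name and the output file naming are made up, and it assumes a tab-delimited GFF3 with a gene=... entry in the attributes column, as in the file shown in the question.

```python
def split_gff_by_gene(gff_lines):
    """Group GFF feature lines by their 'gene' attribute.

    Returns a dict mapping gene name to the text of a small standalone GFF.
    """
    groups = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip directives, comments and blank lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene = attrs.get("gene", "unknown")
        groups.setdefault(gene, []).append(line)
    return {
        gene: "##gff-version 3\n" + "\n".join(rows) + "\n"
        for gene, rows in groups.items()
    }


# Possible usage: write one file per gene (the naming scheme is hypothetical):
# with open("1976-genes.gff") as handle:
#     for gene, text in split_gff_by_gene(handle).items():
#         with open(f"1976-{gene}.gff", "w") as out:
#             out.write(text)
```

Each resulting file can then be loaded as its own track, so overlapping genes end up on separate rows.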