IGV custom genome / Biostar Handbook Chapter 16
7.0 years ago
Ricky

Dear all,

Being new to NGS data analysis, I am currently working my way through the Biostar Handbook. Although it appears to be a rather simple task, I am struggling to create a custom Ebola genome in IGV. I followed the instructions (code) in chapter 16 of the handbook to generate the annotation (GFF) file from the Ebola GenBank file. However, although IGV displays the genome sequence, it does not show the annotation, even though the file is provided during the .genome creation process. Loading the GFF file manually into the genome does not change anything either. My FASTA genome file looks like this:
$ cat 1976.fa | head

>AF086833
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
TTTTCCTCTCATTGAAATTTATATCGGAATTTAAATTGAAATTGTTACTGTAATCACACCTGGTTTGTTT
CAGAGCCACATCACAAAGATAGAGAACAACCTAGGTCTCCGAAGGGAGCAAGGGCATCAGTGTGCTCAGT
TGAAAATCCCTTGTCAACACCTAGGTCTTATCACATCACAAGTTCCACCTCAGACTCTGCAGGGTGATCC
AACAACCTTAATAGAAACATTATTGTTAAAGGACAGCATTAGTTCACAGTCAAACAAGCAAGATTGAGAA
TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC
ATTGGAAATAGTTAAAAGACAAATTGCTCGGAATCACAAAATTCCGAGTATGGATTCTCGTCCTCAGAAA
ATCTGGATGGCGCCGAGTCTCACTGAATCTGACATGGATTACCACAAGATCTTGACAGCAGGTCTGTCCG
TTCAACAGGGGATTGTTCGGCAAAGAGTCATCCCAGTGTATCAAGTAAACAATCTTGAAGAAATTTGCC

and my GFF file like this:

##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833.3;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833.9;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833.13;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833.20;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833.34;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833.41;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833.47;gene=L

Could anybody point me towards a solution?

Cheers,
Ricky
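A quick way to test for the kind of name mismatch discussed in this thread is to compare the FASTA header names with column 1 (the seqid) of the GFF. The sketch below is a hypothetical helper (`missing_seqids` is not part of IGV or the handbook). It assumes a tab-separated GFF3 file, as the format requires (the columns above were presumably tabs rendered as spaces by the forum), and takes a FASTA name to be everything after `>` up to the first whitespace, which is the usual convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each GFF seqid (column 1) that has no
# matching FASTA sequence name. Prints nothing when all seqids match.
# Assumes a tab-separated GFF3 file and a non-empty FASTA file.
missing_seqids() {
    local fasta="$1" gff="$2"
    awk -F'\t' '
        NR == FNR {                          # first file: the FASTA
            if ($0 ~ /^>/) {
                name = substr($0, 2)         # drop the leading ">"
                sub(/[ \t].*/, "", name)     # keep text up to first whitespace
                seen[name] = 1
            }
            next
        }
        /^#/ || NF < 9 { next }              # skip GFF comments, blank lines
        !($1 in seen) && !($1 in reported) { # unmatched seqid, report once
            print $1
            reported[$1] = 1
        }
    ' "$fasta" "$gff"
}
```

Usage: `missing_seqids 1976.fa 1976-genes.gff`. If it prints `AF086833`, the FASTA header still carries a different name (for example `AF086833.2`).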


Did you try "Open File" or "Import Regions"?


Following the instructions, I added the GFF file under Create Genome/Optional/Gene file at the same time as I set the FASTA file for the genome. I also tried loading the GFF file manually with "Open File" after opening the genome in IGV and, following your suggestion, with "Import Regions", but without success.


There is not much to see in the genome unless you load an alignment file. Have you done that after creating the "custom" genome? Remember to zoom in considerably before you start seeing features on the GFF track or reads in the alignment window.


The genome is rather small (about 19 kb) and relatively gene-rich, so the annotation should be easily visible; zooming doesn't help at all. I can load a BAM file and the read alignments are displayed nicely, so that part works, but there are still no genes.


It is possible that IGV has started checking FASTA sequence identifiers more strictly since these directions were written. There is a mismatch between the chromosome (sequence) name in the 1976.fa file and the one used in the 1976-genes.gff file.

Try the following:

  1. In IGV, select a different genome. Go to Genomes --> Manage Genome List and delete the custom Ebola genome you made.
  2. Open the 1976.fa file in an editor and remove the version number from the accession: change >AF086833.2 to >AF086833, then save the file. If a 1976.fa.fai file from a previous attempt exists in the directory where you saved the genome, delete it so that IGV is forced to recreate the index.
  3. Follow the directions to make a new genome using this edited sequence file and the unchanged GFF file.

I just confirmed that this works.
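Step 2 can also be done from the command line. This is a minimal sketch, not part of the handbook: `strip_fasta_versions` is a hypothetical name, and it assumes a `sed` that supports `-E` (GNU and BSD both do). It removes a trailing `.<digits>` version from each header name and deletes a `.fai` index sitting next to the FASTA file; steps 1 and 3 still happen in IGV.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a trailing ".<digits>" version from each FASTA
# header name, e.g. ">AF086833.2 Ebola virus" -> ">AF086833 Ebola virus".
strip_fasta_versions() {
    local fasta="$1"
    # First expression: header name is the whole line (">AF086833.2").
    # Second expression: header name is followed by a description.
    # Writing to a temp file avoids GNU/BSD differences in "sed -i".
    sed -E \
        -e 's/^(>[^[:space:].]+)\.[0-9]+$/\1/' \
        -e 's/^(>[^[:space:].]+)\.[0-9]+([[:space:]])/\1\2/' \
        "$fasta" > "$fasta.tmp" && mv "$fasta.tmp" "$fasta"
    # Remove a stale index next to the FASTA so IGV rebuilds it.
    rm -f "$fasta.fai"
}
```

Usage: `strip_fasta_versions 1976.fa`. It overwrites the file, so keep a copy if you want the original header. If an older `1976.fa.fai` lives in a different directory (for example where the .genome file was saved), that one still has to be deleted by hand.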


Thanks a lot for the suggestion. I changed the file to:

$ cat 1976-genes.gff
##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833;gene=L

This partially solved the problem: IGV now displays at least the last entry of the list (gene L) in its correct position. Strangely, the other genes are not shown, although I cannot see any difference in formatting between the lines.


I asked you to edit the sequence file header (FASTA), not the GFF file. Use the original GFF file as is. Can you try again?
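Why editing the GFF made things worse: the GFF3 specification requires each ID to be unique within the file, except that a single discontinuous feature may repeat its ID across several lines. Giving all seven genes `ID=AF086833` therefore tells a GFF3 parser they are pieces of one feature, which plausibly explains why IGV drew only one of them. The hypothetical check below (`duplicate_ids`, assuming a tab-separated GFF3 file) lists every ID that appears on more than one line, with its count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<ID> <count>" for every GFF3 ID attribute
# used on more than one feature line. Assumes tab-separated columns.
duplicate_ids() {
    local gff="$1"
    awk -F'\t' '
        /^#/ || NF < 9 { next }              # skip comments, blank lines
        {
            n = split($9, attrs, ";")        # column 9: key=value;key=value
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/)
                    count[substr(attrs[i], 4)]++   # text after "ID="
        }
        END {
            for (id in count)
                if (count[id] > 1) print id, count[id]
        }
    ' "$gff"
}
```

Run on the edited file from earlier in the thread, this would report `AF086833 7`; on the original file it prints nothing.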


Sorry about that, I was a bit confused. The header of my genome sequence FASTA file is:

$ cat 1976.fa | head
>AF086833
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA

so there is no version number on AF086833, and I changed the GFF file back to:

##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833.3;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833.9;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833.13;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833.20;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833.34;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833.41;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833.47;gene=L

I created the genome again in IGV and indeed, it now displays the genes. However, overlapping genes are drawn as a single bar. Is there any way to change that so that overlapping genes are stacked on top of each other?

In the meantime I found the answer: change the track display mode from Collapsed to Expanded (via the track's right-click menu).


I think that is the way IGV handles annotation files. You could split the genes into multiple GFF files and then load them to see if that sort of does what you want.
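A sketch of the splitting suggestion, assuming tab-separated GFF3 lines that carry a `gene=<name>` attribute as in the file above; `split_gff_by_gene` is a hypothetical helper, not an IGV feature. Each gene is written to its own file with a `##gff-version 3` header, so each file can be loaded as a separate track.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one GFF3 file per gene=<name> attribute,
# e.g. NP.gff, VP35.gff, ..., L.gff inside the output directory.
split_gff_by_gene() {
    local gff="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        /^#/ || NF < 9 { next }              # skip comments, blank lines
        {
            name = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^gene=/) name = substr(attrs[i], 6)
            if (name == "") next             # no gene= attribute: skip line
            out = outdir "/" name ".gff"
            if (!(out in started)) {         # first line for this gene:
                print "##gff-version 3" > out    # create file with header
                started[out] = 1
            }
            print > out                      # later prints append
        }
    ' "$gff"
}
```

Usage: `split_gff_by_gene 1976-genes.gff ebola_genes`. Since switching the track to Expanded (as found above) already stacks overlapping genes, splitting is only worth it if you actually want one track per gene.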
