Hi,
I have RNA-seq data from HIV-infected cells, which I now want to map to a combined human-HIV genome. To build that genome, I need a GTF file for my HIV strain. I didn't find strain-specific annotation files for HIV. Do you know where one could find something like that, or a better way to estimate HIV transcript abundance in RNA-seq data?
OK, I thought I could export my annotations from Geneious as GFF and edit the text file by hand to turn it into a GTF file, but I am very uncertain whether my annotations are sufficient for that.
My GFF file looks like this:
pNL4-3 Geneious region 1 9709 . + 0 Is_circular=true
pNL4-3 Geneious insertion 1186 1186 . + . Name=p17/p24
pNL4-3 Geneious polyA_signal 9602 9607 . + . Name=POLY_A
pNL4-3 Geneious LTR 9076 9709 . + . Name=3'_LTR
pNL4-3 Geneious LTR 1 634 . + . Name=5'_LTR
pNL4-3 Geneious invisible_Parent 8888 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346934513538.20
pNL4-3 Geneious invisible_Parent 5304 8887 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346934513382.19
pNL4-3 Geneious misc_feature 5005 5034 . . . Name=Fragment3
pNL4-3 Geneious misc_feature 5743 5744 . + . Name=JNCTN_NY5/LAV
pNL4-3 Geneious repeat_region 454 551 . + . Name=R
pNL4-3 Geneious repeat_region 9529 9626 . + . Name=R
pNL4-3 Geneious repeat_region 552 634 . + . Name=U5
pNL4-3 Geneious intron 744 5776 . + . Name=TAT/REV/NEF_I
pNL4-3 Geneious intron 6045 8368 . + . Name=TAT_II
pNL4-3 Geneious intron 6045 8368 . + . Name=TAT/REV/NEF_II
pNL4-3 Geneious intron 6045 8368 . + . Name=REV_II
pNL4-3 Geneious CDS 2085 5096 . + . Name=POL
pNL4-3 Geneious CDS 5969 8643 . + . Name=REV
pNL4-3 Geneious CDS 5830 8414 . + . Name=TAT
pNL4-3 Geneious CDS 6221 8785 . + . Name=ENV
pNL4-3 Geneious CDS 790 2292 . + . Name=GAG
pNL4-3 Geneious CDS 8787 9407 . + . Name=NEF
pNL4-3 Geneious CDS 5041 5619 . + . Name=VIF
pNL4-3 Geneious CDS 5559 5849 . + . Name=VPR
pNL4-3 Geneious CDS 6061 6306 . + . Name=VPU
pNL4-3 Geneious splicing signal 5059 5060 . + . Name=SD2b
pNL4-3 Geneious splicing signal 4963 4964 . + . Name=SD2
pNL4-3 Geneious splicing signal 5974 5975 . + . Name=SA5
pNL4-3 Geneious splicing signal 6720 6721 . + . Name=(SD5)
pNL4-3 Geneious splicing signal 744 745 . + . Name=SD1
pNL4-3 Geneious splicing signal 6045 6046 . + . Name=SD4
pNL4-3 Geneious splicing signal 5388 5389 . + . Name=SA2
pNL4-3 Geneious splicing signal 8367 8368 . + . Name=SA7
pNL4-3 Geneious splicing signal 5464 5465 . + . Name=SD3
pNL4-3 Geneious splicing signal 5775 5776 . + . Name=SA3
pNL4-3 Geneious splicing signal 5952 5953 . + . Name=SA4a
pNL4-3 Geneious splicing signal 5934 5935 . + . Name=SA4c
pNL4-3 Geneious splicing signal 5958 5959 . + . Name=SA4b
pNL4-3 Geneious splicing signal 4911 4912 . + . Name=SA1
pNL4-3 Geneious splicing signal 6602 6603 . + . Name=(SA6)
pNL4-3 Geneious invisible_Parent 5786 7812 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938163915.21
pNL4-3 Geneious invisible_Parent 7813 15494 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938164320.22
pNL4-3 Geneious invisible_Parent 5304 7812 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938256243.23
pNL4-3 Geneious invisible_Parent 7813 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938256306.24
pNL4-3 Geneious invisible_Parent 639 5785 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938340538.25
pNL4-3 Geneious invisible_Parent 5786 10347 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938340632.26
pNL4-3 Geneious invisible_Parent 639 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938465400.27
pNL4-3 Geneious invisible_Parent 5304 10347 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938465494.28
pNL4-3 Geneious invisible_Parent 712 5785 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938540694.29
pNL4-3 Geneious invisible_Parent 5786 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938540787.30
pNL4-3 Geneious invisible_Parent 712 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938613678.31
pNL4-3 Geneious invisible_Parent 5304 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938613756.32
pNL4-3 Geneious invisible_Parent 5786 8465 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941525286.33
pNL4-3 Geneious invisible_Parent 8466 15494 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941525411.34
pNL4-3 Geneious invisible_Parent 5304 8465 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941600376.35
pNL4-3 Geneious invisible_Parent 8466 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941600470.36
pNL4-3 Geneious invisible_Parent 712 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1347027336363.0
pNL4-3 Geneious invisible_Parent 5304 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1347027336628.1
Do I have to look up the exon borders and insert them manually? Should I delete the first line, and do I have to delete the splicing signal entries?
Do you know another way to get, e.g., an example HIV GTF file for comparison? Or even the one I need?
If you have the GenBank file, you could try a genbank2gtf-type program to generate one. Here is one repo.
Thank you! I have the annotation in Geneious and can download the GFF file from there. I then just have to convert it, which I guess can be done by hand, since the file is not that large.
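If editing by hand gets tedious, the CDS rows can also be converted with a few lines of Python. This is only a rough sketch under some assumptions: the exported file is tab-separated, the gene name is taken from the Name= attribute, and every CDS is treated as its own single-exon transcript. That last assumption does not hold for spliced genes such as TAT and REV, whose CDS rows above span an intron, so those rows would still need to be split manually. The function name gff_cds_to_gtf is made up for this example.

```python
def gff_cds_to_gtf(gff_lines):
    """Turn Geneious GFF 'CDS' rows into GTF 'exon' and 'CDS' rows.

    Assumption: each CDS is one single-exon transcript whose gene_id and
    transcript_id are both the value of the Name= attribute.
    """
    gtf = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS rows
        seqid, source, _type, start, end, score, strand, _phase, attrs = cols
        # Attributes look like "Name=GAG"; pull out the Name value.
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        name = fields.get("Name")
        if name is None:
            continue
        attr = f'gene_id "{name}"; transcript_id "{name}";'
        # One exon row and one CDS row per gene; an unsplit CDS starts in frame 0.
        for feature, frame in (("exon", "."), ("CDS", "0")):
            gtf.append("\t".join(
                [seqid, source, feature, start, end, score, strand, frame, attr]))
    return gtf


if __name__ == "__main__":
    example = ["pNL4-3\tGeneious\tCDS\t790\t2292\t.\t+\t.\tName=GAG"]
    print("\n".join(gff_cds_to_gtf(example)))
```

Rows of other types (LTR, invisible_Parent, splicing signal and so on) are simply skipped, so they would not need to be deleted from the input first.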
Hi caggtaagtat,
Did your HIV NL4-3 GFF/GTF file work in the end? I have the same question and could not find a GFF/GTF file for NL4-3 despite an intensive Google search.
Best,
Xiao
Sequence for HIV NL4-3 is available here. You could download the genbank format file and then try to make the GTF file.
The GTF file should contain all the transcripts of NL4-3, not just the DNA sequences. There are no such annotations of NL4-3 transcripts on the Internet.
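To describe transcripts rather than just coding regions, each isoform needs its own set of exon rows. Here is a minimal sketch of how exon coordinates could be derived from the intron annotations in the GFF posted above. The intron coordinates (744-5776 and 6045-8368) come from that file; the transcript boundaries 454 and 9626 are my assumption (start of R in the 5' LTR to end of R in the 3' LTR), and the helper names and the transcript_id are hypothetical.

```python
def exons_from_introns(tx_start, tx_end, introns):
    """Return 1-based inclusive exon intervals left after removing introns."""
    exons = []
    pos = tx_start
    for i_start, i_end in sorted(introns):
        if i_start <= pos or i_end >= tx_end:
            raise ValueError("intron must lie strictly inside the transcript")
        exons.append((pos, i_start - 1))  # exon ends one base before the intron
        pos = i_end + 1                   # next exon starts one base after it
    exons.append((pos, tx_end))
    return exons


def transcript_to_gtf(seqid, gene_id, transcript_id, exons,
                      strand="+", source="manual"):
    """Format one transcript row plus one exon row per interval as GTF lines."""
    attr = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    rows = [[seqid, source, "transcript",
             exons[0][0], exons[-1][1], ".", strand, ".", attr]]
    rows += [[seqid, source, "exon", s, e, ".", strand, ".", attr]
             for s, e in exons]
    return ["\t".join(str(c) for c in row) for row in rows]


if __name__ == "__main__":
    # Both introns removed, as in a fully spliced message.
    exons = exons_from_introns(454, 9626, [(744, 5776), (6045, 8368)])
    # exons == [(454, 743), (5777, 6044), (8369, 9626)]
    print("\n".join(transcript_to_gtf("pNL4-3", "TAT", "TAT_fully_spliced", exons)))
```

Each additional isoform (unspliced, or using a different acceptor such as SA4a/b/c or SA5) would get its own transcript_id and its own intron list; which donor/acceptor combinations to include is a decision the code cannot make for you.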