gtf file for canFam2 genome version
1 day ago
1769mkc ★ 1.2k

I'm trying to find a GTF file for this version of the canine genome. I looked at both NCBI and UCSC, but I can't find one.

On the NCBI page below, I don't see an option to download a GTF file:

https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002285.2/

Normally UCSC has a folder called genes, as for hg19 (https://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/genes/), which contains the GTF files, but there is no such folder for canFam2 at UCSC:

https://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/

Is there a way to find a ready-made GTF for this same version, canFam2, from either NCBI or UCSC?

It would be helpful to know whether one is available for download.

gtffile • 226 views

Are you specifically looking for canFam2? A GTF is likely not offered for it because newer versions are available now: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=9612


Yes, I'm looking for canFam2 only. I have to use it for some specific cases, even though I also have the newer version.

1 day ago

I wrote https://jvarkit.readthedocs.io/en/latest/KgToGff/

It was just a one-shot; I haven't used it much. Please check the results.

$ wget -qO - "https://hgdownload.cse.ucsc.edu/goldenPath/canFam2/database/ensGene.txt.gz" | gunzip -c |\
  java -jar dist/jvarkit.jar kg2gff --gtf | head -n 20
chr29   ucsc    gene    31843577    31869014    .   +   .   ID "GENE2"; Name "ENSCAFG00000008510"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; gene_type "protein_coding";
chr29   ucsc    transcript  31843577    31869014    .   +   .   ID "ENSCAFT00000013501.3"; Parent "GENE2"; Name "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; transcript_name "ENSCAFT00000013501";
chr29   ucsc    exon    31843577    31843766    .   +   .   ID "ENSCAFT00000013501%3AE0"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE0";
chr29   ucsc    exon    31862157    31862334    .   +   .   ID "ENSCAFT00000013501%3AE1"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE1";
chr29   ucsc    exon    31865271    31865385    .   +   .   ID "ENSCAFT00000013501%3AE2"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE2";
chr29   ucsc    exon    31868495    31868660    .   +   .   ID "ENSCAFT00000013501%3AE3"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE3";
chr29   ucsc    exon    31868849    31869014    .   +   .   ID "ENSCAFT00000013501%3AE4"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE4";
chr29   ucsc    CDS 31843577    31843766    .   +   0   ID "CDS4"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    CDS 31862157    31862334    .   +   2   ID "CDS5"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    CDS 31865271    31865385    .   +   1   ID "CDS6"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    CDS 31868495    31868660    .   +   0   ID "CDS7"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    CDS 31868849    31868910    .   +   2   ID "CDS8"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    three_prime_utr 31868911    31869014    .   +   .   ID "UTR9"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    start_codon 31843577    31843579    .   +   .   ID "codon10"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29   ucsc    stop_codon  31868908    31868910    .   +   .   ID "codon11"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr3    ucsc    gene    72230308    72416756    .   +   .   ID "GENE13"; Name "ENSCAFG00000015634"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; gene_type "protein_coding";
chr3    ucsc    transcript  72230308    72416756    .   +   .   ID "ENSCAFT00000024802.14"; Parent "GENE13"; Name "ENSCAFT00000024802.14"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; transcript_name "ENSCAFT00000024802";
chr3    ucsc    exon    72230308    72230403    .   +   .   ID "ENSCAFT00000024802%3AE0"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE0";
chr3    ucsc    exon    72257459    72257619    .   +   .   ID "ENSCAFT00000024802%3AE1"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE1";
chr3    ucsc    exon    72272584    72272708    .   +   .   ID "ENSCAFT00000024802%3AE2"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE2";
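
Since the results should be checked, a quick structural sanity check might look like the sketch below. It is not part of jvarkit: `check_gtf` is a hypothetical helper name, and it assumes the full output was saved to a tab-separated file (the name `canFam2.ensGene.gtf` in the usage line is only an example). It verifies that each record has 9 columns, that start is not greater than end, and that the strand is valid, then prints a count per feature type. It does not check biological correctness.

```shell
#!/usr/bin/env bash
# Sketch: basic structural checks on a GTF file (tab-separated, 9 columns).
# check_gtf is a hypothetical helper, not a jvarkit or UCSC command.
# Problems go to stderr, per-feature counts go to stdout.
# Returns non-zero if any problem was found.
check_gtf() {
  awk -F'\t' '
    /^#/ { next }                          # skip comment/header lines
    NF != 9 {                              # GTF records have exactly 9 columns
      printf("line %d: expected 9 columns, found %d\n", NR, NF) > "/dev/stderr"
      bad++
      next
    }
    ($4 + 0) > ($5 + 0) {                  # start must not exceed end
      printf("line %d: start %s > end %s\n", NR, $4, $5) > "/dev/stderr"
      bad++
    }
    $7 != "+" && $7 != "-" && $7 != "." {  # strand must be +, - or .
      printf("line %d: unexpected strand %s\n", NR, $7) > "/dev/stderr"
      bad++
    }
    { count[$3]++ }                        # tally feature types (column 3)
    END {
      for (f in count) printf("%s\t%d\n", f, count[f])
      exit (bad > 0)
    }
  ' "$1"
}
```

Example usage, after writing the complete conversion to a file instead of piping it to `head`: `check_gtf canFam2.ensGene.gtf | sort`. Comparing the transcript count with the number of rows in ensGene.txt.gz is a further cheap check, since each row of that table describes one transcript.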

Will give it a try, thank you so much.
