Bos taurus, Bos indicus and Bubalus bubalis GTF, GFF, GFF3 files
11 weeks ago
Muhammad • 0

Dear all, I want to do some comparative genomics analysis using cattle and buffalo genome GTF and GFF files. Can somebody share the URL for them? I searched NCBI, but I found that the RefSeq files for all three reference genomes use accession numbers (such as NC_037328.1) in place of chromosome numbers, something like this:

NC_037328.1 RefSeq  region  1   158534110   .   +   .   ID=NC_037328.1:1..158534110;Dbxref=taxon:9913;Name=1;breed=Hereford;chromosome=1;gbkey=Src;genome=chromosome;isolate=L1 Dominette 01449 registration number 42190680;mol_type=genomic DNA;sex=female;tissue-type=left lung
NC_037328.1 Gnomon  pseudogene  207933  217580  .   -   .   ID=gene-LOC112

But I want GTF files with chromosome numbers, something like this:

chr1    ncbiRefSeq  transcript  210759  214966  .   -   .   gene_id "LOC112447072"; transcript_id "XR_003035142.1";  gene_name "LOC112447072";
chr1    ncbiRefSeq  exon    210759  212235  .   -   .   gene_id "LOC112447072"; transcript_id "XR_003035142.1"; exon_number "1"; exon_id "XR_003035142.1.1"; gene_name "LOC112447072";
chr1    ncbiRefSeq  exon    212941  213154  .   -   .   gene_id "LOC112447072"; transcript_id "XR_003035142.1"; exon_number "2"; exon_id "XR_003035142.1.2"; gene_name "LOC112447072";
chr1    ncbiRefSeq  exon    214935  214966  .   -   .   gene_id "LOC112447072"; transcript_id "XR_003035142.1"; exon_number "3"; exon_id "XR_003035142.1.3"; gene_name "LOC112447072";
chr1    ncbiRefSeq  transcript  217517  257046  .   -   .   gene_id "LOC101903639"; transcript_id "XR_003035135.1";  gene_name "LOC101903639";
chr1    ncbiRefSeq  exon    217517  219285  .   -   .   gene_id "LOC101903639"; transcript_id "XR_003035135.1"; exon_number "1"; exon_id "XR_003035135.1.1"; gene_name "LOC101903639";
chr1    ncbiRefSeq  exon    229250  229332  .   -

Where can I find this? Please share a URL (NCBI or other) or some other solution. Also, an additional question: if I want to work with a genome whose file doesn't exist, how can I find or make it?


One solution is to find and use tables to convert the RefSeq identifiers to standard chromosome numbers. For example, there's a table here you could use to convert the identifiers in your GTF file to standard chromosome names prefixed with “chr”:

#!/usr/bin/env bash

#  Save the mapping table to a TXT file
printf \
"1\tNC_037328.1\n\
2\tNC_037329.1\n\
3\tNC_037330.1\n\
4\tNC_037331.1\n\
5\tNC_037332.1\n\
6\tNC_037333.1\n\
7\tNC_037334.1\n\
8\tNC_037335.1\n\
9\tNC_037336.1\n\
10\tNC_037337.1\n\
11\tNC_037338.1\n\
12\tNC_037339.1\n\
13\tNC_037340.1\n\
14\tNC_037341.1\n\
15\tNC_037342.1\n\
16\tNC_037343.1\n\
17\tNC_037344.1\n\
18\tNC_037345.1\n\
19\tNC_037346.1\n\
20\tNC_037347.1\n\
21\tNC_037348.1\n\
22\tNC_037349.1\n\
23\tNC_037350.1\n\
24\tNC_037351.1\n\
25\tNC_037352.1\n\
26\tNC_037353.1\n\
27\tNC_037354.1\n\
28\tNC_037355.1\n\
29\tNC_037356.1\n\
X\tNC_037357.1\n\
Y\tNC_082638.1\n\
MT\tNC_006853.1\n" \
    > chr_map.txt

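#  Convert identifiers in the GTF file. FS and OFS are both set to a tab so
#+ that reassigning $1 keeps the output tab-delimited and leaves attribute
#+ values that contain spaces intact.
awk 'BEGIN { FS = OFS = "\t" } NR==FNR { map[$2] = "chr" $1; next } $1 in map { $1 = map[$1] } 1' \
    chr_map.txt \
    original.gtf \
        > converted.gtf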
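
The table above is specific to the Bos taurus accessions in the question. For Bos indicus and Bubalus bubalis, a similar table could probably be derived from the assembly report file (`*_assembly_report.txt`) that NCBI distributes alongside each assembly. The following is only a sketch: the function names `make_chr_map` and `list_seqnames` are hypothetical, and the column positions (2 = Sequence-Role, 3 = Assigned-Molecule, 7 = RefSeq-Accn) are an assumption that should be checked against the header line of the report you download.

```shell
#!/usr/bin/env bash

#  Hypothetical helper: print "chromosome<TAB>RefSeq accession" for every
#+ assembled chromosome listed in an NCBI assembly report.
#  Assumed columns: 2 = Sequence-Role, 3 = Assigned-Molecule, 7 = RefSeq-Accn
make_chr_map() {
    awk 'BEGIN { FS = OFS = "\t" }
         { sub(/\r$/, "") }    # tolerate Windows line endings
         /^#/ { next }         # skip header and comment lines
         $2 == "assembled-molecule" && $7 != "na" { print $3, $7 }' "$1"
}

#  Hypothetical helper: list the distinct sequence names (column 1) in a
#+ GTF/GFF file, so any accessions left unconverted are easy to spot.
list_seqnames() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

#  Example usage (file names are placeholders):
#    make_chr_map assembly_report.txt > chr_map.txt
#    list_seqnames converted.gtf
```

Any names still printed by `list_seqnames` without a "chr" prefix (typically unplaced scaffolds) were not in the table and are left unchanged by the conversion.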

Also, an additional question: if I want to work with a genome whose file doesn't exist, how can I find or make it?

I’m not entirely sure what you’re asking. If a genome file doesn’t exist (for example, if it’s for an organism that hasn’t been sequenced), then there won’t be a way for you to find or generate it.


Also, it looks like this bison release only includes scaffolds and lacks assembled chromosomes (except for the mitochondrial chromosome).
