I am very new to bioinformatics and would like to know the difference between a reference genome file and an annotation file. Also, what is the best website for downloading these files for human?
A reference genome file contains the genome sequence itself. An annotation file describes where genetic elements (introns, exons, and so on) are located in that genome, given as start and end coordinates. Reference genome files are mostly in FASTA format (.fasta or .fa), and annotation files are mostly in GFF (.gff) or BED (.bed) format. Another format, GenBank, sometimes contains both the reference sequence and the annotation in one file. Search for each format name for details.
For human, the best place to download these files is http://genome.ucsc.edu/. You can also download them from NCBI.
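To make the distinction concrete, here is a minimal Python sketch of how the two files relate: the FASTA file holds the sequence, and each GFF line points at a region of it. The sequence name `chrToy`, the toy sequence, and the single exon line are all made up for illustration, and the helper names (`read_fasta`, `read_gff`, `feature_sequence`) are hypothetical, not from any library. For real work a dedicated parser such as Biopython is probably a better choice.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {sequence_name: sequence}."""
    parts_by_name = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after '>'; the rest is a free-text description.
            name = line[1:].split()[0]
            parts_by_name[name] = []
        elif name is not None:
            parts_by_name[name].append(line)
    return {n: "".join(parts) for n, parts in parts_by_name.items()}


def read_gff(text):
    """Parse GFF text into a list of feature dicts (first 7 columns only)."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.split("\t")
        features.append({
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
        })
    return features


def feature_sequence(sequences, feature):
    """Return the bases covered by a feature (forward strand as stored)."""
    # GFF coordinates are 1-based and inclusive, Python slices are 0-based
    # and end-exclusive, hence start - 1.
    return sequences[feature["seqid"]][feature["start"] - 1:feature["end"]]


# Made-up toy data, not real human sequence.
TOY_FASTA = ">chrToy made-up example\nACGTACGTAC\nGGGTTTAAAC\n"
TOY_GFF = "##gff-version 3\nchrToy\texample\texon\t3\t8\t.\t+\t.\tID=exon1\n"

sequences = read_fasta(TOY_FASTA)
features = read_gff(TOY_GFF)
exon_bases = feature_sequence(sequences, features[0])
print(features[0]["type"], exon_bases)
```

The annotation on its own says nothing about which bases are in the exon, and the sequence on its own says nothing about where the exon is. You need both files, and they must refer to the same genome build for the coordinates to line up.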
Thanks, Xingyu Yang. So what is the difference between the GTF and GFF annotation formats?
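They are pretty similar. GTF corresponds to version 2 of GFF (the most recent version is GFF3). Both use the same nine tab-separated columns; the main practical difference is how the last (attributes) column is written.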
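As a rough illustration of that difference: GTF writes attributes as `key "value";` pairs, while GFF3 writes them as `key=value` pairs separated by semicolons. The sketch below parses both styles. The function names and the example identifiers (`G1`, `T1`, `exon1`) are made up, and it assumes values do not themselves contain semicolons, which real files can violate.

```python
def parse_gtf_attributes(field):
    """Parse a GTF attributes column, e.g. 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # Key and value are separated by the first space; values are quoted.
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gff3_attributes(field):
    """Parse a GFF3 attributes column, e.g. 'ID=exon1;Parent=T1'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # Key and value are separated by the first '='.
        key, _, value = part.partition("=")
        attrs[key] = value
    return attrs


gtf_attrs = parse_gtf_attributes('gene_id "G1"; transcript_id "T1";')
gff3_attrs = parse_gff3_attributes("ID=exon1;Parent=T1")
print(gtf_attrs)
print(gff3_attrs)
```

Many tools accept only one of the two formats, so it is worth checking which one your downstream program expects before downloading.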
Thanks again, Xingyu Yang. Could you please send me a direct link to download the human annotation file? And what about the annotation file from the Ensembl website?
If you want a direct link, I would recommend downloading it here: http://cufflinks.cbcb.umd.edu/igenomes.html
I followed the link and found this:
GRCh37 link: ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens_Ensembl_GRCh37.tar.gz
So is the human annotation file about 17.9 GB?
It includes everything: annotation in different formats, ncRNA annotation, the reference sequence, and indexed reference sequences.
If you just want the annotation file, you can find it on the NCBI FTP site: ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens. The annotation files are in the GFF folder, and they include ncRNA.
I visited this link, but it confused me because there are many files there. Which of them is the annotation file for human?
ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens/GFF/ref_GRCh38_top_level.gff3.gz
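Once the file is downloaded, a quick way to check what it contains is to count the feature types in column 3. This is a minimal sketch that reads the gzipped file directly without unpacking it; the function name `count_feature_types` is hypothetical, and the local file name in the usage comment assumes you saved the file under its original name in the current directory.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count how often each feature type (column 3) occurs in a gzipped GFF file."""
    counts = Counter()
    # "rt" opens the gzip stream in text mode so lines come back as str.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment/directive lines and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 3:
                counts[cols[2]] += 1
    return counts


# Example usage (assumes the file is in the current directory):
# for feature_type, n in count_feature_types("ref_GRCh38_top_level.gff3.gz").most_common():
#     print(feature_type, n)
```

If the output lists types such as gene and exon, you have the annotation file and not a sequence file.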