9.7 years ago
summerela
I downloaded RefSeq's top-level GFF3 file (ref_GRCh37.p13_top_level.gff3.gz) from their FTP site, but I cannot find any documentation on what each of the specific attribute keys in column 9 contains. I was able to find information on the standard GFF columns and could probably guess at some of the attributes, but it would be nice to have a definitive explanation. Does anyone know where I can find this information?
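For reference, a GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and the fields in this question are not columns but key=value tags inside the ninth, attributes, column. A minimal Python sketch of splitting one line, assuming a well-formed feature line; the function name and the sample line are made up for illustration and are not taken from the RefSeq file:

```python
from urllib.parse import unquote

# The nine standard GFF3 columns, in order.
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns and parse column 9."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue  # tolerate a trailing semicolon
            key, _, value = pair.partition("=")
            # Multi-valued tags are comma-separated; literal commas inside a
            # value are percent-encoded, so split first, then decode %XX escapes.
            attributes[key] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# A made-up example line (hypothetical, not from the RefSeq file).
line = ("chr1\texample\tgene\t100\t200\t.\t+\t.\t"
        "ID=gene0;gbkey=Gene;gene=ABC1;description=example%2Fdemo gene")
rec = parse_gff3_line(line)
print(rec["type"], rec["attributes"]["gbkey"])  # gene ['Gene']
```

Values are kept as lists because GFF3 allows a tag to carry several comma-separated values.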
The fields in question are:
gbkey
genome
mol_type
description
gene
part
pseudo
product
transcript_id
gene_synonym
partial
ncrna_class
protein_id
exon_number
exception
transl_except
anticodon
Target
e_value
bit_score
num_ident
blast_aligner
pct_identity_gap
num_mismatch
pct_identity_ungap
gap_count
pct_coverage
pct_coverage_hiqual
pct_identity_gapopen_only
common_component
filter_score
weighted_identity
rank
assembly_bases_seq
assembly_bases_aln
for_remapping
matched_bases
matchable_bases
lxr_locAcc_currStat_120
matches
identity
splices
consensus_splices
product_coverage
exon_identity
idty
merge_aligner
map
lxr_locAcc_currStat_35
inversion_merge_aligner
country
isolation-source
note
tissue-type
codons
transl_table
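A list like the one above can be regenerated, with counts that show which keys are common and which are rare, by scanning column 9 of every feature line. A minimal Python sketch, assuming a local copy of the file; both function names are hypothetical:

```python
import gzip
from collections import Counter


def count_attribute_keys(lines):
    """Count how often each column-9 attribute key occurs in GFF3 lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[8] == ".":
            continue  # not a feature line (e.g. FASTA), or no attributes
        for pair in fields[8].split(";"):
            if pair:
                counts[pair.partition("=")[0]] += 1
    return counts


def count_attribute_keys_in_file(path):
    """Open a plain or gzip-compressed GFF3 file and count its attribute keys."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_attribute_keys(handle)
```

Calling `count_attribute_keys_in_file("ref_GRCh37.p13_top_level.gff3.gz").most_common()` on a downloaded copy should list every key with its count, most frequent first.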
Thanks so much!
PS: Here's a link to their README file. I didn't see this information there, but maybe I missed something?
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/README
Have a look at the INSDC feature table documentation; the terms the two have in common should have the same definitions. http://www.insdc.org/files/feature_table.html
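To see which keys can be looked up there, one option is to normalize each GFF3 key and compare it with the INSDC qualifier names. This is a sketch under an assumption: that NCBI's GFF3 keys are INSDC qualifier names lowercased, with underscores turned into hyphens (as `isolation-source` vs `/isolation_source` and `ncrna_class` vs `/ncRNA_class` suggest). The qualifier list below is a hand-picked illustrative subset, not the full table, and the function names are hypothetical. Keys that do not match, such as the alignment metrics (`pct_identity_gap`, `bit_score`, ...), are not feature-table qualifiers; `Target` is a reserved attribute defined by the GFF3 specification itself.

```python
# Illustrative subset of INSDC feature-table qualifier names (NOT the full
# list); check http://www.insdc.org/files/feature_table.html for the rest.
INSDC_QUALIFIERS_SUBSET = [
    "anticodon", "country", "exception", "gene", "gene_synonym",
    "isolation_source", "map", "mol_type", "ncRNA_class", "note",
    "product", "protein_id", "pseudo", "tissue_type", "transl_except",
    "transl_table",
]
_BY_NORMALIZED_NAME = {q.lower(): q for q in INSDC_QUALIFIERS_SUBSET}


def insdc_qualifier_for(gff3_key):
    """Return the INSDC qualifier a GFF3 key seems to correspond to, or None.

    Assumption: the GFF3 key is the qualifier name lowercased, with
    underscores replaced by hyphens (e.g. isolation-source, ncrna_class).
    """
    return _BY_NORMALIZED_NAME.get(gff3_key.lower().replace("-", "_"))


def split_by_insdc(keys):
    """Partition GFF3 keys into INSDC-matched ones and everything else."""
    matched, unmatched = {}, []
    for key in keys:
        qualifier = insdc_qualifier_for(key)
        if qualifier is None:
            unmatched.append(key)
        else:
            matched[key] = qualifier
    return matched, unmatched


matched, unmatched = split_by_insdc(
    ["gbkey", "isolation-source", "ncrna_class", "pct_identity_gap"])
print(matched)    # {'isolation-source': 'isolation_source', 'ncrna_class': 'ncRNA_class'}
print(unmatched)  # ['gbkey', 'pct_identity_gap']
```

Anything left in `unmatched` is NCBI-specific or alignment-specific and would need NCBI's own documentation rather than the INSDC table.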