RefSeq vs original genome assembly and issues
6.0 years ago
Biogeek ▴ 470

A long-winded post with multiple questions, to gauge the consensus on the 'correct' approach to RNA-seq alignment when both a RefSeq version and the originally published assembly of a genome are available.

My scenario: a published genome is available as v1.0 and v1.1 (GenBank), and also on RefSeq.

I've aligned my mRNA-seq reads with STAR to Aiptasia v1.1 using the GenBank (GCA_) files available at the above link, and I've now performed gene counts and differential expression. When I use http://aiptasia.reefgenomics.org/download/aipgene_to_kxj.tsv.gz to convert the NCBI gene accessions back to the original AIPGENE accessions and get functional gene annotations, I notice that there are 5 more genes in v1.1 than in the v1.0 functional annotations (the aipgene_to_kxj.tsv file is what I use to map back). This has me questioning whether to use v1.1 at all. I've looked for these 5 missing genes on NCBI, and they do have functional annotations there.
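For reference, a minimal Python sketch of how the genes lacking an AIPGENE mapping could be listed. The column order of aipgene_to_kxj.tsv (AIPGENE ID first, NCBI accession second) is an assumption, and the IDs in the toy data are made up for illustration; for the real file the handle could come from `gzip.open(path, "rt")`.

```python
import csv
import io

def load_mapping(handle, ncbi_col=1, aipgene_col=0):
    """Read a tab-separated mapping into {ncbi_id: aipgene_id}.

    Column positions are assumptions; adjust them to the real file layout.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= max(ncbi_col, aipgene_col):
            continue  # skip blank or short lines
        mapping[row[ncbi_col]] = row[aipgene_col]
    return mapping

def unmapped_genes(gene_ids, mapping):
    """Return the sorted gene IDs that have no entry in the mapping."""
    return sorted(g for g in set(gene_ids) if g not in mapping)

# Toy data: hypothetical IDs, not taken from the real files.
example_tsv = "AIPGENE1\tKXJ00001.1\nAIPGENE2\tKXJ00002.1\n"
mapping = load_mapping(io.StringIO(example_tsv))

# IDs as they might appear in a count table (hypothetical).
counted = ["KXJ00001.1", "KXJ00002.1", "KXJ00003.1"]
missing = unmapped_genes(counted, mapping)
print(missing)
```

Running the same check against the real count table would give the exact accessions of the 5 extra genes, which can then be looked up on NCBI individually.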

My questions are:

Do I revert to v1.0, where everything seems to be complete (the original version, which was published before it was submitted to GenBank), or do I download the RefSeq GFF3 and scaffolds and redo my analysis? If the latter, how can I obtain the functional gene annotations from RefSeq to know what the genes are, and how can I then get GO terms for downstream analysis?
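On the functional annotation part, a rough Python sketch of one possible route: NCBI GFF3 files typically carry a `product` attribute on mRNA and CDS rows, so gene descriptions can be pulled from column 9. The attribute keys used here (`gene`, `product`) and the example rows are assumptions and should be checked against the actual RefSeq file. This does not cover GO terms.

```python
from urllib.parse import unquote

def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs

def products_by_gene(lines, feature_types=("mRNA", "CDS"), gene_key="gene"):
    """Map each gene name to the first product description seen for it."""
    products = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        attrs = parse_attributes(cols[8])
        gene, product = attrs.get(gene_key), attrs.get("product")
        if gene and product:
            products.setdefault(gene, product)
    return products

# Hypothetical rows in the style of an NCBI GFF3 file.
example = [
    "##gff-version 3\n",
    "scaf1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene0;Name=LOC1;gene=LOC1\n",
    "scaf1\tRefSeq\tmRNA\t1\t900\t.\t+\t.\tID=rna0;Parent=gene0;gene=LOC1;"
    "product=heat shock protein 70%2C partial\n",
]
annot = products_by_gene(example)
print(annot)
```

For a real file, `lines` could simply be an open file handle, since the function only iterates over it line by line.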

Thanks for your time!

refseq genome alignment annotation • 1.3k views
6.0 years ago
GenoMax 147k

Prior discussion about this also took place in: NCBI genome version vs published genome version - what's 'better'?
