A long-winded post with multiple questions, to gauge the consensus on the 'correct' approach to RNA-seq alignment when a genome is available both as a RefSeq assembly and as the originally published assembly.
My scenario: there is a published genome available as v1.0 and v1.1 (GenBank), and also on RefSeq.
I've aligned my mRNA-seq reads with STAR to Aiptasia v1.1 using the GenBank (GCA_) files from the above link, and I've now performed gene counts and differential expression. When I use http://aiptasia.reefgenomics.org/download/aipgene_to_kxj.tsv.gz to convert the NCBI gene accessions back to the original AIPGENE accessions to get functional gene annotations, I notice that there are 5 more genes in v1.1 than in the v1.0 functional annotations (the aipgene_to_kxj.tsv file is used to map back). This has me questioning whether to use v1.1 at all. I've looked for these 5 missing genes on NCBI, and they do have functional annotations there.
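For anyone who wants to reproduce the check, this is roughly how the genes without an AIPGENE equivalent can be listed. It is only a minimal sketch: the function names are made up for illustration, and it assumes the mapping file has two tab-separated columns with the AIPGENE ID first and the NCBI (KXJ) ID second, which should be checked against the actual file.

```python
import csv
import gzip


def load_mapping(path):
    """Read a gzipped two-column TSV into {ncbi_id: aipgene_id}.

    Assumption: column 1 is the AIPGENE ID and column 2 is the NCBI (KXJ) ID.
    Swap the two indices below if the real file uses the opposite order.
    """
    mapping = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or row[0].startswith("#"):
                continue  # skip blank, short, or comment lines
            aipgene_id, ncbi_id = row[0].strip(), row[1].strip()
            mapping[ncbi_id] = aipgene_id
    return mapping


def unmapped_genes(gene_ids, mapping):
    """Return, sorted, the count-table IDs that have no AIPGENE equivalent."""
    return sorted(set(gene_ids) - set(mapping))
```

Calling `unmapped_genes(count_table_ids, load_mapping("aipgene_to_kxj.tsv.gz"))`, where `count_table_ids` is the list of gene IDs from the count table, should return the genes that cannot be mapped back to v1.0.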
My questions are:
1. Do I revert to v1.0, where everything seems to be complete (the original version, which was published before it was submitted to GenBank), or do I download the RefSeq GFF3 and scaffolds and redo my analysis?
2. If I go with RefSeq, how can I obtain the functional gene annotations from RefSeq to know what the genes are?
3. How can I then go about getting GO terms for downstream analysis?
Thanks for your time!