Dear all!
I ran into a possible problem while working with the tobacco genome. I tried to align the BX CDS sequences (ftp://ftp.solgenomics.net/genomes/Nicotiana_tabacum/annotation/Ntab-BX_AWOK-SS_Basma.cds.annot.fna) to the assembly file in FASTA format (ftp://ftp.solgenomics.net/genomes/Nicotiana_tabacum/assembly/Ntab-BX_AWOK-SS.fa.gz) using blastn, but some sequences get no hits against the assembly, as shown below:
# BLASTN 2.2.31+
# Query: mRNA_10_cds mRNA_10 gene_5|id=AT1G64200.1:evalue=1e-124:annot='vacuolar H+-ATPase subunit E isoform 3';id=Solyc08g081910.2.1:evalue=1e-154:annot='V-type proton ATPase subunit E'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
# Query: mRNA_11_cds mRNA_11 gene_6|id=AT5G53340.1:evalue=6e-166:annot='Galactosyltransferase family protein';id=Solyc08g081920.2.1:evalue=0.0:annot='Beta-1 3-galactosyltransferase 6'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
# Query: mRNA_13_cds mRNA_13 gene_6|id=AT5G53340.1:evalue=5e-109:annot='Galactosyltransferase family protein';id=Solyc08g081920.2.1:evalue=1e-134:annot='Beta-1 3-galactosyltransferase 6'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
# Query: mRNA_15_cds mRNA_15 gene_6|id=AT5G53340.2:evalue=2e-45:annot='Galactosyltransferase family protein';id=Solyc08g081920.2.1:evalue=5e-88:annot='Beta-1 3-galactosyltransferase 6'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
# Query: mRNA_16_cds mRNA_16 gene_7|id=AT4G23730.1:evalue=1e-117:annot='Galactose mutarotase-like superfamily protein';id=Solyc08g081930.2.1:evalue=1e-142:annot='Aldose 1-epimerase family protein'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
# Query: mRNA_17_cds mRNA_17 gene_7|id=AT4G23730.1:evalue=2e-169:annot='Galactose mutarotase-like superfamily protein';id=Solyc08g081930.2.1:evalue=0.0:annot='Aldose 1-epimerase family protein'
# Database: Ntb
# 0 hits found
# BLASTN 2.2.31+
I don't know what is going on. If you have any suggestions, please let me know.
Thanks, all!
If this happens for only some of the CDS sequences, I wouldn't be very surprised. Assemblies are usually incomplete, and the CDS may have been defined using RNA-seq data, so some of them may have no counterpart in the assembled sequence. I would BLAST a few of these CDS against NCBI's BLAST databases, restricting the search to Nicotiana tabacum, to see what is going on.
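To check whether this really affects only some of the CDS, it may help to count the queries without hits first. Below is a minimal sketch in Python, assuming the output is the commented tabular format shown in the question (as produced, I believe, by `-outfmt 7`). The function name `summarize_no_hit_queries` and the sample lines are made up for illustration; in practice you would pass the lines of your own result file.

```python
def summarize_no_hit_queries(lines):
    """Return (total_queries, no_hit_ids) from commented tabular BLAST output.

    Assumes each query starts with a '# Query: <id> ...' line and that
    queries without hits are followed by a '# 0 hits found' line.
    """
    total = 0
    no_hit_ids = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("# Query:"):
            # The query ID is the first whitespace-delimited token after the label.
            tokens = line[len("# Query:"):].split()
            current = tokens[0] if tokens else ""
            total += 1
        elif line == "# 0 hits found" and current is not None:
            no_hit_ids.append(current)
    return total, no_hit_ids


if __name__ == "__main__":
    # Tiny illustrative sample; the second query and its hit line are invented.
    sample = [
        "# BLASTN 2.2.31+",
        "# Query: mRNA_10_cds mRNA_10 gene_5",
        "# Database: Ntb",
        "# 0 hits found",
        "# BLASTN 2.2.31+",
        "# Query: mRNA_99_cds mRNA_99 gene_50",
        "# Database: Ntb",
        "# 1 hits found",
        "mRNA_99_cds\tscaffold_1\t100.00\t300\t0\t0\t1\t300\t1\t300\t1e-150\t555",
    ]
    total, missing = summarize_no_hit_queries(sample)
    print(f"{len(missing)} of {total} queries had no hits")
    for query_id in missing:
        print(query_id)
```

If only a small fraction of the queries come back without hits, an incomplete assembly is a plausible explanation. If most or all of them do, I would first double-check how the database was built and which file was actually searched.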