Hi everyone,
I have a set of genes (which I know are from a certain species) and I would like to know how many copies of each gene there are in this species' genome. The natural approach to me would be to use blastn, since I want the actual nucleotide copies in the genome and not genes that produce similar proteins.
But I'm having a difficult time working out which parameters I should focus on to decide which alignments represent a copy and which don't. Take the following example: using blastn with the ACT14 gene sequence as query and the refseq_reference_genome of cotton (NCBI) as subject:
What I would focus on is the '% identity' and the query coverage. Looking at the results, I would say there are 3 copies of that gene, one on chromosome NC_053447 and two on NC_053434, since those 3 alignments have 100% query coverage and an identity from 96.1% to 100%. The maximum query coverage of the other alignments is 61%, which doesn't strike me as an actual copy. But I'm not sure: are there other parameters I should be looking at? Should I be considering lower values of identity or coverage? Are there objective cutoffs I could use in case the values fall along a gradient instead of splitting cleanly, like 80% for identity or coverage?
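To make the question concrete, here is a minimal Python sketch of the filter I am currently applying by eye. It assumes the hits were exported in tabular form with `-outfmt "6 qseqid sseqid pident qcovhsp sstart send evalue"`. The function name `candidate_copies`, the 95% cutoffs, and the example rows (including the subject names `chrA`, `chrB`, `chrC` and all coordinates) are made up for illustration; they are not my real results and not recommended thresholds.

```python
import csv
import io

# Assumed column order, matching the -outfmt string mentioned above.
COLUMNS = ["qseqid", "sseqid", "pident", "qcovhsp", "sstart", "send", "evalue"]


def candidate_copies(tsv_text, min_identity=95.0, min_coverage=95.0):
    """Return (subject, start, end) for every HSP passing both cutoffs.

    The cutoffs are placeholders: choosing them is exactly the open question.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        identity = float(row["pident"])    # percent identity of the HSP
        coverage = float(row["qcovhsp"])   # percent of the query covered by this HSP
        if identity >= min_identity and coverage >= min_coverage:
            hits.append((row["sseqid"], int(row["sstart"]), int(row["send"])))
    return hits


# Invented rows that only mimic the shape of the result described above.
example = "\n".join([
    "ACT14\tchrA\t100.0\t100\t1000\t2500\t0.0",
    "ACT14\tchrB\t96.1\t100\t5000\t6500\t0.0",
    "ACT14\tchrB\t97.3\t100\t90000\t88500\t0.0",
    "ACT14\tchrC\t88.0\t61\t300\t1200\t1e-50",
])

for subject, start, end in candidate_copies(example):
    print(subject, start, end)
```

Note that this treats every HSP independently, so a copy whose alignment is split across several HSPs would be missed by a per-HSP coverage cutoff.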