Identifying gene copy number in genome assembly
4 hours ago
pedro.rrmb • 0

Hi everyone,

I have a set of genes (which I know are from a certain species) and I would like to know how many copies of each gene there are in this species' genome. The natural approach to me would be to use blastn, since I want the actual nucleotide copies in the genome and not genes that produce similar proteins.

But I'm having a difficult time working out which parameters I should focus on to decide which alignments represent a copy and which don't. Take the following example: using blastn with the ACT14 gene sequence as query and the refseq_reference_genome of cotton (NCBI) as subject. [image: BLAST results]

What I would focus on is the '% identity' and the query coverage. Looking at the results, I would say there are three copies of that gene, one on chromosome NC_053447 and two on NC_053434, since those three alignments have 100% query coverage and an identity from 96.1% to 100%. The maximum query coverage of the other alignments is 61%, which doesn't strike me as an actual copy. But I'm not sure: are there other parameters I should be looking at? Should I be considering lower values of identity or coverage? Are there objective cutoffs I could use in case I find a gradual range of values, like 80% for identity or coverage?
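
To make the filtering idea concrete, here is a minimal sketch of how it could be applied to tabular BLAST output. It assumes the hits were exported with blastn using `-outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qcovhsp"`, so the column order below is an assumption, as are the function names and the 95% cutoffs (they are placeholders, not recommended values). It also assumes each copy aligns as a single HSP; a copy split over several HSPs would be missed. The sample rows are invented for illustration and are not real results.

```python
import csv
import io

# Assumed column order, matching the -outfmt string given above.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qcovhsp"]


def candidate_copies(handle, min_identity=95.0, min_coverage=95.0):
    """Keep hits whose % identity and per-HSP query coverage pass both cutoffs.

    The default cutoffs are arbitrary placeholders, not established thresholds.
    """
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["pident"])
        coverage = float(row["qcovhsp"])
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(row)
    return kept


def copies_per_query(hits):
    """Count the retained hits for each query gene."""
    counts = {}
    for hit in hits:
        counts[hit["qseqid"]] = counts.get(hit["qseqid"], 0) + 1
    return counts


# Invented example rows (hypothetical subject names and coordinates).
SAMPLE = "\n".join([
    "ACT14\tchrA\t100.0\t1500\t1\t1500\t1000\t2499\t0.0\t2771\t100",
    "ACT14\tchrB\t98.2\t1500\t1\t1500\t5000\t6499\t0.0\t2600\t100",
    "ACT14\tchrB\t96.1\t1500\t1\t1500\t9000\t10499\t0.0\t2400\t100",
    "ACT14\tchrC\t88.0\t915\t1\t915\t200\t1114\t0.0\t1100\t61",
])

hits = candidate_copies(io.StringIO(SAMPLE))
counts = copies_per_query(hits)
print(counts)
```

Lowering `min_identity` or `min_coverage` and watching how the count per gene changes is one way to see whether there is a clear gap between full-length copies and partial hits, or a gradual range of values.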

CNV copy assembly gene • 37 views