Question

How do I identify gene duplications or paralogs in an annotated reference genome assembly?


7 months ago

brimaloney24 • 0

I generated a high-quality de novo genome assembly, which has been annotated using previously published Iso-Seq data from the species. This genome is now the RefSeq assembly for the species. I am currently writing a manuscript describing the assembly. My PI suggested that I add a quick, biologically relevant analysis to the text to show the utility of the genome. Specifically, she said I should investigate whether any genes on our list of candidates for a specific trait are duplicated or expanded.

To do this, she said I simply need to BLAST the sequence of my gene of interest, as annotated in this assembly, against the whole assembly. I have looked for papers that do this so I can cite them, but sifting through them has only confused me.

As far as I understand, the approach is the same as this method for identifying paralogs using blastp (I am unsure why I would use blastp to identify duplicated genes, but the papers I am reading all seem to use blastp rather than blastn):

https://ubwp.buffalo.edu/wnygirahcp/wp-content/uploads/sites/25/2014/05/Module-7.-Duplication-and-Degradation.pdf

where I:

1) Take the FASTA nucleotide sequence for my gene of interest (as determined by the annotation of my genome) and BLAST it (do I use blastn or blastp?), specifying the database as nr (non-redundant protein) and the organism as my species of interest.

2) Once I get my BLAST results back, my top hit will be the same gene I used as the query.

3) If any other results have a significant E-value and score, are those potential paralogs/duplicated genes? How might I verify or validate that? Or could I only say these are putative paralogs?

Am I missing anything? Is there a way to screen all annotated genes for duplications/paralogs that would make more sense than repeatedly BLASTing through the list? I am very lost as to how to proceed.

blast duplication paralog genome • 424 views


score 1 · Answer 1 · 2024-04-13


7 months ago

sansan96 ▴ 130

Hello,

I think MCScanX could help you; there is a lot of information available about it, and it is easy to use:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326336/

https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
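For a first pass over a candidate list, here is a minimal sketch (not part of MCScanX) of how non-self hits could be pulled out of an all-vs-all blastp run on the annotated proteins. It assumes the default 12-column tabular output (`-outfmt 6`) and one protein per gene, since isoforms of the same gene would otherwise show up as hits. The function name `putative_paralogs`, the gene IDs, and the E-value, identity, and alignment-length cutoffs are all hypothetical placeholders that would need tuning for your data:

```python
from collections import defaultdict


def putative_paralogs(blast_lines, candidates, max_evalue=1e-10,
                      min_identity=30.0, min_align_len=100):
    """Collect non-self hits for candidate genes from BLAST tabular rows.

    Assumes the default 12 columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    The default thresholds are illustrative, not recommended values.
    """
    candidates = set(candidates)
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        align_len = int(fields[3])
        evalue = float(fields[10])
        # Keep only candidate queries and drop the trivial self-hit
        if query not in candidates or query == subject:
            continue
        if (evalue <= max_evalue and identity >= min_identity
                and align_len >= min_align_len):
            hits[query].add(subject)
    return {q: sorted(s) for q, s in hits.items()}


if __name__ == "__main__":
    # Made-up rows standing in for a real all-vs-all blastp result file
    example = [
        "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
        "geneA\tgeneB\t82.5\t290\t50\t1\t5\t294\t3\t292\t1e-150\t450",
        "geneA\tgeneC\t28.0\t60\t43\t0\t10\t69\t200\t259\t0.5\t25.0",
        "geneD\tgeneE\t90.0\t400\t40\t0\t1\t400\t1\t400\t0.0\t700",
    ]
    print(putative_paralogs(example, ["geneA"]))
```

With a real result file you would pass the open file handle in place of the example list. Anything this returns is only a putative paralog; a collinearity tool such as MCScanX, or a gene tree, would still be needed to say more about how the copies arose.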
