Searching for genes in an assembled genome database
4.9 years ago
anabaena

Hey all, I am currently working on a project that requires searching for homologous genes in a large assembled metagenome database. Rather than estimating the abundance of an organism with Bowtie2, I want to locate a gene cluster with high homology and identify the metagenome it comes from. Is Bowtie2 still the easiest way to do this? Or should I use a clustering algorithm and then identify which genome each contig belongs to? Any tips would be appreciated, thank you!

metagenomics python

Do you have a reference for the proteins, or do you want to generate de novo clusters of orthologous proteins?


I have a reference for the cluster.

4.9 years ago
Asaf

I would suggest using DIAMOND blastx to align the metagenome DNA directly against the reference set of proteins. DIAMOND also has a frameshift mode, so if the assembly contains indels (which can happen a lot with PacBio/Nanopore data), the alignment won't be thrown out of the reading frame.
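
For reference, the workflow above could be scripted roughly as follows. This is a sketch under assumptions, not a tested pipeline: the file and database names are placeholders, and a frameshift penalty of 15 is the ballpark usually suggested for error-prone long-read data, so treat it as a starting value. The function only builds argument lists for `subprocess.run`, so you can inspect the commands before running anything.

```python
import shlex
from typing import List


def diamond_commands(ref_proteins: str, contigs: str, db_name: str = "ref_cluster",
                     out_tsv: str = "hits.tsv", frameshift_penalty: int = 15,
                     sensitive: bool = True) -> List[List[str]]:
    """Build DIAMOND makedb + blastx command lines as subprocess argument lists.

    All file names are placeholders; frameshift_penalty=15 is an assumed starting value.
    """
    # Step 1: index the reference proteins of the gene cluster.
    makedb = ["diamond", "makedb", "--in", ref_proteins, "-d", db_name]
    # Step 2: translated search of the contig DNA (query) against the protein database,
    # written as BLAST-style tabular output.
    blastx = ["diamond", "blastx", "-d", db_name, "-q", contigs,
              "-o", out_tsv, "--outfmt", "6"]
    if frameshift_penalty > 0:
        # --frameshift (-F) turns on frameshift-tolerant DNA-vs-protein alignment.
        blastx += ["--frameshift", str(frameshift_penalty)]
    if sensitive:
        # Slower than the default mode, but finds more distant homologs.
        blastx.append("--sensitive")
    return [makedb, blastx]


if __name__ == "__main__":
    for cmd in diamond_commands("cluster_proteins.faa", "metagenome_contigs.fna"):
        print(shlex.join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

Passing `frameshift_penalty=0` drops the frameshift option, which is handy for comparing results on short-read assemblies where indels are rarer.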

4.9 years ago
onestop_data

DIAMOND should be a great aligner for this task. Consider not using the default fast mode unless your query and database are really large; instead, pass --sensitive when calling the program to get more sensitive alignments.
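
Once the search finishes, one way to get from the alignments to the original goal (which contig, and therefore which metagenome, carries the cluster) is to count how many distinct reference proteins each contig hits. A minimal sketch, assuming the default 12-column `--outfmt 6` output; the e-value and identity cut-offs and `min_genes` are arbitrary placeholders, and tracing a contig back to its metagenome assumes your contig IDs encode the sample name.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO, Tuple

# Default 12 columns of BLAST-style tabular output (--outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def proteins_per_contig(handle: TextIO, max_evalue: float = 1e-5,
                        min_pident: float = 30.0) -> Dict[str, Set[str]]:
    """Map each contig (query) to the set of reference proteins it hits."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) <= max_evalue and float(rec["pident"]) >= min_pident:
            hits[rec["qseqid"]].add(rec["sseqid"])
    return dict(hits)


def candidate_cluster_contigs(hits: Dict[str, Set[str]],
                              min_genes: int = 2) -> List[Tuple[str, int]]:
    """Contigs hitting at least min_genes distinct cluster proteins, most hits first."""
    ranked = [(contig, len(prots)) for contig, prots in hits.items()
              if len(prots) >= min_genes]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical hits; the "sampleA_"/"sampleB_" prefixes are made up for illustration.
    example = (
        "sampleA_contig1\tgeneX\t85.0\t300\t40\t2\t1\t900\t1\t300\t1e-50\t400\n"
        "sampleA_contig1\tgeneY\t70.0\t250\t70\t3\t1200\t1950\t1\t250\t1e-30\t250\n"
        "sampleB_contig7\tgeneX\t25.0\t100\t70\t5\t1\t300\t1\t100\t1e-3\t40\n"
    )
    print(candidate_cluster_contigs(proteins_per_contig(io.StringIO(example))))
```

Counting distinct reference proteins per contig, rather than raw alignments, keeps one highly similar gene with several hits from looking like a whole cluster.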

Moreover, be careful when using assembled genomes as a query. Most aligners cap the number of alignments reported per query. If you have long contigs, only the top alignments will be returned for each contig, and hits in other regions may be omitted. Does that make sense? I would advise you to first predict the ORFs in your assembled contigs and then use those as input.
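
In practice you would use a dedicated gene caller such as Prodigal for the ORF step. Purely to illustrate the idea, here is a naive six-frame ORF finder (ATG to the first in-frame stop, with a minimum length). It is a toy stand-in, not a substitute for a real gene caller: it ignores alternative start codons and partial genes running off contig ends, and `min_codons=100` is an arbitrary default. The resulting ORFs could then be translated and searched with `diamond blastp`, so each gene is its own query and the per-query alignment cap no longer hides hits along a long contig.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int, str]]:
    """Naive six-frame ORF finder: ATG ... first in-frame stop.

    Returns (strand, start, end, orf_sequence) with 0-based, end-exclusive
    coordinates; for the "-" strand they refer to the reverse complement.
    Length is counted in codons, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
    # [('+', 2, 17, 'ATGAAATTTGGGTAA')]
```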

Best
