4.8 years ago
anabaena
Hey all, I am looking through large data sets to find a gene cluster or particular genes using sequence reads. The goal is to find a metagenomic set that contains the genes of interest, essentially just a probe to see whether what I am looking for is there. Currently I am using Bowtie2, but I was wondering if there are other ways to go about this. And if Bowtie2 is the best option, are there particular parameters I should consider or set when doing the alignment?
Thanks!
It might be a good idea to assemble the reads and use a tool like FragGeneScan (it's really fast) to predict the gene sequences; then you can use BLAST to search them against your genes of interest.
The following pipeline might give you some ideas:
https://experiments.springernature.com/articles/10.1007/978-1-4939-7015-5_3
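A minimal sketch of the BLAST step in that route, assuming the genes of interest are available as protein sequences and that gene prediction has already produced a protein FASTA. The file names, the database name, and the 1e-5 E-value cutoff are placeholders, not values from this thread. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# $1 = FASTA of the proteins of interest, $2 = BLAST database name
make_db() {
    run makeblastdb -in "$1" -dbtype prot -out "$2"
}

# $1 = predicted proteins (.faa), $2 = database name, $3 = output table
search_hits() {
    # -outfmt 6 writes one tab-separated line per hit.
    run blastp -query "$1" -db "$2" -evalue 1e-5 -outfmt 6 -out "$3"
}

# Placeholder names for illustration only.
make_db genes_of_interest.faa goi_db
search_hits sample1_fgs.faa goi_db sample1_hits.tsv
```

An empty hits table for a sample would suggest the genes are absent or too divergent at that cutoff, so the threshold is worth tuning on a sample known to contain them.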
Awesome! My main concern is that I have many samples to look through. The initial thought was to probe a metagenomic sample/read set for the gene of interest and, if it is present, then go in, assemble the reads, and give the sample a deeper look. With that in mind, would this pipeline be efficient for doing this?
I have a question regarding FragGeneScan: do you need to train it? The documentation seems a little sparse on that.
No, you don't need to train it. You can use one of the pre-trained models that matches the data you're using. You may get better results if you assemble your reads before using FragGeneScan.
Since you already have your genes of interest, maybe you don't need FragGeneScan to predict genes in the reads; you can directly use a read mapper (Bowtie2/HISAT2). But if you're interested in predicting the genes in a metagenome, a single genome, or reads (prokaryotic samples), you can use FragGeneScan.
For reference genomes you can use the complete model (-train=complete).
For Illumina sequencing reads with about 0.5% error rate you can use the illumina_5 model (-train=illumina_5).
Usage:
Parameters
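As a rough sketch of the invocation, assuming the run_FragGeneScan.pl wrapper that ships with FragGeneScan is on your PATH and takes the -genome, -out, -complete, -train and -thread options described in its README. Input and output names and the thread count are placeholders. FragGeneScan reads FASTA, so FASTQ reads would need converting first. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# $1 = input FASTA, $2 = output prefix,
# $3 = 1 for complete sequences or 0 for short reads, $4 = trained model
predict_genes() {
    run run_FragGeneScan.pl -genome="$1" -out="$2" -complete="$3" -train="$4" -thread=4
}

# Reference genome or assembled contigs: "complete" model.
predict_genes contigs.fna contigs_fgs 1 complete

# Unassembled Illumina reads (already converted to FASTA): "illumina_5" model.
predict_genes reads.fasta reads_fgs 0 illumina_5
```

The predicted protein and nucleotide sequences are written next to the given output prefix and can then be searched against the genes of interest.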
Please let me know if you have any other questions.
Perfect, I think I'll just stick with Bowtie2 then, since I already know what I am looking for. One question: I am using the entire cluster sequence as the reference, and this cluster may only have a few core proteins that are conserved. Is it recommended to use the --local parameter with Bowtie2 given this? I've only really used Bowtie2 with a full reference genome.
Yes, I think the --local option sounds good since the reads are short.
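A minimal sketch of such a screen across several samples, assuming paired-end FASTQ files and that bowtie2 and samtools are installed. The sample names, file names, thread count, and the choice of the --very-sensitive-local preset are illustrative assumptions, not settings from this thread. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# $1 = cluster FASTA, $2 = index prefix
build_index() {
    run bowtie2-build "$1" "$2"
}

# $1 = index prefix, $2 = sample name, $3 = R1 FASTQ, $4 = R2 FASTQ
screen_sample() {
    # --local allows read ends that do not match the reference to be soft-clipped;
    # --no-unal keeps reads that failed to align out of the SAM file.
    run bowtie2 --local --very-sensitive-local --no-unal -p 4 \
        -x "$1" -1 "$3" -2 "$4" -S "$2.sam"
    # Count the SAM records that are flagged as mapped.
    run samtools view -c -F 4 "$2.sam"
}

# Placeholder names for illustration only.
build_index cluster.fasta cluster_idx
for s in sampleA sampleB; do
    screen_sample cluster_idx "$s" "${s}_R1.fastq.gz" "${s}_R2.fastq.gz"
done
```

Samples with a non-trivial mapped-read count are the candidates worth assembling and inspecting more closely; where along the cluster the reads land matters too, since hits confined to one conserved gene do not show that the whole cluster is present.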
You don't need to use the whole pipeline (Fun4me). You can just use FragGeneScan to predict the gene sequences and then use BLAST.
Please see the last line of this link (examples of FragGeneScan results using metagenomes as input):
https://omics.informatics.indiana.edu/FragGeneScan/result.php
I haven't seen your data, so I'm not sure what the best approach is.