Finding gene matches using sequence reads?
4.8 years ago
anabaena ▴ 10

Hey all, I am looking through large data sets to try and find a gene cluster or particular genes using sequence reads. The goal is to find a metagenomic set with the genes of interest, essentially just a probe to see if what I am looking for is there. Currently, I am using Bowtie2 but I was wondering if there were any other ways to go about this? And if Bowtie2 is the best option, are there particular parameters I should consider/set when doing the alignment?

Thanks!

bowtie2 metagenomic reads • 1.5k views

It might be a good idea to assemble the reads and use a tool like FragGeneScan (it's really fast) to predict the gene sequences; you can then use BLAST to search them against your genes of interest.

The following pipeline might give you some ideas:

https://experiments.springernature.com/articles/10.1007/978-1-4939-7015-5_3
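The last step of the assemble / predict / BLAST route above usually comes down to filtering a table of hits. Here is a minimal sketch, assuming BLAST was run with tabular output (`-outfmt 6`), with the predicted genes as queries and the genes of interest as the database. The thresholds, the function name `filter_blast_hits`, and the example rows are illustrative assumptions, not recommended values or real data.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_blast_hits(handle, min_identity=80.0, min_length=50, max_evalue=1e-5):
    """Yield hits (as dicts) that pass the illustrative thresholds."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with missing columns.
        if len(row) < len(BLAST6_COLUMNS) or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        if (
            float(hit["pident"]) >= min_identity
            and int(hit["length"]) >= min_length
            and float(hit["evalue"]) <= max_evalue
        ):
            yield hit


# Made-up rows standing in for a real BLAST output file.
example = io.StringIO(
    "gene_1\tclusterA_orf3\t92.5\t120\t9\t0\t1\t120\t10\t129\t1e-40\t180\n"
    "gene_2\tclusterA_orf7\t61.0\t40\t15\t1\t5\t44\t200\t239\t0.5\t22.1\n"
)
hits = list(filter_blast_hits(example))
print([(h["qseqid"], h["sseqid"]) for h in hits])
```

With a real file you would pass `open("hits.tsv")` (a hypothetical file name) instead of the `io.StringIO` object.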


Awesome! My main concern is that I have many samples to look through. The initial thought was to probe each metagenomic sample/read set for the gene of interest and, if it is present, go in and assemble the reads to give that sample a deeper look. With that in mind, would this pipeline be efficient for that?
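One way to turn "is the gene there?" into a number per sample is the breadth of coverage: the fraction of reference positions covered by at least one mapped read. Below is a rough sketch that computes it from SAM text. It assumes the reference is small (a gene or gene cluster), so keeping covered positions in a set is affordable; the function name `breadth_of_coverage` and the example records are made up for illustration.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def breadth_of_coverage(sam_lines):
    """Return {reference_name: fraction of positions covered by >= 1 read}."""
    ref_lengths = {}
    covered = defaultdict(set)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Reference names and lengths come from the @SQ header lines.
            if line.startswith("@SQ"):
                fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
                ref_lengths[fields["SN"]] = int(fields["LN"])
            continue
        cols = line.split("\t")
        flag, rname, pos, cigar = int(cols[1]), cols[2], int(cols[3]), cols[5]
        if flag & 4 or rname == "*" or cigar == "*":
            continue  # unmapped read
        ref_pos = pos  # SAM positions are 1-based
        for count, op in CIGAR_OP.findall(cigar):
            count = int(count)
            if op in "M=X":  # aligned bases: mark as covered and advance
                covered[rname].update(range(ref_pos, ref_pos + count))
                ref_pos += count
            elif op in "DN":  # deletion / skipped region: advance only
                ref_pos += count
            # I, S, H, P do not consume reference positions
    return {name: len(covered[name]) / length for name, length in ref_lengths.items()}


# Made-up SAM records: two overlapping mapped reads and one unmapped read.
example_sam = [
    "@SQ\tSN:cluster\tLN:100",
    "r1\t0\tcluster\t1\t30\t20M\t*\t0\t0\t*\t*",
    "r2\t0\tcluster\t11\t30\t5S20M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
coverage = breadth_of_coverage(example_sam)
print(coverage)
```

Samples whose breadth passes a cutoff of your choosing could then be the ones you assemble and inspect more closely.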


I have a question regarding FragGeneScan: do you need to train it? The documentation seems a little sparse on this.


No, you don't need to train it. You can use one of the pre-trained models, chosen according to the type of data you have. You may get better results if you assemble your reads before running FragGeneScan.

Since you already have your genes of interest, you may not need FragGeneScan to predict genes at all; you can map the reads directly with a read mapper (Bowtie2/HISAT2). But if you are interested in predicting the genes in a metagenome, a single genome, or reads (prokaryotic samples), you can use FragGeneScan.

For reference genomes you can use

-complete=1 -train=complete

For Illumina sequencing reads with about 0.5% error rate you can use

-complete=0 -train=illumina_5

Usage:

./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name]
-complete=[1 or 0] -train=[train_file_name] (-thread=[number of thread; default 1])

Parameters:

    [seq_file_name]:    sequence file name including the full path
    [output_file_name]: output file name including the full path
    [1 or 0]:           1 if the sequence file has complete genomic sequences
                        0 if the sequence file has short sequence reads
    [train_file_name]:  file name that contains model parameters; this file should be in the "train" directory
                        Note that the following files containing model parameters already exist in the "train" directory:
                        [complete]    for complete genomic sequences or short sequence reads without sequencing error
                        [sanger_5]    for Sanger sequencing reads with about 0.5% error rate
                        [sanger_10]   for Sanger sequencing reads with about 1% error rate
                        [454_10]      for 454 pyrosequencing reads with about 1% error rate
                        [454_30]      for 454 pyrosequencing reads with about 3% error rate
                        [illumina_5]  for Illumina sequencing reads with about 0.5% error rate
                        [illumina_10] for Illumina sequencing reads with about 1% error rate
    [num_thread]:       number of threads used in FragGeneScan (default 1)
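If you are running this over many samples, it can help to build the command programmatically. Here is a small sketch that assembles the `run_FragGeneScan.pl` call from the options listed above. The helper name `fraggenescan_command` and the file names are hypothetical, and the script path is assumed to be the current directory, as in the usage line.

```python
import shlex

# Model names as listed in the parameter description above.
TRAIN_MODELS = {
    "complete", "sanger_5", "sanger_10",
    "454_10", "454_30", "illumina_5", "illumina_10",
}


def fraggenescan_command(seq_file, out_prefix, train, complete_genome=False,
                         threads=1, script="./run_FragGeneScan.pl"):
    """Return the FragGeneScan command as an argument list (does not run it)."""
    if train not in TRAIN_MODELS:
        raise ValueError(f"unknown training model: {train!r}")
    return [
        script,
        f"-genome={seq_file}",
        f"-out={out_prefix}",
        f"-complete={1 if complete_genome else 0}",
        f"-train={train}",
        f"-thread={threads}",
    ]


# Illumina reads with about 0.5% error rate, as in the example above.
cmd = fraggenescan_command("reads.fasta", "reads_fgs", train="illumina_5", threads=4)
print(shlex.join(cmd))
```

To actually run it, you could pass the list to `subprocess.run(cmd, check=True)` from the FragGeneScan directory.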

Please let me know if you have any other questions.


Perfect, I think I'll just stick with Bowtie2 then, since I already know what I am looking for. One question: I am using the entire cluster sequence as the reference, and this cluster may only have a few conserved core proteins. Given this, is it recommended to use the --local parameter with Bowtie2? I've only really used Bowtie2 with a full reference genome.


Yes, I think the --local option sounds good, since the reads are short.

 --local            local alignment; ends might be soft clipped (off)
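
For screening many samples, the two Bowtie2 steps (index the cluster once, then align each read set in local mode) could be scripted roughly as below. This is only a sketch: the file names, index prefix, and thread count are placeholders, and single-end reads (`-U`) are assumed; paired-end data would use `-1`/`-2` instead. `--no-unal` keeps unaligned reads out of the SAM file, which keeps the output small when most reads do not hit the cluster.

```python
import shlex


def bowtie2_screen_commands(cluster_fasta, reads_fastq, index_prefix="cluster_idx",
                            sam_out="hits.sam", threads=4):
    """Return (index-building command, alignment command) as argument lists."""
    build = ["bowtie2-build", cluster_fasta, index_prefix]
    align = [
        "bowtie2",
        "--local",            # local alignment; read ends may be soft clipped
        "--no-unal",          # do not write reads that failed to align
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # index built from the gene cluster
        "-U", reads_fastq,    # single-end reads (assumption)
        "-S", sam_out,        # SAM output file
    ]
    return build, align


build_cmd, align_cmd = bowtie2_screen_commands("cluster.fasta", "sample1.fastq")
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

Each list can be passed to `subprocess.run(..., check=True)` once Bowtie2 is on your PATH.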

You don't need to use the whole pipeline (Fun4me). You can just use FragGeneScan to predict the gene sequences and then use BLAST.

Please see the last line of this link (examples of Fraggenescan results using metagenomes as input):

https://omics.informatics.indiana.edu/FragGeneScan/result.php

I haven't seen your data, so I'm not sure what the best approach is.
