blast extract specific gene linux command
4.1 years ago
tommy ▴ 40

Hello,

I have a question about BLAST. I want to build a phylogenetic tree based on a specific gene, but I ran into trouble at the first step: I don't know how to retrieve this specific gene from the library on the command line, and I couldn't find any tutorial about it either.

What I have is a lot of yeast FASTA files, and I want to build a phylogenetic tree on the AQY gene.

Could someone give me some guidance on this?

Thank you in advance.

gene sequence • 2.3k views

I don't know how to get this specific gene in the library with command line.

If you have raw sequence data, then you would need to either align it to a reference and generate a consensus or, if no reference is available, assemble the data and then identify where the gene is located in that assembly.


So, instead of finding the specific gene, aligning it, and then building a phylogenetic tree, what I should do first is align the whole sequence? I ask because I have found some phylogenetic trees built on one specific gene, so I am curious how to do that.

One of the articles mentions:

In order to retrieve genes of interest, a local blast database was set for each genome and ORFS were searched with BLASTn using as queries: AQY1 and AQY2 of YPS163.

But I don't know how to do this.

Thanks for your help.


For building the phylogenetic tree, I think you need to extract the gene sequences, remove the redundant ones, align, ....
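As a rough illustration of the "remove the redundant ones" step, here is a minimal sketch that drops sequences identical to one already seen in a FASTA file. The function name `dedup_fasta` and the file names are hypothetical, and "redundant" is assumed here to mean an exact, case-insensitive duplicate; near-identical sequences would need a clustering tool instead.

```shell
#!/usr/bin/env bash
# Sketch only: keep the first record of every distinct sequence in a FASTA file.
# dedup_fasta is a hypothetical helper name, not part of BLAST or any package.

# Usage: dedup_fasta INPUT_FASTA > OUTPUT_FASTA
dedup_fasta() {
    awk '
        # Print the buffered record unless its sequence was already seen.
        function emit() {
            if (header != "" && !(toupper(seq) in seen)) {
                seen[toupper(seq)] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { gsub(/[ \t\r]/, ""); seq = seq $0 }          # join wrapped sequence lines
        END { emit() }
    ' "$1"
}

# Example call (file names are placeholders):
# dedup_fasta aqy_hits.fa > aqy_hits.unique.fa
```

The output is written with one sequence per line, which most multiple-alignment programs accept as input for the alignment step.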


What kind of data do you have? Is it next-generation sequencing data or some other type?


Yes, I have lots of yeast .fa files.


This is an important piece of information and should have been included in the original question. Are these genome .fa files?


Yes, they are nucleotide FASTA files. Thanks for letting me know; I have added some detail to the question.


If I interpret your question correctly, you want to find sequences in some other genomes that are similar to your yeast query genes. Then you want to retrieve sequences matching your queries, extract the coding sequences, presumably translate them, remove redundant sequences, do a multiple alignment on the proteins and construct a phylogenetic tree from the proteins.

All of these tasks are easy to do in the BIRCH system, which is specifically designed to leverage Unix-style systems like Linux and MacOSX. Perhaps the greatest strength of BIRCH is that, in addition to hundreds of bioinformatics programs, BIRCH has a substantial body of tutorials that take you through exactly these types of tasks step by step. To see BIRCH in action, visit our YouTube channel.


Thanks, Brian. But I think what I want to achieve is to build a phylogenetic tree based on a specific gene, like Gallone et al. 2016, who built one on the PAD1 gene (figure 5c): https://www.cell.com/cell/fulltext/S0092-8674(16)31071-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867416310716%3Fshowall%3Dtrue


Do you have the accession number for that gene? Could you provide more information?


I am trying to analyze the AQY1 and AQY2 genes of yeast. Thanks for your help.


One of the articles mentions:

In order to retrieve genes of interest, a local blast database was set for each genome and ORFS were searched with BLASTn using as queries: AQY1 and AQY2 of YPS163.

I think this might be the right way, but I don't know how to do this.

Thanks for your help.


blastn searches for a query (a nucleotide sequence) in a reference (e.g. one or more reference genomes). So I think they already had the gene sequences for AQY1 and AQY2 and used them as queries to search for similar genes in a genome of interest.

You can go to this link, select blastn, and check the box next to "Align two or more sequences": https://blast.ncbi.nlm.nih.gov/BlastAlign.cgi

Then paste a gene sequence in the first text box and a genome ID (NC_001144.5) in the next one, and see the results. You can do the same on the command line, but you first need to download the blastn executable.
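For the command-line route, here is a minimal sketch of the approach quoted from the article (one local BLAST database per genome, searched with blastn). It assumes the NCBI BLAST+ programs `makeblastdb` and `blastn` are installed and on the PATH. The function name `search_genomes`, the directory names, and the query file name are placeholders, and the e-value cutoff of 1e-10 is an arbitrary choice rather than a value taken from the article.

```shell
#!/usr/bin/env bash
# Sketch only: build one nucleotide BLAST database per genome FASTA file and
# search each with the query genes. All paths below are hypothetical.

# Usage: search_genomes QUERY_FASTA GENOME_DIR OUT_DIR
search_genomes() {
    local query=$1 genome_dir=$2 out_dir=$3
    local genome name
    mkdir -p "$out_dir"
    for genome in "$genome_dir"/*.fa; do
        [ -e "$genome" ] || continue          # skip if no .fa files match
        name=$(basename "$genome" .fa)
        # Build a nucleotide database for this genome; -parse_seqids keeps
        # the FASTA identifiers so sequences can be looked up by ID later.
        makeblastdb -in "$genome" -dbtype nucl -parse_seqids -out "$out_dir/$name"
        # Search the database; -outfmt 6 writes one tab-separated line per hit.
        blastn -query "$query" -db "$out_dir/$name" \
               -evalue 1e-10 -outfmt 6 -out "$out_dir/$name.hits.tsv"
    done
}

# Example call (adjust the placeholder paths first):
# search_genomes aqy_queries.fa genomes blast_out
```

Each `*.hits.tsv` file then lists, per genome, where the query genes matched. If `makeblastdb` complains about the sequence identifiers, dropping `-parse_seqids` is one thing to try.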

If you don't have a genome of interest you can use the link without checking the box for alignment, and just use a gene query.

Alternatively, you can use tblastn with a protein sequence or a protein accession number as the query.
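To locate the gene in each genome from the results, here is a minimal sketch that assumes a command-line search was saved in BLAST tabular format (`-outfmt 6`, whose default columns end with sstart, send, evalue, bitscore). It keeps the highest-bitscore hit per query and reports the subject coordinates and strand. The function name `best_hits` and the file name are hypothetical, and taking only the single best hit is a simplification that may miss paralogs or split hits.

```shell
#!/usr/bin/env bash
# Sketch only: summarize BLAST tabular output (-outfmt 6, default columns).
# Prints: query, subject sequence, start, end, strand for the best hit per query.

# Usage: best_hits HITS_TSV
best_hits() {
    awk -F'\t' -v OFS='\t' '
        # Column 12 is the bitscore; keep the highest-scoring hit per query.
        !($1 in best) || $12 + 0 > best[$1] {
            best[$1] = $12 + 0
            # For nucleotide subjects, sstart > send means a minus-strand hit.
            if ($9 + 0 <= $10 + 0) { s = $9;  e = $10; strand = "plus" }
            else                   { s = $10; e = $9;  strand = "minus" }
            line[$1] = $1 OFS $2 OFS s OFS e OFS strand
        }
        END { for (q in line) print line[q] }
    ' "$1" | sort
}

# Example call (file name is a placeholder):
# best_hits blast_out/strainA.hits.tsv
```

The reported coordinates could then be used to pull the matching region out of the genome, for example with `blastdbcmd -db ... -entry ... -range start-end` (adding `-strand minus` for minus-strand hits), provided the database was built with `-parse_seqids`.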


Thanks for your help. That's very helpful.
