Hi everyone, I have been given the task of comparing one gene sequence across 50 strains of E. coli. For this study I have 50 genome scaffold files and one gene sequence file. I need to compare the gene sequence across all the genomes and build a phylogenetic tree of the gene from all the sequences. If anybody could point me in the right direction, I would be thankful!
Thank you for your suggestion. I would proceed like this:
1. Take all 50 genome sequences.
2. Take the protein sequence of the specific gene.
3. Run a local tblastn.
4. Extract all the sequences and run ClustalW.
5. Generate the tree.
Please comment on my idea.
Sounds good. But, as I mentioned, tblastn will give you a set of hits, which you will need to process somehow in order to extract the gene sequences, i.e. eyeball them and pick the best hit, manually assigning start and stop positions. That works _if the scaffolds are of good quality_, but what if there are gaps and mis-assemblies? What will the best hit be then? Using gene prediction may be a better option: run Prodigal, which will give you predicted protein sequences, then blastp those against your gene, pick the best-hit putative gene sequence from each of the 50 genomes, and proceed with the tree construction.
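To make the "pick a best hit" step concrete, here is a minimal sketch. It assumes blastp was run with tabular output (`-outfmt 6`), where column 12 is the bit score, and that there is one result file per genome. The function name `best_hit` and the file names in the comments are hypothetical, and "best" here simply means the highest bit score; you may prefer to also filter on identity or alignment length.

```shell
# Print the hit with the highest bit score (column 12) from a BLAST
# tabular (-outfmt 6) result file. Ties keep the first hit encountered.
# Assumption: the file was produced by something like
#   blastp -query gene.faa -subject strain01_trans -outfmt 6 -out strain01.tsv
# (gene.faa, strain01_trans and strain01.tsv are placeholder names).
best_hit() {
    awk -F'\t' '
        NR == 1 || $12 + 0 > best { best = $12 + 0; line = $0 }
        END { if (NR > 0) print line }
    ' "$1"
}
```

Calling `best_hit strain01.tsv | cut -f2` would then give the ID of the predicted protein to pull out of that genome's Prodigal output.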
Hi, I am trying to run all 50 genomes like that, but could you please check why it does not rename the output files according to the input file names?
Shouldn't it be "$file"_trans and "$file"_nuc?
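The script in question is not shown here, so the following is only a sketch of what such a loop might look like. A common cause of this symptom is writing `$file_trans`, which the shell reads as a variable named `file_trans` (normally unset, so it expands to nothing) rather than `$file` followed by `_trans`. The function name `prodigal_commands` and the `_genes` output name are hypothetical; `-i`, `-a`, `-d` and `-o` are Prodigal's input, protein translation, nucleotide sequence and output options.

```shell
# Print one Prodigal command per scaffold file given as an argument.
# This is a dry run: remove the leading "echo" to actually run Prodigal.
prodigal_commands() {
    for file in "$@"; do
        # "${file}_trans" is equivalent to "$file"_trans; the braces mark
        # where the variable name ends so the suffix is kept as literal text.
        echo prodigal -i "$file" -a "${file}_trans" -d "${file}_nuc" -o "${file}_genes"
    done
}
```

For example, `prodigal_commands *.fasta` (assuming the scaffolds end in `.fasta`) prints the command for every genome, so you can check the output names before running anything.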