Dear all, please help me solve this problem with a Python script. I am a beginner at scripting.
Problem: A homologous gene group consists of genes that each have an alignment to at least one other gene in the group with both a % sequence coverage and a % identity equal to or higher than a given stringency cutoff. Write a Python program to find the number of homologous gene groups within a genome at various levels of stringency. For example, with a stringency of 10, every gene in a group should have a % sequence coverage and % identity of at least 10% with at least one other gene in the group. The program should take a single genome file in GenBank format on the command line and write a CSV file called "output.csv" in the directory. The output file should consist of 11 lines; each line should have the stringency used (0 to 100, in increments of 10) and the number of homology groups (see example below). Write the Python program as a txt file.
Example:
0,13456
10,234
20,234
30,200
40,190
50,187
60,187
70,100
80,95
90,55
100,45
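As a rough illustration of the output format only, here is a minimal sketch that writes one "stringency,count" line per cutoff from 0 to 100 in steps of 10. The function name `write_counts` and the callable `count_for_cutoff` are hypothetical; how the counts are actually computed (parsing the GenBank file, aligning the genes) is not covered here.

```python
import csv


def write_counts(count_for_cutoff, path="output.csv"):
    """Write one 'stringency,count' row for each cutoff 0, 10, ..., 100.

    count_for_cutoff is a hypothetical callable that returns the number
    of homology groups for a given stringency cutoff.
    """
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for cutoff in range(0, 101, 10):
            writer.writerow([cutoff, count_for_cutoff(cutoff)])
```

Calling `write_counts(my_counter)` with any function that maps a cutoff to a group count would produce a file shaped like the example above.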
Please help me. Thanks in advance.
Isn't this an exact duplicate of your last post?
Yes, sorry. I just edited the tag line of my post.
Oh wait, I just read it again. Do you want the number of gene pairs that fulfill the criteria, or the number of "single linkage clusters" according to the criteria? For example, if genes A and B are in a group and C and A also fulfill the criteria, they can be seen as forming the set {A, B, C} or the sets {A, B} and {A, C}. Do you want to count that as 1 or 2 (technically, do you want all transitive extensions or not)?
I will count that as 1.
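To make the "count that as 1" reading concrete, here is a minimal sketch of single-linkage grouping using a union-find structure. It assumes the pairwise alignment results are already available as tuples of (gene_a, gene_b, coverage %, identity %); that input shape and the name `count_groups` are assumptions for illustration, not something given in the problem. It also assumes that a gene with no qualifying partner does not form a group by itself.

```python
def count_groups(hits, cutoff):
    """Count single-linkage groups among genes linked by qualifying hits.

    hits: iterable of (gene_a, gene_b, coverage_pct, identity_pct) tuples
          (a hypothetical input format).
    cutoff: stringency; a hit qualifies if coverage and identity are both
            greater than or equal to this value.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b, coverage, identity in hits:
        # Skip self-hits: a gene must align to at least one *other* gene.
        if gene_a != gene_b and coverage >= cutoff and identity >= cutoff:
            root_a, root_b = find(gene_a), find(gene_b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups

    # Only genes that took part in a qualifying hit are in `parent`,
    # so genes without a partner are not counted as groups.
    return len({find(gene) for gene in parent})


hits = [("A", "B", 90, 80), ("A", "C", 50, 60), ("D", "E", 30, 95)]
print(count_groups(hits, 40))  # A-B and A-C merge into {A, B, C}: prints 1
```

With this approach {A, B} and {A, C} collapse into the single group {A, B, C}, which matches the transitive interpretation chosen above.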
Ok, treated in the pseudocode...
Could you provide some example input file(s)?