Creating frequency table from CD-HIT output in Python
2.1 years ago
Paula ▴ 60

Hi!

I apologize in advance for posting an open question without first attempting to solve the coding problem myself. I am quite confused by it.

I am trying to create a frequency table from a nested list in Python. The table must include the name of each cluster and the number of times each sample (labeled as SOL_X_) appears in the corresponding sublist. The input list looks like this:

Cluster 13

0 3489aa, >SOL_1_10_length_33340_cov_18.252877_23... *

1 61aa, >SOL_1_40_length_4804_cov_17.346178_1... at 95.08%

Cluster 14

0 3456aa, >SOL_1_10_length_66994_cov_17.869433_28... *

The desired result is a table with the clusters in the first row and, in the remaining rows, the number of times each sample is found in each cluster.

[Image: desired table]

Thank you

python cd-hit • 699 views

This seems like a non-trivial problem, although that is relative, as I am not a programmer. Still, it seems to me that you are betting on the generosity of others to give you a complete solution without any work on your part. I predict this response will be the only one you get unless you show some effort.

2.1 years ago
Will • 0

I would suggest using a dictionary with "Cluster X" as the keys and, as the values, lists of the strings that follow each cluster header.

example_dict = {'Cluster 13': ['0 3489aa, >SOL_1_10_length_33340_cov_18.252877_23... ', '1 61aa, >SOL_1_40_length_4804_cov_17.346178_1... at 95.08%'], 'Cluster 14': ['0 3456aa, >SOL_1_10_length_66994_cov_17.869433_28... ']}

You can populate this dictionary by using for loops, conditionals ("if/else"), and the readline() method if the input is a file.
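A minimal sketch of that loop, in case it helps. The function name `parse_clusters` is made up for illustration, and it assumes each cluster starts with a line reading "Cluster N" (a raw CD-HIT .clstr file prefixes these with ">", so that is stripped too). It iterates over the lines directly instead of calling readline() repeatedly, which should behave the same for a file.

```python
from collections import defaultdict


def parse_clusters(lines):
    """Group member lines under the most recent 'Cluster N' header.

    `lines` can be an open file or any iterable of strings.
    """
    clusters = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between entries
        header = line.lstrip(">").strip()
        if header.startswith("Cluster"):
            current = header
            clusters[current]  # register the cluster even if it stays empty
        elif current is not None:
            clusters[current].append(line)
    return dict(clusters)


example_lines = [
    "Cluster 13",
    "",
    "0 3489aa, >SOL_1_10_length_33340_cov_18.252877_23... *",
    "1 61aa, >SOL_1_40_length_4804_cov_17.346178_1... at 95.08%",
    "Cluster 14",
    "0 3456aa, >SOL_1_10_length_66994_cov_17.869433_28... *",
]
parsed = parse_clusters(example_lines)
```

For a file on disk, something like `with open(path) as handle: parsed = parse_clusters(handle)` should work, where `path` is wherever your cluster file lives.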

After the dictionary is filled with all the necessary data, you can go through each key-value pair and count how many times each SOL_X label appears in the list. These counts can then be put into a pandas DataFrame and exported to an Excel file.
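The counting step could look roughly like this. It is only a sketch: `count_samples` is a hypothetical helper, and the regular expression assumes the sample label is "SOL_" plus the first number after it (so "SOL_1_10_length_..." counts as sample SOL_1). Adjust the pattern if your labels are built differently.

```python
import re
from collections import Counter

# Assumption: the sample label is ">SOL_" followed by the first number.
SAMPLE_RE = re.compile(r">(SOL_\d+)_")


def count_samples(clusters):
    """Return {cluster name: Counter of sample labels} for a cluster dict."""
    table = {}
    for name, members in clusters.items():
        counter = Counter()
        for line in members:
            # findall returns the captured label(s) found in this line
            counter.update(SAMPLE_RE.findall(line))
        table[name] = counter
    return table


example_dict = {
    "Cluster 13": [
        "0 3489aa, >SOL_1_10_length_33340_cov_18.252877_23... ",
        "1 61aa, >SOL_1_40_length_4804_cov_17.346178_1... at 95.08%",
    ],
    "Cluster 14": ["0 3456aa, >SOL_1_10_length_66994_cov_17.869433_28... "],
}
counts = count_samples(example_dict)
```

With the example above, `counts["Cluster 13"]["SOL_1"]` is 2 and `counts["Cluster 14"]["SOL_1"]` is 1. If pandas is installed, `pandas.DataFrame(counts).fillna(0).astype(int)` should give a table with clusters as columns and samples as rows, which can then be written out with `to_excel` or `to_csv`.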

I hope this helps a little bit. This might not be an optimal solution, but it will do the trick.

Happy Coding!
