Linux shell script which can do this task
2.8 years ago

Hi everyone,

I have a dataset like this.

The first column holds protein IDs from different organisms and the second column holds domain names.

Protein Ids domain
Abiotrophia_defectiva_peg_0144  wzz
Abiotrophia_defectiva_peg_0198  wxy
Abiotrophia_defectiva_peg_0200  wzz
Abiotrophia_defectiva_peg_0215  wca
Abyssicoccus_albus_123_peg_1185 wzz
Abyssicoccus_albus_123_peg_1189 wzx
Abyssicoccus_albus_123_peg_1200 wza
Abyssicoccus_albus_123_peg_1322 wca
Abyssicoccus_albus_123_peg_1324 wbb
Bradyrhizobium_elkanii_peg_6717 wac
Bradyrhizobium_elkanii_peg_6718 wzx
Bradyrhizobium_elkanii_peg_6721 waa
Bradyrhizobium_elkanii_peg_6752 wca
Bradyrhizobium_elkanii_peg_6780 wvx

I want to know which proteins are near each other. Going by the "peg" numbers, proteins from the same organism whose numbers are within +/- 5 of each other count as nearby, and together they form a cluster.

The output should look like this:

Abiotrophia_defectiva_peg_0198 wxy
Abiotrophia_defectiva_peg_0200 wzz
----------------------------------------
Abyssicoccus_albus_123_peg_1185 wzz
Abyssicoccus_albus_123_peg_1189 wzx
Abyssicoccus_albus_123_peg_1322 wca
Abyssicoccus_albus_123_peg_1324 wbb
-----------------------------------------
Bradyrhizobium_elkanii_peg_6717 wac
Bradyrhizobium_elkanii_peg_6718 wzx
Bradyrhizobium_elkanii_peg_6721 waa
-------------------------------------------

There should be a separator line between clusters.

Is there any way to do this for my data? I can do it manually in an Excel sheet, but my dataset is very large, so I need a script for it.

Please let me know. Thanks.

shell-scripting Linux • 781 views
2.8 years ago
mti193 ▴ 10

You can do this with Python pretty easily.

1) Loop through the file line by line.
2) Take the first column and split it on "_", then select the last item, which in this case is the number after "peg_". The Python syntax for this is: peg_number = line.split()[0].split("_")[-1]. This grabs the numbers you want (0144, 0200, etc.) as strings, so convert them with int() before comparing.
3) Store this number in a list (append) or a dictionary, then check whether any of the following peg numbers from the same organism are within +/- 5 of that value.
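
A minimal sketch of those steps, under a few assumptions that are mine rather than the question's: the input is whitespace-separated, every protein ID contains "_peg_" followed by the number, two proteins are "near" when consecutive peg numbers within one organism differ by at most 5 (so a chain of close neighbours ends up in one cluster), and proteins with no neighbour are dropped. The names parse, clusters and MAX_GAP are only illustrative.

```python
import sys
from itertools import groupby

MAX_GAP = 5  # assumption: neighbours differ by at most 5 in peg number


def parse(lines):
    """Yield (organism, peg_number, output_text) for each data row."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or "_peg_" not in fields[0]:
            continue  # skip blank lines and the header row
        organism, peg = fields[0].rsplit("_peg_", 1)
        yield organism, int(peg), f"{fields[0]} {fields[1]}"


def clusters(lines, max_gap=MAX_GAP):
    """Return clusters (lists of rows) holding two or more nearby proteins."""
    result = []
    rows = sorted(parse(lines), key=lambda r: (r[0], r[1]))
    for _, group in groupby(rows, key=lambda r: r[0]):
        current, last_peg = [], None
        for _, peg, text in group:
            # A gap larger than max_gap closes the current cluster.
            if last_peg is not None and peg - last_peg > max_gap:
                if len(current) > 1:
                    result.append(current)
                current = []
            current.append(text)
            last_peg = peg
        if len(current) > 1:
            result.append(current)
    return result


# Usage: python script.py input.txt  (both file names are placeholders)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for cluster in clusters(handle):
            print("\n".join(cluster))
            print("-" * 40)
```

Note that this prints a separator after every cluster, so for Abyssicoccus_albus_123 the pairs 1185/1189 and 1322/1324 come out as two separate blocks rather than one block per organism as in the example output. If one block per organism is what you want, print the separator only when the organism changes.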
