Data management in R, perl or python
4.1 years ago
MSRS

Hi, thank you in advance for helping with my problem.

My data is in the following format:

A1P
D4M
N6G
A1F
D4S
N6L
A1C

I want the output to be:

A1P/F/C
D4M/S
N6G/L

Is there an R package or R code available for this? Perl or Python code would also be great. Thank you very much, and sorry for taking up your valuable time.

R perl python

Please add more details to your question. What do "/F/C" and "/L" stand for?


They want the suffix after the first two characters collected and appended. This is a basic programming question, for which they should show at least some effort in some language.
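The approach described above (split each entry into a prefix and a suffix, collect the suffixes per prefix, then join them) can be sketched in Python as follows. This is a minimal sketch, assuming the last character of each entry is the new residue and everything before it is the prefix; the function name `combine` is hypothetical.

```python
def combine(entries):
    """Group entries like "A1P" by prefix ("A1") and join their suffixes with "/"."""
    # dicts keep insertion order (Python 3.7+), so output follows input order
    groups = {}
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank lines
        prefix, suffix = entry[:-1], entry[-1]
        groups.setdefault(prefix, []).append(suffix)
    return [prefix + "/".join(suffixes) for prefix, suffixes in groups.items()]


data = ["A1P", "D4M", "N6G", "A1F", "D4S", "N6L", "A1C"]
for line in combine(data):
    print(line)
```

Because the prefix is taken as "everything but the last character" rather than "the first two characters", this also works for positions above 9.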

ADD REPLY
0
Entering edit mode

They are letters of the English alphabet! Basically, this will be used for formatting amino acid (single-letter code) and nucleotide data.


I'd watch out for residue positions >9: those make an entry longer than 3 characters, and the scripts below that don't account for it will fail. JC's solution works best in that case.
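One way to make the parsing robust to multi-digit positions, in the spirit of JC's regex, is to match each whole entry against "one letter, one or more digits, one letter". This is a Python sketch under that assumption about the input format; `parse_entry` and `combine_entries` are hypothetical names, and lines that don't match raise an error instead of being silently mangled.

```python
import re

# One letter (original residue), a position of any length, one letter (new residue)
ENTRY = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def parse_entry(entry):
    """Split "A123P" into ("A123", "P"); raise ValueError on malformed input."""
    match = ENTRY.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"unexpected entry: {entry!r}")
    original, position, new = match.groups()
    return original + position, new


def combine_entries(entries):
    """Join all new residues that share the same original residue and position."""
    groups = {}
    for entry in entries:
        if not entry.strip():
            continue  # skip blank lines
        key, new = parse_entry(entry)
        groups.setdefault(key, []).append(new)
    return [key + "/".join(news) for key, news in groups.items()]


print(combine_entries(["A1P", "K123R", "A1F", "K123Q"]))
```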


Yeah, I was thinking the OP could have a position >9 in the inputs.

4.1 years ago
JC

Perl:

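#!/usr/bin/perl
use strict;
use warnings;
# Collect the trailing letter of each entry (e.g. "A1P" -> "A1" + "P")
# under its prefix, remembering the order in which prefixes first appear.
my (%data, @order);
while (<>) {
    chomp;
    if (m/(\w\d+)(\w)/) {
        my ($key, $new) = ($1, $2);
        push @order, $key unless exists $data{$key};
        push @{ $data{$key} }, $new;
    }
}
# Hash order is random in Perl, so print in input order instead of using each()
for my $key (@order) {
    print $key . join('/', @{ $data{$key} }) . "\n";
}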

Test:

$ perl comb.pl < list.txt
A1P/F/C
D4M/S
N6G/L

Thank you, JC. Excellent!

4.1 years ago

Python solution:

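from collections import defaultdict
# Map each prefix (residue + position, e.g. "A1") to its list of suffix letters;
# dicts keep insertion order (Python 3.7+), so output follows input order.
result = defaultdict(list)
with open("input.txt") as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything except the last character is the prefix, so positions >9 also work
        result[line[:-1]].append(line[-1])
with open("output.txt", "w") as outfile:
    for prefix, suffixes in result.items():
        outfile.write(prefix + "/".join(suffixes) + "\n")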

Thank you for sharing your scripts.


By the way, you don't need to bookmark every answer. You can bookmark the top level post, and that way you'll have access to all the answers.


Sorry about that. I will follow your advice. Thank you very much for the correction.


Don't worry about it; it's not a "don't do this", just a "you don't need to". Our bookmarks section can get cluttered easily.

4.1 years ago
sed 's/^\(.*\)\(.\)$/\1\t\2/' input.txt | datamash -t $'\t' -s -g 1 collapse 2
A1  P,F,C
D4  M,S
N6  G,L
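The datamash pipeline sorts the lines by the first field (`-s`), groups them (`-g 1`), and collapses the second field of each group into a comma-separated list (`collapse 2`). The same sort-group-collapse idea can be sketched in Python with `list.sort` and `itertools.groupby`. Unlike the pipeline, this sketch joins with "/" by default to match the requested format, and it assumes everything before the last character is the group key; `sort_group_collapse` is a hypothetical name.

```python
from itertools import groupby


def sort_group_collapse(entries, sep="/"):
    """Sort entries by prefix, group equal prefixes, and collapse their suffixes."""
    # (prefix, suffix) pairs, e.g. "A1P" -> ("A1", "P"); blank lines are dropped
    pairs = [(e[:-1], e[-1]) for e in (x.strip() for x in entries) if e]
    # list.sort is stable, so suffixes keep their input order within each group
    pairs.sort(key=lambda pair: pair[0])
    return [
        prefix + sep.join(suffix for _, suffix in group)
        for prefix, group in groupby(pairs, key=lambda pair: pair[0])
    ]


print(sort_group_collapse(["N6G", "A1P", "D4M", "A1F", "N6L", "D4S", "A1C"]))
```

Note that the sort is lexicographic on the prefix string, so "A10" sorts before "A2".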