Substitute The File From Respective Definitions In Tab Separated File
12.0 years ago
macmath ▴ 170

I am trying to replace each header in File 2 (e.g. >o387128170) with the corresponding label from File 1 (e.g. >MJ).

File1 *tab separated file*

MJ    o387128170
NH    s292492417
PA    t253987822
PS    m392982075
RB    b91205976
RG    RG01000991

File2

>o387128170
MPRRRVVAKREVLPDPKYANIGLAKFINVIMKDGKKSVAEKITYGALASVSENKGGNGLE
>s292492417
MPRKRVITGREVLPDPKYGSAQLAKFITILMLNGKKSLAEKIVYGALDRAAEKANQEPMD
>t253987822
MPRRRVIGQRKILPDPKFGSELLAKFVNILMVDGKKSTAEAIVYSALETLAQRSGKEHLE
>m392982075
MPRRRVAAKREVLADPKYGSQILAKFMNHVMESGKKAVAERIVYGALDKVKERGKADPLE
>b91205976
MSRRHAAEKRVILPDMKYNSVLLSRFINNIMKEGKKALAEKIVYSAFDKIEKKHKADPYQ
>RG01000991
MKNGKKAVAEKIVYGALEWLNLRLKEKSKKAEGASGENNQNKQQDLDAATLLDVFDQALD

Expected Output file:

>MJ
MPRRRVVAKREVLPDPKYANIGLAKFINVIMKDGKKSVAEKITYGALASVSENKGGNGLE
>NH
MPRKRVITGREVLPDPKYGSAQLAKFITILMLNGKKSLAEKIVYGALDRAAEKANQEPMD
>PA
MPRRRVIGQRKILPDPKFGSELLAKFVNILMVDGKKSTAEAIVYSALETLAQRSGKEHLE
>PS
MPRRRVAAKREVLADPKYGSQILAKFMNHVMESGKKAVAERIVYGALDKVKERGKADPLE
>RB
MSRRHAAEKRVILPDMKYNSVLLSRFINNIMKEGKKALAEKIVYSAFDKIEKKHKADPYQ
>RG
MKNGKKAVAEKIVYGALEWLNLRLKEKSKKAEGASGENNQNKQQDLDAATLLDVFDQALD
awk • 2.4k views
1. This is extremely similar to your other question - do you really need to post both?
2. Please make an effort to format for better readability. Try indenting lines showing file content with 4 spaces and take out the asterisks.

Because the two questions need different approaches. In this case I need the FASTA header lines replaced using the tab-separated file, whereas the other post is about making the change in a NEXUS file.

I will make an effort to indent the question properly.


But the approach is exactly the same for both problems - as you indicated yourself with the tags awk, grep, sed. You should also indicate what you have tried already. Anyway - we will leave both questions for now and see what other users say.


Command-line tools are great, but why try to squeeze complex functions out of them when you can use an expressive scripting language (Python/Perl) and accomplish the same thing in probably a similar amount of time? I would sit down and learn how to script with Python or Perl so you don't have to keep asking people to help you with these pretty simple data manipulation tasks. These tasks are actually perfect for learning.


This kind of task does not need Python/Perl, since it is stream editing, which is best done with something like sed/awk. In bioinformatics and computer science, I believe you need to choose the language best suited to your task. No need to use a cannon to kill a fly!


It depends on the problem you are trying to solve and your familiarity with whatever method you choose to use. I agree that this is a pretty simple manipulation problem and can be done with command line functions.

However, I feel like there is a trend of people who try to solve all their problems with command line functions. It becomes more of a code-golf exercise where you spend your time generating an elegant single liner rather than just solving the problem at hand. I am guilty of that from time to time.

For beginners, command line can also sometimes become a crutch or an excuse not to learn scripting.


I am sorry, but awk is a programming language just like Perl/Python, not a command-line function. It can be called on the command line, just as Perl can.

12.0 years ago

Using awk (file1 needs to be given before file2 as arguments):

awk '/^>/{print dic[$1]} !/^>/{if(NF==2) dic[">"$2]=">"$1; else print $0}' file1 file2

The command first reads file1 to build a dictionary (ID to label), then uses that dictionary to substitute the header lines of file2.
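
For readers who want to see the two passes spelled out, here is a minimal Python sketch of the same idea. The function names `build_mapping` and `rename_headers` and the shortened sample lines are made up for illustration; they are not part of the answer above. One deliberate difference: the awk one-liner prints an empty line for a header whose ID is missing from file1, whereas this sketch leaves such a header unchanged.

```python
def build_mapping(mapping_lines):
    """Pass 1: map each ID (column 2) to its label (column 1)."""
    mapping = {}
    for line in mapping_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:  # ignore blank or malformed lines
            label, seq_id = fields
            mapping[seq_id.strip()] = label.strip()
    return mapping


def rename_headers(mapping, fasta_lines):
    """Pass 2: yield FASTA lines, swapping '>ID' for '>label' when ID is known."""
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].strip()
            if seq_id in mapping:
                line = ">" + mapping[seq_id] + "\n"
        yield line


if __name__ == "__main__":
    # Shortened, illustrative stand-ins for file1 and file2
    file1 = ["MJ\to387128170\n", "NH\ts292492417\n"]
    file2 = [">o387128170\n", "MPRRRVVAKREV\n", ">unknown\n", "MSRRHAAEKR\n"]
    print("".join(rename_headers(build_mapping(file1), file2)), end="")
```

With real files, the two lists can be replaced by open file handles, since both functions only iterate over lines.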

++


@Pierre, is that you?

12.0 years ago
Alex ★ 1.5k

In case you don't have awk, here is a Python version:

import os
import sys


def main(file_name1, file_name2):
    # Build the translation dictionary from the tab-separated file (label <TAB> ID)
    trans_dict = {}
    with open(file_name1, "r") as fh:
        for line in fh:
            fields = line.strip().split("\t")
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            to_str, from_str = fields
            trans_dict[from_str] = to_str
    # Read the FASTA file, replacing each header whose ID is in the dictionary
    resultdata = []
    with open(file_name2, "r") as fh:
        for line in fh:
            if line.startswith(">"):
                key = line.strip()[1:]
                if key in trans_dict:
                    line = ">%s\n" % trans_dict[key]
            resultdata.append(line)
    # Write the results back; note that this overwrites file_name2 in place
    with open(file_name2, "w") as fh:
        fh.writelines(resultdata)


if __name__ == '__main__':
    arg = sys.argv[1:]
    if len(arg) != 2:
        print("Usage: script.py file_name1 file_name2")
        sys.exit(1)
    file_name1, file_name2 = arg
    if os.path.isfile(file_name1) and os.path.isfile(file_name2):
        main(file_name1, file_name2)
    else:
        print("Error, can't open input file.")
        sys.exit(1)