Question

Extracting a sequence from another sequence

7.4 years ago

fufuyou ▴ 110

Hello, I have many sequences in two sets. One is

>seqeunce1
ATGTGTGTTGTACAACTTTGGTATATACTGTATATACC42327477R_3222148F_10594369R_21575807R_1679947F_28391165R

The other is

>42327477R_3222148F_10594369R_21575807R_1679947F_28391165R
TGTGTTGTACAACTTTGGTATATACTGTATATACC

I want to get a sequence like the following:

>seqeunce1
ATGTGTGTTGTACAACTTTGGTATATACTGTATATACC*TGTGTTGTACAACTTTGGTATATACTGTATATACC*

Do you have any methods for doing this? Thanks, Fuyou

Assembly • 1.9k views

Hi,

Can you explain your issue in a little more detail? Do you want the code at the end replaced by the sequence from the other file? Or do you want to check whether seq2 matches seq1 and then concatenate them? (Or are these equivalent, i.e. the code denotes that seq2 is part of seq1?)

Comment by lieven.sterck 15k • 7.4 years ago

Thanks. Yes, I want to replace the code in seq1 with seq2's sequence. Fuyou

Comment by fufuyou ▴ 110 • 7.4 years ago

Is it correct to assume that you have two files containing all these sequences? Or a whole bunch of separate files?

Comment by lieven.sterck 15k • 7.4 years ago

Yes. I have two files, and all the sequences are in those two files. Thanks, Fuyou

Comment by fufuyou ▴ 110 • 7.4 years ago

score 1 · Answer 1 · 2018-02-12

You can give this Perl script a try:

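#!/usr/bin/env perl

# par1 is seq1 file, par2 is seq2 file

use strict;
use warnings;

die "Usage: $0 seq1File seq2File\n" unless @ARGV == 2;

my %seq2;
my $id;

# read in file 2, store sequences in a hash keyed by header (without '>')
open(my $seq2_fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
while (<$seq2_fh>) {
        chomp;
        next if /^\s*$/;
        if (/^>(.*)/) {
                $id = $1;
        }
        else {
                # append, so that multi-line sequences are not overwritten
                $seq2{$id} .= $_;
        }
}
close $seq2_fh;

# read in file 1 and replace the trailing code by the matching sequence
open(my $seq1_fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<$seq1_fh>) {
        chomp;
        next if /^\s*$/;
        if (/^>/) {
                print "$_\n";
        }
        else {
                my $seq1 = $_;
                # the code is taken to run from the first digit to the end of the line
                my ($repl) = ($seq1 =~ /(\d.+)$/);
                # leave the line untouched if there is no code or no matching sequence
                if (defined $repl && exists $seq2{$repl}) {
                        # \Q...\E keeps regex metacharacters in the code literal
                        $seq1 =~ s/\Q$repl\E$/*$seq2{$repl}*/;
                }
                print "$seq1\n";
        }
}
close $seq1_fh;
exit;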

Run it as follows:

perl script.pl seq1File seq2File > outFile

It will print the output to outFile. It also assumes that every part to be replaced starts with a digit, and that no digit occurs in the sequence before that part.
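For readers who prefer Python, here is a minimal sketch of the same approach. It is not part of the original answer. It makes the same assumption that the code starts at the first digit and runs to the end of the sequence line. The names `read_fasta`, `replace_code` and `CODE_RE` are made up for this illustration, and lines whose code has no match in the second file are printed unchanged.

```python
import re
import sys


def read_fasta(lines):
    """Return {header_without_gt: sequence}, joining multi-line sequences."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Assumption (same as the Perl script): the code starts at the first digit
# and runs to the end of the sequence line.
CODE_RE = re.compile(r"\d.+$")


def replace_code(seq, lookup):
    """Replace the trailing code in seq by '*' + looked-up sequence + '*'."""
    match = CODE_RE.search(seq)
    if match is None or match.group(0) not in lookup:
        return seq  # nothing to look up: leave the line unchanged
    return seq[:match.start()] + "*" + lookup[match.group(0)] + "*"


def main(seq1_path, seq2_path):
    with open(seq2_path) as handle:
        lookup = read_fasta(handle)
    with open(seq1_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            print(line if line.startswith(">") else replace_code(line, lookup))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a file name of your choice, it would be run the same way as the Perl script, with the seq1 file first and the seq2 file second, redirecting the output to a new file.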

score 0 · Answer 2 · 2018-02-03

7.4 years ago

Yuyayuya ▴ 250

If the sequences are not too long, you could use grep to find the sequence and save the result to a new file using >. I'm not quite sure what your question is, so I'm just guessing...
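The grep idea amounts to looking up one header at a time in the second file. Here is a minimal Python sketch of that lookup, roughly what `grep -A 1 -F` would return for records whose sequence sits on a single line. `grep_record` is a made-up helper name, and the input is assumed to be FASTA-like lines. Because it rescans the lines for every lookup, loading everything into a hash once, as the Perl answer does, is likely the better fit when there are many sequences.

```python
def grep_record(lines, header):
    """Return the sequence that follows the line '>' + header, or None."""
    target = ">" + header
    found = False
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # reached the next record, stop collecting
            found = (line == target)
        elif found and line:
            parts.append(line)
    return "".join(parts) if found else None
```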