Extracting a sequence from another sequence
6.8 years ago
fufuyou ▴ 110

Hello, I have a lot of sequences in two parts. One is:

>seqeunce1
ATGTGTGTTGTACAACTTTGGTATATACTGTATATACC42327477R_3222148F_10594369R_21575807R_1679947F_28391165R

The other is:

>42327477R_3222148F_10594369R_21575807R_1679947F_28391165R
TGTGTTGTACAACTTTGGTATATACTGTATATACC

I want to get a sequence like the following:

>seqeunce1
ATGTGTGTTGTACAACTTTGGTATATACTGTATATACC*TGTGTTGTACAACTTTGGTATATACTGTATATACC*

Do you have any methods for doing this? Thanks, Fuyou

Assembly • 1.5k views

Hi,

Can you explain your issue in a little more detail? Do you want the code at the end replaced by the sequence from the other file? Or do you want to check whether seq2 matches seq1 and then concatenate them? (Or are these equivalent, i.e. does the code denote that seq2 is part of seq1?)


Thanks. Yes, I want to use seq2's sequence to replace the code in seq1. Fuyou


Is it correct to assume you have two files containing all these sequences, or do you have a whole bunch of separate files?


Yes, I have two files, and all the sequences are in those two files. Thanks, Fuyou

6.8 years ago

You can give this Perl script a try:

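#!/usr/bin/env perl

# arg 1 is the seq1 file, arg 2 is the seq2 file

use strict;
use warnings;

my %seq2;
my $id;

# read file 2 and store each sequence in a hash keyed by its header
open(my $seq2_fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
while (<$seq2_fh>) {
        chomp;
        next if /^\s*$/;
        if (/^>(.*)/) {
                $id = $1;
        }
        elsif (defined $id) {
                # append, so a sequence wrapped over several lines still works
                $seq2{$id} .= $_;
        }
}
close $seq2_fh;

# read file 1 and replace the trailing code with the matching seq2 sequence
open(my $seq1_fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<$seq1_fh>) {
        chomp;
        next if /^\s*$/;
        if (/^>/) {
                print "$_\n";
                next;
        }
        my $seq1 = $_;
        # the code to replace is assumed to start at the first digit
        my ($repl) = ($seq1 =~ /(\d.+)$/);
        # lines without a code, or with an unknown code, are printed unchanged
        if (defined $repl && exists $seq2{$repl}) {
                $seq1 =~ s/\Q$repl\E$/*$seq2{$repl}*/;
        }
        print "$seq1\n";
}
close $seq1_fh;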

Run it as follows:

perl script.pl seq1File seq2File > outFile

It will print the output to outFile. It also assumes that every part to be replaced starts with a number.
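To make that assumption concrete, here is a minimal sketch of just the splitting step, run on the example record from the question. The helper name split_code is hypothetical (it is not part of the script above), and it assumes the bases themselves never contain a digit, so the code starts at the first digit on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: split a seq1 line into its leading bases and the
# trailing code, assuming the code starts at the first digit on the line.
sub split_code {
    my ($line) = @_;
    if ($line =~ /^(\D*)(\d.*)$/) {
        return ($1, $2);
    }
    return ($line, undef);    # no digit found: nothing to replace
}

my ($bases, $code) = split_code(
    'ATGTGTGTTGTACAACTTTGGTATATACTGTATATACC42327477R_3222148F_10594369R_21575807R_1679947F_28391165R'
);
print "bases: $bases\n";
print "code:  $code\n";
```

If a line has no digit at all, the helper returns the line unchanged with an undefined code, so such records can be detected and skipped rather than silently mangled.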


Hello Lieven, thanks! Fuyou


You're welcome!

If the solution fixes your problem, accept the answer ;-)

6.8 years ago
Yuyayuya ▴ 250

If the sequences are not too long, you could use grep to find the sequence and save the result to a new file with the > redirect. I'm not quite sure what your question is, so I'm just guessing...


Thanks. Some of the sequences are long.
