Why Perl Or Sed Command Not Working
11.0 years ago
biolab ★ 1.4k

Hi everyone, I have a FASTA file like the one below.

>miR156a
GACAGAA
>miR156b
GACAGAA
>miR156c
GACAGAA
............

I need to format it as below.

    miR156a   GACAGAA
    miR156b   GACAGAA
    miR156c   GACAGAA
    ............

My plan was to first replace every newline with a tab, and then replace each > with a newline. For the first step, I used the command sed -e 's/\n/\t/g' IN > OUT, but it didn't work. I then tried an alternative Perl command, cat IN | perl -ne 's/\n/\t/' > OUT. This time the OUT file contained nothing. What is my problem? Thank you very much for your answers!


Following up on my question, I tried a new Perl command, cat IN | perl -ne 'while (<>) {chomp; print "$_\t"}' > OUT, and got the following output.

GACAGAA >miR156b^M  GACAGAA >miR156c^M  GACAGAA  ......

This is probably caused by mixing Windows and Linux line endings. Could anyone give me some suggestions or comments? Thanks a lot!


It looks like your input file comes from Windows and you are on a *NIX machine. Try running it through dos2unix first, e.g. cat IN | dos2unix | perl ...

11.0 years ago
Pavel Senin ★ 1.9k
sed -n '/^>/ {N; s/^>//; s/\r//g; s/\n/\t/; p}' test.fa

miR156a    GACAGAA
miR156b    GACAGAA
miR156c    GACAGAA

How it works (GNU sed; the \r step also copes with Windows line endings):

sed -n '          # turn off default printing
 /^>/{            # if the line is a sequence header
 N;               # append the next line to the pattern space
 s/^>//;          # remove the leading ">" symbol
 s/\r//g;         # delete any carriage returns
 s/\n/\t/;        # replace the embedded newline with a tab
 p }              # print the result
 ' test.fa

Nice, that's rather more concise than my awk solution!


Thanks! I hope it'll work for the OP.


You could also write it as follows (the s/\r//g step removes any Windows carriage returns first):

sed 'N; s/\r//g; s/^>\(.*\)\n/\1\t/' test.fa
11.0 years ago

You're creating an extremely long line, at least if your input file is largish. That's likely screwing things up. Why not just do things in one step:

awk 'BEGIN{ORS="";OFS="";}{gsub(">","",$1); if(NR%2==0) {print "\t",$1,"\n"} else {print $1}}' foo.fa

awk '{x=substr($0,2);getline;print x"\t"$0;}' foo.fa


Nice, I guess I have a penchant for verbosity :P


This one is cool!


Thank you both! The commands work well!
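
The awk one-liners in this thread assume exactly one sequence line per header and Unix line endings. As a hedged sketch, the same idea can be extended to wrapped sequences and CRLF input; the function name fasta_to_tab and the sample record are hypothetical, not something from the posts above.

```shell
# Convert FASTA to "header<TAB>sequence", one record per line.
# Reads the files given as arguments, or stdin if there are none.
fasta_to_tab() {
    awk '
        { sub(/\r$/, "") }                      # drop a Windows carriage return
        /^>/ { if (NR > 1) printf "\n"          # close the previous record
               printf "%s\t", substr($0, 2)     # header without ">"
               next }
        { printf "%s", $0 }                     # append sequence text
        END { if (NR > 0) printf "\n" }         # close the last record
    ' "$@"
}

# Usage with a made-up, wrapped, CRLF-terminated sample:
printf '>miR156a\r\nGACA\r\nGAA\r\n>miR156b\r\nGACAGAA\r\n' | fasta_to_tab
```

Because sequence lines are appended as they are read, nothing has to be held in memory apart from the current line.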

11.0 years ago
Kenosis ★ 1.3k

Here's another option:

perl -pe 's/\r//; s/^>(.+)\n/$1\t/' foo.fa

Output on your dataset:

miR156a    GACAGAA
miR156b    GACAGAA
miR156c    GACAGAA
11.0 years ago

Since TMTOWTDI ;), here is another Perl-based method, which does not assume that the FASTA sequence sits on a single line following the header:

perl -076 -l12 -ne 'next unless /\w/; chomp; @b = split /\n/; $h = shift @b; $s = join "", @b; print "$h\t$s";' IN > OUT

Here is how it works:

-076   : Sets the input record separator ($/) to ">" (which is `76` in octal), so that you can iterate through the file one FASTA record at a time
-l12   : Sets the output record separator ($\) to "\n" (which is `12` in octal) and enables automatic line-ending processing (each record is chomped on input)
-n     : Wraps the code in a loop that reads every record, as delimited by the input record separator, without printing it automatically
-e     : Tells the perl interpreter that the following argument is Perl code to execute

next unless /\w/; -> Skips any chunk that does not contain data (which is essentially the first chunk, preceding the first occurrence of the ">" symbol)
chomp;            -> Removes a trailing record separator (">") from the chunk; -l has already done this, so it is a harmless safeguard
@b = split /\n/;  -> Splits the chunk into an array, at every newline char
$h = shift @b;    -> Extracts first element of array which is the FASTA header
$s = join "", @b; -> Joins the rest of the array elements into a string, which corresponds to the sequence
print "$h\t$s";   -> Prints out the header and the sequence delimited by a tab

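Good thought about possible multi-line sequences. Here's another option to handle that case (the next unless /\S/ skips the empty chunk before the first ">", which would otherwise print a blank line):

perl -076 -nE 'chomp; next unless /\S/; s/(.+)\n/$1\t/; s/\n//g; say' foo.fa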

Thanks for the informative answer.

11.0 years ago
Vivek ★ 2.7k
awk '{if(NR % 2 == 1) printf "%s\t", substr($0,2); else print $0}' file.fa

Another variation with awk


And with just a few minor changes (but none to your logic):

awk '{printf "%s", (NR%2) ? substr($0,2)"\t" : $0"\n"}' foo.fa
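
A general caveat when building output with awk's printf: the first argument is treated as a format string, so data containing a % character should be passed through an explicit "%s" rather than used as the format itself. A minimal sketch, with a made-up header containing a percent sign (pct_fasta and safe are hypothetical names):

```shell
# Hypothetical record whose header contains a percent sign.
pct_fasta() {
    printf '>miR156a_95%%id\nGACAGAA\n'
}

# The header is passed as an argument to "%s", so the % stays literal.
safe=$(pct_fasta | awk '{ if (NR % 2) printf "%s\t", substr($0, 2); else print }')
printf '%s\n' "$safe"
```

With the header used directly as the format, awk would try to interpret %i as a conversion with no matching argument.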
