How to edit contig format?
11.1 years ago
HG ★ 1.2k

Hi, I have some contigs like this:

  >NODE_1_length_262931_cov_39.3702_ID_6774944
    GACGGTGGCAGTGTGGTGTTCCCGCCGGTGCTGGTGCAGATGCTCGACCGGCTGGAAAGT
    GAAATCCTGGCTGACCGGGTGAGTGAGGAAAGCCGCCGCTGGCTGGCATCGTGCGGCCTG
    ACGGTGGAGCAGATGCAAAACCAGATGGACCCGGTGTACACGCCGGCGCGAAAAATCCAC
    CTGTACCACTGCGACCATCGCGGCCTGCCGCTGGCCCTTATCAGTAAGGAAGGGGCAACA  
    >NODE_2_length_249731_cov_38.5539_ID_6775452
    TGAGGCAGCACCTGGCACGGCTGGGACGGAAGTCGCTGTCGTTCTCAAAATCGGTGGAGC
    TGCATGACAAAGTCATCGGGCATTATCTGAACATAAAACACTATCAATAAGTTGGAGTCA
    TTACCAACATCGTGAAAGAAATCCACAATAATGATCTTAAGCAGCAATTGATGAGTGAAT
 .....................................................................................................................

I need each sequence on a single line, like this:

   >NODE_1_length_262931_cov_39.3702_ID_6774944
   GACGGTGGCAGTGTGGTGTTCCCGCCGGTGCTGGTGCAGAT..............
   >NODE_2_length_249731_cov_38.5539_ID_6775452
   TGAGGCAGCACCTGGCACGGCTGGGACGGAAGTCGCTGTCGTTCTCAAAATCGGTGGAGC............................................

How can I convert the format using sed/awk/perl?

Thank you in advance.

awk perl • 2.6k views

I'm intrigued as to what software requires single-line sequences in FASTA. It should not matter...

11.1 years ago

Here you go:

cat foo.fa | awk 'BEGIN{ORS="";OFS="";}{if(NR == 1) {print $0,"\n"} else {if(substr($0, 1, 1) == ">") {print "\n",$0,"\n";} else {print $0;}}}END{print "\n";}'

Or if you want it formatted so you can actually follow along:

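cat foo.fa | awk '
BEGIN{
    # Empty output separators: print emits no newline unless we add one.
    ORS="";
    OFS="";
}
{
    if(NR == 1) {
        # First header line.
        print $0,"\n"
    } else {
        if(substr($0, 1, 1) == ">") {
            # New header: finish the previous sequence line first.
            print "\n",$0,"\n";
        } else {
            # Sequence line: append to the current output line.
            print $0;
        }
    }
}
END{
    # Terminate the last sequence line.
    print "\n";
}'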

Thank you so much for your kind suggestion.


FYI, you'll need to append > new_file.fa to the end of that command so the result is written to a file instead of the terminal. I forgot to mention that!
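
For anyone who wants the command and the redirect in one place, here is a minimal sketch wrapped in a shell function. The function name unwrap_fasta and the file names are hypothetical, not part of the answer above. The trimming of leading blanks and carriage returns is an added assumption, in case the input file really is indented like the sample in the question (that indentation may just be forum formatting).

```shell
#!/bin/sh
# unwrap_fasta IN OUT  (hypothetical helper name)
# Writes IN to OUT with every sequence joined onto a single line.
unwrap_fasta() {
    awk '
    {
        sub(/\r$/, "")                 # assumption: input may have Windows line endings
        gsub(/^[ \t]+|[ \t]+$/, "")    # assumption: input lines may be indented
    }
    $0 == "" { next }                  # skip blank lines
    /^>/ {
        if (in_seq) printf "\n"        # close the previous sequence line
        print                          # header stays on its own line
        in_seq = 0
        next
    }
    {
        printf "%s", $0                # append this chunk without a newline
        in_seq = 1
    }
    END { if (in_seq) printf "\n" }    # terminate the last sequence line
    ' "$1" > "$2"
}
```

Usage would be something like: unwrap_fasta foo.fa new_file.fa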

11.1 years ago
Kenosis ★ 1.3k

Here's a Perl option:

use strict;
use warnings;
use 5.014;

$/ = '>';

while (<>) {
    chomp;
    say '>' . (s/.+?\n\K(.+)/$1 =~ s!\n!!gr/ser) if /\S/;
}

Usage: perl script.pl inFile [>outFile]

The last, optional parameter directs output to a file.

This reads the file in 'chunks,' using '>' as the record separator. The \K keeps the header info, the inner substitution removes all newlines from the captured sequence, and the result is then printed.

Output on your dataset:

>NODE_1_length_262931_cov_39.3702_ID_6774944
GACGGTGGCAGTGTGGTGTTCCCGCCGGTGCTGGTGCAGATGCTCGACCGGCTGGAAAGTGAAATCCTGGCTGACCGGGTGAGTGAGGAAAGCCGCCGCTGGCTGGCATCGTGCGGCCTGACGGTGGAGCAGATGCAAAACCAGATGGACCCGGTGTACACGCCGGCGCGAAAAATCCACCTGTACCACTGCGACCATCGCGGCCTGCCGCTGGCCCTTATCAGTAAGGAAGGGGCAACA  
>NODE_2_length_249731_cov_38.5539_ID_6775452
TGAGGCAGCACCTGGCACGGCTGGGACGGAAGTCGCTGTCGTTCTCAAAATCGGTGGAGCTGCATGACAAAGTCATCGGGCATTATCTGAACATAAAACACTATCAATAAGTTGGAGTCATTACCAACATCGTGAAAGAAATCCACAATAATGATCTTAAGCAGCAATTGATGAGTGAAT

Hope this helps!
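
The same read-in-chunks idea can also be sketched in awk rather than Perl, for readers who would rather stay with awk. This is only an illustration, not the Perl script above: the function name unwrap_by_record is hypothetical, and like the Perl version it assumes that '>' never appears inside a header line.

```shell
#!/bin/sh
# unwrap_by_record FILE  (hypothetical helper name)
# Reads one FASTA entry per awk record by using ">" as the record separator.
unwrap_by_record() {
    awk '
    BEGIN { RS = ">"; FS = "\n" }      # one record per entry, one field per line
    $1 != "" {                         # skips the empty record before the first ">"
        seq = ""
        for (i = 2; i <= NF; i++)      # fields 2..NF are the wrapped sequence lines
            seq = seq $i
        print ">" $1                   # field 1 is the header text without ">"
        print seq
    }
    ' "$1"
}
```

Usage would be something like: unwrap_by_record inFile > outFile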


Thank you so much.

11.1 years ago
SES 8.6k

Here are some other suggestions for converting multi-line records to single line. With seqtk:

seqtk seq -l0 in.fa > out.fa

Or, with a slightly longer line using BioPerl (not following best practices, just a short example):

perl -MBio::SeqIO -e '$seqio = Bio::SeqIO->new(-file => shift); while ($seq = $seqio->next_seq) { print ">".$seq->id, "\n", $seq->seq, "\n"; }' in.fa > out.fa
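
Whichever tool is used, it may be worth checking that nothing was lost in the conversion. Below is a rough sketch of such a check; the function names fasta_summary and same_content are hypothetical. It only compares the number of records and the total number of residues, and it deliberately ignores header text, so it still passes if a tool rewrites the header line.

```shell
#!/bin/sh
# fasta_summary FILE  (hypothetical helper name)
# Prints "<number of records> <total residues>" for a FASTA file.
fasta_summary() {
    awk '
    /^>/ { n++; next }                           # count header lines
    { gsub(/[ \t\r]/, ""); bases += length($0) } # count residues, ignoring blanks
    END { printf "%d %d\n", n, bases }
    ' "$1"
}

# same_content A B  (hypothetical helper name)
# Succeeds when both files have the same record count and residue total.
same_content() {
    [ "$(fasta_summary "$1")" = "$(fasta_summary "$2")" ]
}
```

Usage would be something like: same_content in.fa out.fa && echo "counts match"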