Script to check the size of each given contig
9.7 years ago
div ▴ 60

Can anyone please tell me a script to check the size of each contig? The contigs are in multi-FASTA format.

sequence • 3.6k views

This looks like a homework question. Don't give up easily; try some more :-). I assure you that you will enjoy the whole learning process. You can search Biostar for existing posts that help, for example with the query "length fasta sequences".

9.7 years ago
biolab ★ 1.4k

Here is a Perl solution (usage: perl length.pl fastafile).

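#!/usr/bin/perl
# Print the ID and sequence length of every record in a multi-FASTA file.
use strict;
use warnings;
die "Usage: perl $0 fastafile\n" unless @ARGV;
open my $fasta_file, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
my ($id, $seq);
while (<$fasta_file>) {
    chomp;
    if (/^>(\S+)/) {
        # New header: report the previous record before starting the next one.
        print "$id\t", length($seq), "\n" if defined $id;
        $id = $1;
        $seq = '';
    } else {
        $seq .= $_;
    }
}
close $fasta_file;
# Report the last record (skipped if the file contained no headers).
print "$id\t", length($seq), "\n" if defined $id;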

Thank you all for the replies. :)

Reply to biolab: thank you so much, it worked. :)

9.7 years ago
rtliu ★ 2.2k

From the bioawk tutorial: https://github.com/vsbuffalo/bioawk-tutorial

bioawk -cfastx '{print $name, length($seq)}' test.fasta
7.9 years ago
Paul ★ 1.5k

Hi, try this awk solution:

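for i in *.fa
do
  # Step 1: put each record on one line as ">header<TAB>sequence".
  awk 'BEGIN{RS=">"} NR>1 {sub("\n","\t"); gsub("\n",""); print RS $0}' "$i" > "${i%.fa}.column.csv"
  # Step 2: print the first word of the header and the sequence length, sorted by length.
  awk -F'\t' 'length($2) {split($1, h, " "); print h[1] "\t" length($2)}' "${i%.fa}.column.csv" | sort -k2,2n > "${i%.fa}.csv"
done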

Alternatively, you can pipe the first awk command directly into the second to avoid creating the intermediate *.column.csv file.
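A minimal sketch of that piped variant, wrapped in a shell function. The name fasta_lengths is hypothetical and not part of the original answer. It assumes sequence lines contain no tabs, and unlike the loop above it drops the leading ">" from the name. It measures the sequence field explicitly with length($2) and keeps only the first word of each header.

```shell
#!/bin/sh
# fasta_lengths FILE: print "name<TAB>length" for each FASTA record,
# sorted by length. Hypothetical helper name, for illustration only.
fasta_lengths() {
    awk 'BEGIN { RS = ">" }
         NR > 1 {
             sub("\n", "\t")   # first newline ends the header line
             gsub("\n", "")    # join wrapped sequence lines
             print
         }' "$1" |
    awk -F'\t' 'length($2) { split($1, h, " "); print h[1] "\t" length($2) }' |
    sort -k2,2n
}
```

For example, fasta_lengths contigs.fa > contigs.csv (contigs.fa being a placeholder file name) writes the table without any intermediate file.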
