9.6 years ago
div
Can anyone please share a script to check the size of each contig in a file? The contigs are in multi-FASTA format.
Here is a Perl solution (usage: perl length.pl fastafile):
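#!/usr/bin/perl
use strict;
use warnings;

# Print "id<TAB>length" for every record in a multi-FASTA file.
open my $fasta_file, '<', $ARGV[0] or die $!;
my ($id, $seq);
while (<$fasta_file>) {
    chomp;
    if (/^>(\S+).*/) {
        # New header: report the previous record, then start a new one.
        print "$id\t", length($seq), "\n" if defined $id;
        $id = $1;
        $seq = '';
    } else {
        $seq .= $_;
    }
}
close $fasta_file;
# Report the last record (skipped if the file had no headers).
print "$id\t", length($seq), "\n" if defined $id;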
From the bioawk tutorial (https://github.com/vsbuffalo/bioawk-tutorial):
bioawk -cfastx '{print $name, length($seq)}' test.fasta
Hi, try this awk solution:
for i in *.fa
do awk 'BEGIN{RS=">"}NR>1{sub("\n","\t"); gsub("\n",""); print RS$0}' "$i" > "${i%.fa}.column.csv"
awk -F'\t' 'length($2) {print $1 "\t" length($2)}' "${i%.fa}.column.csv" | sort -t "$(printf '\t')" -k2,2n > "${i%.fa}.csv"
done
Alternatively, pipe the output of the first awk command straight into the second one to avoid creating the intermediate *.column.csv file.
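A rough sketch of that piped variant, in case it helps. The function name contig_lengths and the file name contigs.fa are made up for illustration, and unlike the loop above this version drops the leading ">" from each name. It assumes sequences never contain ">" and skips records with an empty sequence.

```shell
# Hypothetical helper: print "name<TAB>length" for each FASTA record,
# sorted by length, without writing an intermediate file.
contig_lengths() {
    awk 'BEGIN { RS = ">" }
         NR > 1 {
             sub("\n", "\t")   # first newline separates header from sequence
             gsub("\n", "")    # join the wrapped sequence lines
             print
         }' "$1" |
    awk -F '\t' 'length($2) { print $1 "\t" length($2) }' |
    sort -t "$(printf '\t')" -k2,2n
}
```

Something like contig_lengths contigs.fa > contigs.lengths.tsv should then give one line per contig, shortest first.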
This looks like a homework question. Don't give up easily; try some more :-). I assure you that you will enjoy the whole learning process. You can also search Biostar for earlier posts that may help, for example with the query "length fasta sequences".
+1 A. Pandey. See also Code Golf: Mean Length Of Fasta Sequences