Hello, I have a file containing 2805 contigs; the shortest one is 37 nucleotides long. I want to delete all contigs shorter than 200 nucleotides. Can anyone tell me a Linux command line I can use? Thank you.
One option is to use reformat.sh from the BBMap package:
reformat.sh in=contigs.fasta out=filtered.fasta minlength=200
You can also do this with the BioPerl module Bio::SeqIO. Once it is installed, save the script below as filter_contigs.pl in the same directory as the file with the contigs and run it with perl filter_contigs.pl. It removes contigs shorter than 200 bp from the input file contigs.fasta and saves the output to contigs_filt_200.fasta.
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Setting minimum length to 200
my $min_len = 200;

# Reading the input fasta file
my $seqio_in = Bio::SeqIO->new(-file   => "contigs.fasta",
                               -format => "fasta" );

# Creating the output fasta file
my $seqio_out = Bio::SeqIO->new(-file   => ">contigs_filt_200.fasta",
                                -format => "fasta" );

# Saving sequences to the output if length >= min_len
while ( my $seq = $seqio_in->next_seq ) {
    if ( $seq->length >= $min_len ) {
        $seqio_out->write_seq($seq);
    }
}
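If installing BBMap or BioPerl is not an option, plain awk can probably do the same job. Below is a minimal sketch, not a tested pipeline: the function name filter_fasta_min_len is made up for illustration, and it assumes a standard FASTA file where every record starts with a ">" header line. It handles sequences wrapped over several lines, but note that it writes each kept sequence back on a single line.

```shell
# Hypothetical helper (name chosen for illustration only).
# Usage: filter_fasta_min_len MIN_LENGTH INPUT_FASTA
# Prints to stdout only the records whose sequence has at least MIN_LENGTH bases.
filter_fasta_min_len() {
    min_len="$1"
    input="$2"
    awk -v min="$min_len" '
        # Print the previous record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Any other line is sequence; join wrapped lines together.
        { seq = seq $0 }
        # Do not forget the last record in the file.
        END { emit() }
    ' "$input"
}
```

With the function defined in your shell, something like filter_fasta_min_len 200 contigs.fasta > contigs_filt_200.fasta should give the filtered file, using the same file names as in the answers above.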
Hi, how did you calculate the length of your shortest contig? What command or program did you use?
Use one of the solutions here: How to find shortest lenth or longest length from fasta file
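In case the linked thread is not at hand, here is one possible way to get the shortest and longest lengths with awk. This is only a sketch under the same assumption of a standard FASTA file with ">" header lines; the function name fasta_length_range and the output format are invented for this example and do not come from the linked solutions.

```shell
# Hypothetical helper (name chosen for illustration only).
# Usage: fasta_length_range INPUT_FASTA
# Prints the shortest and longest sequence length and the number of records.
fasta_length_range() {
    awk '
        # Fold the length of the record just finished into the running min/max.
        function record() {
            if (header == "") return
            n = length(seq)
            if (count == 0 || n < min) min = n
            if (count == 0 || n > max) max = n
            count++
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { record(); header = $0; seq = ""; next }
        # Sequence lines may be wrapped, so join them before measuring.
        { seq = seq $0 }
        END {
            record()
            if (count > 0)
                printf "shortest=%d longest=%d records=%d\n", min, max, count
        }
    ' "$1"
}
```

Calling fasta_length_range contigs.fasta then prints one summary line for the whole file.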