Remove contigs shorter than 200 nucleotides
4.7 years ago
Bioinfo ▴ 20

Hello, I have a file containing 2805 contigs. The shortest one is 37 nucleotides long, and I want to delete all contigs shorter than 200 nucleotides. Can anyone tell me a Linux command line I can use? Thank you.

assembly sequencing genome next-gen

Hi, how did you calculate the length of your shortest contig? What command or program did you use?
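For reference, one way to get the length of every contig (and the shortest one) from a Linux command line is a short awk script. This is a minimal sketch, not necessarily what the original poster used; the function names contig_lengths and shortest_contig are hypothetical, and it assumes a standard FASTA file where every header line starts with ">" and sequences may be wrapped over several lines.

```shell
#!/usr/bin/env bash
# Print "ID<TAB>length" for every record in a FASTA file (hypothetical helper).
# Wrapped sequence lines are summed, so multi-line FASTA is handled.
contig_lengths() {
    awk '
        { sub(/\r$/, "") }          # tolerate Windows (CRLF) line endings
        /^>/ {
            if (seen) print name "\t" len
            name = substr($1, 2)    # ID = header up to first whitespace, minus ">"
            len = 0
            seen = 1
            next
        }
        { len += length($0) }       # add up the sequence lines of this record
        END { if (seen) print name "\t" len }
    ' "$1"
}

# Print the shortest record as "ID<TAB>length" (hypothetical helper).
shortest_contig() {
    contig_lengths "$1" | sort -t "$(printf '\t')" -k2,2n | head -n 1
}
```

For example, shortest_contig contigs.fasta prints the ID and length of the shortest contig, and contig_lengths contigs.fasta | awk '$2 < 200' | wc -l counts how many contigs are shorter than 200 nt.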

4.7 years ago

One option is to use reformat.sh from the BBMap package:

reformat.sh in=contigs.fasta out=filtered.fasta minlength=200
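If BBMap is not installed, a similar length filter can be sketched with plain awk. This is an alternative technique, not part of reformat.sh; the function name filter_min_length is hypothetical, it assumes standard FASTA input, and it keeps records whose sequence length is at least the given minimum. Note that it writes each kept sequence on a single line rather than preserving the original line wrapping.

```shell
#!/usr/bin/env bash
# Keep only FASTA records whose sequence is at least MIN_LEN nucleotides long.
# Usage (hypothetical helper): filter_min_length MIN_LEN input.fasta > output.fasta
filter_min_length() {
    awk -v min="$1" '
        BEGIN { min += 0 }                      # force a numeric comparison
        { sub(/\r$/, "") }                      # tolerate Windows (CRLF) line endings
        /^>/ {
            # A new header ends the previous record: print it if long enough
            if (hdr != "" && length(seq) >= min) print hdr "\n" seq
            hdr = $0
            seq = ""
            next
        }
        { seq = seq $0 }                        # join wrapped sequence lines
        END { if (hdr != "" && length(seq) >= min) print hdr "\n" seq }
    ' "$2"
}
```

For this question the call would be filter_min_length 200 contigs.fasta > filtered.fasta.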

It's working. Thank you so much!

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

4.7 years ago
zubenel ▴ 120

You can do this with the BioPerl module Bio::SeqIO. Once BioPerl is installed, save the script below as filter_contigs.pl in the same directory as your contigs file and run it with perl filter_contigs.pl. It removes contigs shorter than 200 bp from the input file contigs.fasta and writes the remaining contigs to contigs_filt_200.fasta.

use strict;
use warnings;
use Bio::SeqIO;

# Minimum contig length to keep
my $min_len = 200;

# Open the input FASTA file for reading
my $seqio_in = Bio::SeqIO->new(-file   => "contigs.fasta",
                               -format => "fasta");

# Create the output FASTA file (the ">" prefix opens it for writing)
my $seqio_out = Bio::SeqIO->new(-file   => ">contigs_filt_200.fasta",
                                -format => "fasta");

# Write each sequence to the output only if its length is >= $min_len
while (my $seq = $seqio_in->next_seq) {
    if ($seq->length >= $min_len) {
        $seqio_out->write_seq($seq);
    }
}