How to filter a fasta file?
2
0
Entering edit mode
8.9 years ago

Dear All

I am trying to remove the sequences with no functional information from a functional annotation fasta file

>Oeu043104.1|---NA---
MIESNFWDACWPHCLLRVLLSLLAESASQPLCPPLQRYNPKYLEDDYGVNQATEWLFYTPRDRENEIENIRNGVAVDGY
>Oeu043107.1|g7lir3_medtr alpha beta-hydrolase superfamily protein os=medicago truncatula gn=mtr_8g086260 pe=4 sv=1
MSTTQGTSPRGNINVKDEPDHLLVLVHGIMGSPSDWTYFEADLKRRLGKRFLIYASSCNTYTKTFTGIDGAGKRLAEEVMEIVRNTESLKKISFLAHSLG
GLFSRYAIAVLYMPNTSSDDSSVIAGSTNTSLKTSCYSNTGLIAGLEPSNFITLATPHLGVRGKKQVNPFSIILIDGPVLPFLLGLPFLEKIAAPLAPIF
TGRTGSQLFLTDGQPDRPPLLLRMASDCKDGKFVSALGAFRCRLLYANVSYDHMVGWRTSSIRRETELIKPPLQSLDGYKHVVSVEYCPPVSSEGPHFPE
EAAKAKQAAQNEPNNQNTVEYHETMEEEMIRGLQRLGWKKVDVSFHSAFWPFFAHNNINVKNEWLYNAGVGVVAHVADNIKQQENQQGSTYVAASL

I was wondering if someone can help me with a python script or a shell script

Thank you
Oussama

fasta sequence • 3.6k views
ADD COMMENT
0
Entering edit mode

You could convert to single line fasta and use a simple grep to remove lines with --NA-- or use bio Python to have more control.

ADD REPLY
2
Entering edit mode
8.9 years ago
st.ph.n ★ 2.7k

Since you mentioned you wanted python:

Edit 'parse_fasta.py':

#!/usr/bin/env python

import sys

input_file = sys.argv[1]
output file = sys.argv[2]

with open(input_file, 'r') as f:
    headers = []
    seqs = []

    for line in f:
        if line.startswith(">"):
            headers.append(line.strip())
        else:
            seqs.append(line.strip())

myseqs = dict(zip(headers,seqs))

with open(outfile, 'w') as out: 
    for m in myseqs:
        if '---NA---' not in m:
            print >> out, m, '\n', myseqs[m]

Usage:

python parse_fasta.py input.fasta output.fasta
ADD COMMENT
0
Entering edit mode
8.9 years ago
novice ★ 1.1k

Too easy bro:

#!/usr/bin/perl

use strict; use warnings;

my $print;

while (<>) {
    $print = m/---NA---/ ? 0: 1 if m/>/;
    print if $print;
}

Usage: perl filter.pl sequences.fasta > filtered.fasta or ./filter.pl ... if you made the script executable.

Since I've got time on my hands and like Perl, I wrote you another script. This one would "slurp" the entire file into memory, so it's probably best to avoid if your file is huge.

#!/usr/bin/perl

use strict; use warnings;

my $whole = do { local $/; <> };
my @keep = map { s/\A([^>])/>$1/; $_ }
    grep { ! m/---NA---/; } split />/, $whole;
print @keep;
ADD COMMENT

Login before adding your answer.

Traffic: 2099 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6