Error extracting faa sequences from multifasta using faidx
9.0 years ago
dago ★ 2.8k

I have a multifasta file from which I want to extract some sequences using IDs.

Multifasta example:

>NITMOv2_RS22300
MAKTVAVVIREDPRRTHRPVEALRIALGLVAGNHATTVVLLNEAARLLSEDTDDVVDVEI
LEKYLPSIQQLEVPFVLPEFIDRSGVRTDFAVRYESDDTIRRLLQSMDRTLVF
>NITMOv2_RS22305
MSLSSSVYLIRKSAAALSPTLYVSGDSDWVVVEIGEDKRSSDYRELLELVLHAEKVITL

IDs example:

NITINOP_v2_3300
NITINOP_v2_3307

I usually do this with the following command:

xargs faidx -d "" MULTIFASTA < IDs

This has always worked fine, but with some new files it fails with the following error, which I cannot understand:

Traceback (most recent call last):
  File "/usr/local/bin/faidx", line 9, in <module>
    load_entry_point('pyfaidx==0.3.4', 'console_scripts', 'faidx')()
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 132, in main
    write_sequence(args)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 33, in write_sequence
    fasta = Fasta(args.fasta, default_seq=args.default_seq, strict_bounds=not args.lazy, split_char=args.delimiter)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/__init__.py", line 527, in __init__
    read_ahead=read_ahead, mutable=mutable, split_char=split_char)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/__init__.py", line 218, in __init__
    raise FastaIndexingError(e)

Any suggestions? What am I missing here?

genome software-error • 2.4k views

Alternatively, you can use faSomeRecords:

./faSomeRecords input.faa ids.txt output.faa

Here is how: "A: perl code to extract sequences from multi-line fasta works on all test files but"
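If faSomeRecords is not available, the same kind of extraction can be sketched in plain Python. This is only an illustration, not how faSomeRecords or faidx is implemented. It assumes the record name is the first whitespace-delimited word of each header line, and the function names `read_ids` and `extract_records` are made up for this example.

```python
import io


def read_ids(handle):
    """Return the set of non-empty IDs, one per line, with whitespace stripped."""
    return {line.strip() for line in handle if line.strip()}


def extract_records(fasta_handle, wanted_ids, out_handle):
    """Copy every record whose name is in wanted_ids; return the names found."""
    found = set()
    keep = False
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumption: the record name is the first word after ">".
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name in wanted_ids
            if keep:
                found.add(name)
        if keep:
            out_handle.write(line)
    return found


# Small in-memory demo; real use would pass open file handles instead.
fasta = io.StringIO(
    ">NITMOv2_RS22300\nMAKTVAVV\nLEKYLPS\n>NITMOv2_RS22305\nMSLSSSVY\n"
)
ids = read_ids(io.StringIO("NITMOv2_RS22305\n\n"))
out = io.StringIO()
found = extract_records(fasta, ids, out)
print(out.getvalue(), end="")
```

Comparing `found` with the requested IDs afterwards shows which IDs had no matching record.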


Thanks very much. This is another good option I guess.

However, whenever I try to run it I get the following error:

Unrecognized character \x7F; marked by <-- HERE after <-- HERE near column 1 at faSomeRecords line 1.

I tried to look into the code, but I cannot even open it. Any suggestions?

EDIT

I had not made it executable, sorry! Thanks, it works now!

9.0 years ago

Try with proper IDs.

NITINOP_v2_3300
NITINOP_v2_3307

Neither of them is present in your example FASTA, and it works fine if I use NITMOv2_RS22300. So the problem might be a mismatch between the IDs in the FASTA file and those in Ids.txt.
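A quick way to test this is to list every requested ID that has no matching header. The sketch below is a minimal check, assuming the record name is the first whitespace-delimited word of the header line; `fasta_names` and `missing_ids` are hypothetical helper names, and the demo data is taken from the example in the question.

```python
import io


def fasta_names(handle):
    """Collect record names (first word after '>') from a FASTA handle."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_ids(id_handle, names):
    """Return requested IDs, in file order, that are not among the names."""
    missing = []
    for line in id_handle:
        wanted = line.strip()  # also removes a trailing carriage return
        if wanted and wanted not in names:
            missing.append(wanted)
    return missing


# Demo with the example from the question; real use would open the two files.
fasta = io.StringIO(">NITMOv2_RS22300\nMAKTVAVV\n>NITMOv2_RS22305\nMSLSSSVY\n")
id_list = io.StringIO("NITINOP_v2_3300\nNITINOP_v2_3307\nNITMOv2_RS22300\n")
names = fasta_names(fasta)
missing = missing_ids(id_list, names)
print(missing)
```

An empty list means every ID has a matching record, in which case the ID list is probably not the cause.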


Thanks. What I posted was just an example. I checked, and my IDs are present in the multifasta I am using.
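Since the traceback ends in FastaIndexingError, raised while the Fasta object is being constructed, the layout of the new FASTA files may be worth checking as well. The sketch below rests on an assumption about what the indexer expects, namely the usual faidx-style layout: within a record, all sequence lines have the same length except the last, and no blank line is followed by more sequence. It also reports duplicate record names, which some tools reject. The helper name `check_fasta_layout` is hypothetical, and the checks may not match exactly what pyfaidx 0.3.4 enforces.

```python
import io


def check_fasta_layout(handle):
    """Return (line_number, message) pairs for suspicious FASTA layout."""
    problems = []
    seen = set()
    width = None        # length of the first sequence line of the record
    short_seen = False  # True once a line shorter than width was read
    blank_at = None     # line number of a blank line not yet resolved
    for number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name in seen:
                problems.append((number, "duplicate name " + name))
            seen.add(name)
            width, short_seen, blank_at = None, False, None
        elif not line.strip():
            if blank_at is None:
                blank_at = number
        else:
            if blank_at is not None:
                # A blank line followed by more sequence in the same record.
                problems.append((blank_at, "blank line inside record"))
                blank_at = None
            if width is None:
                width = len(line)
            elif short_seen or len(line) > width:
                # Only the last line of a record may be shorter than the rest.
                problems.append((number, "inconsistent line length"))
            if len(line) < width:
                short_seen = True
    return problems


# Demo on a deliberately malformed in-memory FASTA.
demo = io.StringIO(">a\nACGT\nAC\nACGT\n>b\nAC\n\nGT\n>a\nAC\n")
problems = check_fasta_layout(demo)
for number, message in problems:
    print(number, message)
```

If this reports nothing for a file that still fails to index, the cause is probably something this sketch does not cover.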
