How can I remove FASTA sequences from a large FASTA file by header (sequence name)? Any command or script, please?
9.5 years ago
seta ★ 1.9k

Dear all,

I would like to remove some sequences from a large FASTA file based on their header information (sequence name). Could anybody please help me with this? Thanks so much in advance.

The headers look like this:

>contig1
ATGCGTACGTCATG
>contig2
GCTACGTCCCA
alignment RNA-Seq blast next-gen • 9.2k views
9.5 years ago

The BBMap package contains a tool called FilterByName which can do this:

filterbyname.sh in=file.fa out=filtered.fa names=contig1,contig2

It supports prefixes, case-sensitive or insensitive matching, inclusion or exclusion, and substring matching. Rather than a list of names, you can instead point "names=" to another fasta or fastq file.
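
For readers who would rather use a small script, the same kind of exclusion can be sketched in Python. This is only an illustration of the idea, not how FilterByName is implemented: the function name `filter_fasta` is hypothetical, and it assumes that a record's name is the first whitespace-delimited token after ">" and that names must match exactly. It streams the input line by line, so a large file does not need to fit in memory.

```python
import io


def filter_fasta(in_handle, out_handle, names_to_remove):
    """Copy FASTA records to out_handle, skipping records whose name is listed."""
    remove = set(names_to_remove)
    keep = True
    for line in in_handle:
        if line.startswith(">"):
            # Assumption: the record name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name not in remove
        if keep:
            out_handle.write(line)


# Small in-memory demonstration; contig3 is an invented extra record.
fasta = ">contig1\nATGCGTACGTCATG\n>contig2\nGCTACGTCCCA\n>contig3\nTTGACC\n"
out = io.StringIO()
filter_fasta(io.StringIO(fasta), out, ["contig1", "contig2"])
result = out.getvalue()
print(result, end="")
```

On real data the two handles would be files opened with `open("file.fa")` and `open("filtered.fa", "w")`; only the `contig3` record survives in the demonstration above.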


Thanks, Brian. I'm trying to use your tool for this, but I ran into the following error. Could you please tell me what is wrong and how to fix it?

Exception in thread "main" java.lang.AssertionError: Unknown parameter names.txt
        at driver.FilterReadsByName.<init>(FilterReadsByName.java:118)
        at driver.FilterReadsByName.main(FilterReadsByName.java:41)

Many thanks


What is wrong:

Unknown parameter names.txt

How to solve it:

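Execute the command as Brian suggested, passing the list through the names= parameter. The tool takes its arguments as key=value pairs, so the error suggests that names.txt was given on its own, without the names= prefix.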
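
If the names are kept in a text file and filtered with a script rather than with BBMap, the list can be loaded as in the sketch below. This is a hypothetical helper, not part of BBMap: `load_names` and the one-name-per-line layout of `names.txt` are assumptions made for illustration. It accepts names written with or without a leading ">" and ignores blank lines and anything after the first whitespace.

```python
import os
import tempfile


def load_names(path):
    """Read sequence names from a text file, one per line, into a set."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Accept 'contig1' as well as '>contig1 some description'.
            name = line.strip().lstrip(">").strip()
            if name:
                names.add(name.split()[0])
    return names


# Demonstration with a temporary names.txt (contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.txt")
    with open(path, "w") as fh:
        fh.write("contig1\n>contig2 extra description\n\n")
    names = load_names(path)

print(sorted(names))
```

The resulting set can then be used to decide which records to drop while reading the FASTA file.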

9.5 years ago
h.mon 35k

There are several threads with similar questions. Enjoy the multitude of answers and choose the one best suited to you: see here, or here, or here.
