Question

Extracting specific sequences from a big fasta file using ids of the sequences to be excluded

0

9.2 years ago

hasche89 • 0

I have a huge FASTA file, about 20 GB in size. I also have a text file listing some sequence IDs from that same FASTA file. I want to retrieve all sequences whose IDs are not listed in the text file.

How shall I proceed? I use Ubuntu 12. I am a novice and have very little knowledge of bash, shell or perl. Any Linux or Samtools or Bioperl command will be helpful.

Thanks.

RNA-Seq samtools faidx bioperl perl • 5.3k views

updated 2.2 years ago by Ram 44k • written 9.2 years ago by hasche89 • 0

score 2 · Answer 1 · 2015-09-26

2

9.2 years ago

thackl ★ 3.0k

This would work:

git clone https://github.com/BioInf-Wuerzburg/SeqFilter.git
cd SeqFilter
make  # just fetches some libraries, no root or anything required

bin/SeqFilter big.fasta --ids idx.txt --ids-exclude --out big-filtered.fasta

score 1 · Answer 2 · 2015-09-26

1

9.2 years ago

GouthamAtla 12k

A simple way is to first build a list of the IDs you do want to fetch from the FASTA file, that is, every ID in the file that is not in your exclusion list. This can be done with grep:

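# keep only the first word of each header, then drop exact matches to the IDs in Ids.txt
grep "^>" input.fasta | sed 's/^>//; s/[[:space:]].*//' | grep -v -x -F -f Ids.txt > retreive_IDs.txt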

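Then you could use something like pyfaidx or samtools to pull out those records:

# xargs passes the IDs in batches, so a long list does not overflow the command line
xargs samtools faidx input.fasta < retreive_IDs.txt > output.fa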
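
To see what each stage of the ID-listing step does, here is a minimal sketch on a toy file. The file names (toy.fasta, toy_exclude.txt, toy_keep.txt) and their contents are made up for illustration, and the -x and -F options are an assumption that your ID file holds one bare ID per line, matching the first word of each header.

```shell
# Toy input (hypothetical data, for illustration only)
printf '>seq1 first record\nACGT\n>seq2\nGGCC\n>seq10\nTTAA\n' > toy.fasta
printf 'seq1\n' > toy_exclude.txt

# Stage 1: grep "^>"   keeps only the header lines of the FASTA file.
# Stage 2: sed         strips the leading ">" and everything after the first whitespace.
# Stage 3: grep -v -x -F -f toy_exclude.txt
#            -f  reads the patterns from the file
#            -F  treats them as plain strings, not regular expressions
#            -x  requires the whole line to match
#            -v  inverts the match, so only IDs NOT in the file are printed
grep "^>" toy.fasta \
  | sed 's/^>//; s/[[:space:]].*//' \
  | grep -v -x -F -f toy_exclude.txt > toy_keep.txt

cat toy_keep.txt
```

Here toy_keep.txt ends up with seq2 and seq10. Without -x, seq10 would be dropped as well, because seq1 is a substring of it.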

0

You could also use faSomeRecords:

./faSomeRecords input.fa retreive_IDs.txt output.fa

Reply • 9.2 years ago by venu 7.1k

0

Thanks for the commands.

I am a beginner in this field. Can you please tell me what each component of your command does?

Thanks

Reply • updated 2.2 years ago by Ram 44k • written 9.2 years ago by hasche89 • 0

0

Run each command of the pipeline on its own and look at its output; you will quickly see what each one is doing.

Reply • updated 2.2 years ago by Ram 44k • written 9.2 years ago by venu 7.1k

score 1 · Answer 3 · 2015-09-26

1

9.2 years ago

Brian Bushnell 20k

Boy, this really comes up a lot. Using the BBMap package:

filterbyname.sh in=file.fasta out=filtered.fasta names=names.txt include=f
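
If installing a package is not an option, the same kind of exclusion can be sketched with plain awk in a single pass over the file. This is not how filterbyname.sh works internally; it is only an illustration. It assumes the names file holds one bare ID per line, that each ID equals the first word of a header, and that the names file is not empty. The file names demo.fasta, demo_names.txt, and demo_filtered.fasta are hypothetical.

```shell
# Toy input (hypothetical data, for illustration only)
printf '>seq1 first record\nACGT\nAAAA\n>seq2\nGGCC\n>seq10\nTTAA\n' > demo.fasta
printf 'seq1\n' > demo_names.txt

# While reading the first file (NR == FNR), remember every ID to exclude.
# While reading the FASTA file, each header line sets "keep" depending on
# whether its first word (minus the ">") is in the exclusion list.
# The final pattern "keep" prints the current line whenever keep is true,
# so headers and their sequence lines are kept or dropped together.
awk 'NR == FNR { skip[$1]; next }
     /^>/      { id = substr($1, 2); keep = !(id in skip) }
     keep' demo_names.txt demo.fasta > demo_filtered.fasta

cat demo_filtered.fasta
```

Only the ID list is held in memory, so the size of the FASTA file itself should not matter much; multi-line sequences are handled because every non-header line follows the decision made at its header.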

updated 2.2 years ago by Ram 44k • written 9.2 years ago by Brian Bushnell 20k

0

Always important to keep busy ;)

Reply • 9.2 years ago by thackl ★ 3.0k