Sequence extraction
2.9 years ago
zhichusun ▴ 10

Hello, I have a FASTA file that contains sequences of different lengths. I want to extract the sequences that are longer than 500 bp and shorter than 10000 bp and write them to a new FASTA file. How should I do this? Thanks a lot if anyone can help.

extraction Sequence • 1.2k views
2.9 years ago
Mensur Dlakic ★ 28k

One of the ways to do it is with seqkit:

seqkit seq -M 10000 -m 500 file.fas > new_file.fas

Wow, that's a good way.

2.9 years ago
$ bioawk -c fastx 'length($seq) > 500 && length($seq) < 10000 {print ">"$name"\n"$seq}' test.fna
$ cutadapt --quiet -m 500 -M 10000 test.fna
2.9 years ago
GenoMax 147k

Using BBMap suite:

reformat.sh in=input.fa out=filtered.fa minlength=500 maxlength=10000
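
If none of the tools above is installed, the same filter can be sketched with plain awk. This is only a minimal sketch: the function name filter_fasta_by_length is made up for illustration, it assumes strict bounds as in the question (longer than the minimum, shorter than the maximum), and it joins wrapped sequence lines so each kept record is written on a single line.

```shell
# Hypothetical helper (not part of any tool mentioned in this thread).
# Usage: filter_fasta_by_length MIN MAX input.fa > output.fa
filter_fasta_by_length() {
    awk -v min="$1" -v max="$2" '
        # Print the buffered record if its length is strictly between min and max.
        function flush_record() {
            if (header != "" && length(seq) > min && length(seq) < max)
                print header "\n" seq
        }
        # A header line starts a new record: emit the previous one first.
        /^>/ { flush_record(); header = $0; seq = ""; next }
        # Any other line is sequence; join wrapped lines before measuring.
        { seq = seq $0 }
        END { flush_record() }
    ' "$3"
}

# Example call for the question as asked:
# filter_fasta_by_length 500 10000 file.fas > new_file.fas
```

Because the bounds here are strict, a sequence of exactly 500 or 10000 bp is dropped; change > and < to >= and <= in the awk condition if inclusive bounds are wanted.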