Extract FASTA sequences using sequence headers
7.6 years ago
a.rex ▴ 350

I usually extract FASTA sequences with samtools.

For example, to extract the sequence for gene 000001:

samtools faidx /path/to/transcriptome 000001

However, I was wondering whether there is a better method for extracting isoform sequences. I tried the following command to extract the sequences for gene isoforms 000001.1, 000001.2 and 000001.3, but to no avail:

samtools faidx /path/to/transcriptome 000001.*

Does anyone have any tips on how to do this effectively?

gene • 1.7k views

BBMap's filterbyname tool will work like this:

filterbyname.sh in=transcriptome.fa out=filtered.fa include names=000001. substring=name
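
If you would rather stay with samtools, one option is to build the list of isoform IDs yourself and pass them as ordinary arguments. As far as I know, samtools faidx treats each region as a literal sequence name rather than a pattern, which would explain why 000001.* returns nothing. A minimal sketch, assuming samtools is installed; the file names demo_transcriptome.fa, isoform_ids.txt and isoforms.fa are made up for illustration:

```shell
# Hypothetical demo input; substitute the real transcriptome path.
printf '>000001.1 isoform\nACGT\n>000001.2\nGGCC\n>000002.1\nTTTT\n>1000001.1\nAAAA\n' > demo_transcriptome.fa

# Collect the IDs (first word of each header, without ">") that start with "000001.".
# index() compares the prefix literally, so the dot is not treated as a regex wildcard.
awk -v p='000001.' '/^>/ { id = substr($1, 2); if (index(id, p) == 1) print id }' \
    demo_transcriptome.fa > isoform_ids.txt

# Pass the collected IDs to samtools faidx as plain region arguments
# (this step is skipped if samtools is not on the PATH).
if command -v samtools >/dev/null 2>&1; then
    xargs samtools faidx demo_transcriptome.fa < isoform_ids.txt > isoforms.fa
fi
```

Because the prefix must sit at the start of the ID, a header such as 1000001.1 is not picked up by accident.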
7.6 years ago
Jake Warner ▴ 840

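You can do this with Awk. The pattern is anchored to the start of the header and the dot is escaped, so only headers beginning with ">000001." switch printing on:

awk '/^>000001\./{flag=1;print;next} /^>/{flag=0} flag' file.fasta > outfile.fasta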
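
To check the behaviour before running it on a real transcriptome, here is a compact variant of the same flag idea on a small made-up file. The names demo.fasta and demo_isoforms.fasta and the sequences are invented for illustration only:

```shell
# Made-up demo input: two isoforms of 000001 (one multi-line) plus two decoy records.
printf '>000001.1\nACGT\nAC\n>000002.1\nTTTT\n>000001.2 desc\nGGCC\n>1000001.1\nAAAA\n' > demo.fasta

# On every header line, set flag to 1 if the header starts with ">000001." and to 0 otherwise.
# The bare "flag" pattern then prints each line (header or sequence) while flag is 1.
awk '/^>/ { flag = /^>000001\./ } flag' demo.fasta > demo_isoforms.fasta
```

This should keep both 000001 isoforms with all of their sequence lines and drop 000002.1 and 1000001.1.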