The awk solution is more general, but for simplicity, if I have single-line rather than hard-wrapped FASTAs, I prefer to do this with grep -A and tail.
grep -wA 1 '>NODE_19_length_5758_cluster_19_candidate_1' example.fasta | tail -n 1
The -A 1 tells grep to return both the matched line and the line immediately after it, and then tail takes the last (second) line of that result. (The -w is just for whole-word matching, in case one sequence name is a substring of another: without it, >NODE_1 would also match >NODE_19.)
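As a quick sanity check, here is a sketch on a made-up three-record file. The seq_* names and the temporary directory are placeholders for illustration, not taken from the question.

```shell
# Build a tiny single-line FASTA to try the command on (made-up records).
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.fasta" <<'EOF'
>seq_1
ACGTACGT
>seq_10
TTTTGGGG
>seq_2
CCCCAAAA
EOF

# With -w, '>seq_1' matches only the first header, so tail returns its sequence.
result=$(grep -wA 1 '>seq_1' "$tmpdir/example.fasta" | tail -n 1)
echo "$result"      # ACGTACGT

# Without -w, '>seq_1' also matches '>seq_10', and tail picks up the wrong record.
without_w=$(grep -A 1 '>seq_1' "$tmpdir/example.fasta" | tail -n 1)
echo "$without_w"   # TTTTGGGG
```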
If you're sure that the entire sequence is on the next line, I find grep easier to use in a parameterized loop than the awk version, where I always get tangled up with the quoting. In fact, with the grep approach, you can even use a file to hold your list of desired headers. If you want both the headers and their sequences, just drop the second grep.
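grep -wf list-of-headers.txt -A 1 --no-group-separator example.fasta | grep -v '^>'
(The --no-group-separator option, available in GNU grep, suppresses the -- lines that grep -A otherwise prints between non-adjacent groups of matches.)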
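To illustrate the loop and the header-list variants side by side, here is a minimal sketch. It assumes the list file holds one header per line, including the leading >, and the seq_* records are made up.

```shell
# Made-up single-line FASTA plus a list of wanted headers.
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.fasta" <<'EOF'
>seq_1
ACGTACGT
>seq_2
CCCCAAAA
>seq_3
GGGGTTTT
EOF
printf '%s\n' '>seq_1' '>seq_3' > "$tmpdir/list-of-headers.txt"

# Loop variant: one grep per header; the shell variable only needs double quotes.
loop_out=$(while IFS= read -r header; do
    grep -wA 1 -e "$header" "$tmpdir/example.fasta" | tail -n 1
done < "$tmpdir/list-of-headers.txt")
echo "$loop_out"

# List variant: a single grep pass over the file. grep -A prints a '--' line
# between non-adjacent groups of matches, so that line is filtered out as well.
list_out=$(grep -wf "$tmpdir/list-of-headers.txt" -A 1 "$tmpdir/example.fasta" \
    | grep -v -e '^>' -e '^--$')
echo "$list_out"
```

Both variants print ACGTACGT followed by GGGGTTTT here. The list variant reads the FASTA once, which matters more as the file and the list grow.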
If your sequences are spread across multiple lines, then use one of the awk solutions, or "unwrap" your FASTA first with a tool like fastx_toolkit.
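If installing a separate tool is not convenient, unwrapping can also be sketched with a short awk script. This is one possible approach rather than what fastx_toolkit does internally, and the file names and records below are made up.

```shell
# Made-up hard-wrapped FASTA.
tmpdir=$(mktemp -d)
cat > "$tmpdir/wrapped.fasta" <<'EOF'
>seq_1
ACGT
ACGT
>seq_2
CCCC
AA
EOF

# Print each header as-is and join the sequence lines that follow it
# into a single line, flushed at the next header or at end of file.
awk '/^>/ { if (seq != "") print seq; seq = ""; print; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' "$tmpdir/wrapped.fasta" > "$tmpdir/unwrapped.fasta"

unwrapped=$(cat "$tmpdir/unwrapped.fasta")
echo "$unwrapped"

# The grep -A 1 approach now works on the unwrapped file.
seq2=$(grep -wA 1 '>seq_2' "$tmpdir/unwrapped.fasta" | tail -n 1)
echo "$seq2"   # CCCCAA
```

Note that a record with no sequence lines at all would come out as a bare header, which would break the assumption that the sequence is always on the next line.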
for printing sequence only: