3.2 years ago
marie.lorans
Hi there,
I am new to coding, consider yourself warned :D
I have a multi-FASTA file with 3' UTR sequences of variable length. I would like to extract a 6-mer sequence, AGTCTC, together with the 20 nt upstream and the 20 nt downstream (but not the rest of the sequence from that 3' UTR or that particular line, just these 20+6+20 nucleotides). I know I can do that with grep:
grep -o "....................AGTCTC...................." 3UTR.fa > 3UTR2.fa
However, this doesn't give me a new FASTA file with headers. I know I can use the -B option, but this gives me all the lines from the header to my sequence of interest.
Any suggestions are much appreciated!
Cheers, Marie
Your command line above with grep implies that each record in your FASTA file takes only two lines (?): one line for the header and one line for the sequence. So you just have to use grep's -B 1 option (one line before).
Thanks Pierre. My sequence is not on a single line, so -B 1 won't do. Perhaps what I'm trying to do is not possible with grep.
Try using tools like seqkit, which supports grep and multiline FASTA.
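If installing another tool is not an option, the same idea can be sketched in a few lines of Python. This is only an illustration, not seqkit's method: the function names `read_fasta` and `motif_windows` and the `name_start-end` header format are made up for this example. It joins the wrapped sequence lines of each record first, so a motif that spans a line break is still found, and then keeps the 20+6+20 window under a new header. Like the grep pattern, it is case-sensitive and skips hits that sit closer than 20 nt to either end of the UTR.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def motif_windows(lines, motif="AGTCTC", flank=20):
    """Yield (new_header, window) for every motif hit that has full flanks."""
    # The lookahead lets overlapping windows be reported as separate hits.
    pattern = re.compile("(?=(.{%d}%s.{%d}))" % (flank, re.escape(motif), flank))
    for header, seq in read_fasta(lines):
        for hit in pattern.finditer(seq):
            start = hit.start()  # 0-based start of the window
            window = hit.group(1)
            # 1-based, inclusive coordinates of the window within the record
            yield "%s_%d-%d" % (header, start + 1, start + len(window)), window


# Example use with the file names from the question:
# with open("3UTR.fa") as fin, open("3UTR2.fa", "w") as fout:
#     for name, window in motif_windows(fin):
#         fout.write(">%s\n%s\n" % (name, window))
```

Each hit becomes its own two-line FASTA record, so a UTR containing the motif several times produces several records. If your sequences are in lowercase or mixed case, upper-case them before matching.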