GREP from multiFasta file and keep headers
3.2 years ago

Hi there,

I am new to coding, consider yourself warned :D

I have a multi-FASTA file with 3' UTR sequences of variable length. I would like to extract the 6-mer AGTCTC together with the 20 nt upstream and the 20 nt downstream of it (not the rest of that 3' UTR or of that particular line, just these 20+6+20 nucleotides). I know I can do that with grep:

grep -o "....................AGTCTC...................." 3UTR.fa > 3UTR2.fa

However, this doesn't give me a new FASTA file with headers. I know I can use the -B option, but this gives me all the lines from the header to my sequence of interest.

Any suggestions are much appreciated!

Cheers, Marie

GREP header Multifasta • 1.3k views

Your grep command above implies that each record in your FASTA file takes only two lines (?): one line for the header and one line for the sequence. If so, you just have to use grep's -B 1 option (print one line before each match).


Thanks Pierre. My sequence is not on a single line, so -B 1 won't do. Perhaps what I'm trying to do is not possible with grep.


Try a tool like seqkit, which supports grep-style searching and multi-line FASTA.

$ seqkit -w 0 grep -srip '.{20}AGTCTC.{20}' input.fa -o output.fa
3.2 years ago

Linearize, grep, convert back to FASTA.
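A minimal sketch of that idea, assuming a POSIX awk is available. Instead of three separate commands, it joins each record's wrapped lines into one string (the linearize step), looks for the motif, and writes each 20+6+20 window as its own FASTA record under the original header. The file names example.fa and example_windows.fa, the toy records, and the " hit=N" header suffix are made up for illustration; uppercasing the sequence is an assumption that mirrors the case-insensitive -i in the seqkit command.

```shell
# Toy input (hypothetical records) so the sketch runs on its own;
# point the awk command at your real file instead, e.g. 3UTR.fa.
cat > example.fa <<'EOF'
>utr1 example
AAAAAAAAAACCCCCCCCCC
AGTCTCGGGGGGGGGGTTTT
TTTTTTAAAA
>utr2 motif too close to the start
ACGTAGTCTCACGTACGTACGTACGTACGTACGT
>utr3 no motif
ACGTACGTACGTACGT
EOF

awk -v motif="AGTCTC" -v flank=20 '
  # Report every motif occurrence in the current record that has
  # at least "flank" nucleotides on both sides.
  function scan(   pos, off, i, n) {
    n = 0; off = 0
    while ((i = index(substr(seq, off + 1), motif)) > 0) {
      pos = off + i                        # 1-based motif start in seq
      if (pos > flank && pos + length(motif) - 1 + flank <= length(seq)) {
        n++
        print header " hit=" n             # original header plus hit counter
        print substr(seq, pos - flank, 2 * flank + length(motif))
      }
      off = pos                            # resume search after this start
    }
  }
  /^>/ { if (header != "") scan(); header = $0; seq = ""; next }
  { seq = seq toupper($0) }                # linearize: join wrapped lines
  END { if (header != "") scan() }
' example.fa > example_windows.fa

cat example_windows.fa
```

With the toy input, only utr1 yields a window; utr2 is skipped because fewer than 20 nt precede the motif. Unlike grep -o, this reports every occurrence of the motif, even when two windows overlap.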
