Separate lines based on some character?
5.8 years ago
star ▴ 350

I have a FASTA file like the one below. I would like to extract only the coordinate from each header and add a second column containing the name "hello".

>Human|chr16:80372593-80373755 | element 4 | positive  | neural tube[6/10] | hindbrain (rhombencephalon)[10/10] | midbrain (mesencephalon)[10/10]
gtgaCAGAGACAGACAGTGACAGAGACAgattttagaatttgaacaaaggtaaataagag

>Human|chr16:78510608-78511944 | element 12 | positive  | hindbrain (rhombencephalon)[9/11] | forebrain[9/11]
AAGCTAGCTAATTGCTTCTTCAGTTGaagacctaaatgagttttaaagtgaaatgcatat

Expected output:

chr16:80372593-80373755         hello
chr16:78510608-78511944         hello
R linux fasta • 1.4k views

What do you consider the 'name'? That is, which of the fields in your FASTA header?


I would like to add a name of my own choosing.


Will it change for each entry or stay the same?


It is the same name for every entry.

5.8 years ago
awk -F '|' '/^>/ { print $2 "\thello"}' jeter.fa

If the extra space after chr16:80372593-80373755 bothers you, strip the last character of the field:

awk -F '|' '/^>/ { print substr($2, 1, length($2)-1) "\thello"}' jeter.fa
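The substr() call above removes exactly one trailing character, which fits the headers in the question. As a minimal sketch for headers where the padding around the coordinate might vary, the field can be trimmed with gsub() instead. The function name fasta_coords and its optional label argument are hypothetical, introduced here only for illustration, and the sketch assumes the coordinate is always the second '|'-separated field.

```shell
# Hypothetical helper (not from the thread): print "<coordinate><TAB><label>"
# for every FASTA header line. Assumes headers look like
# ">Human|chr16:80372593-80373755 | element 4 | ...".
fasta_coords() {
    # $1 = FASTA file, $2 = label to append (defaults to "hello")
    awk -F '|' -v label="${2:-hello}" '
        /^>/ {
            coord = $2
            gsub(/^[ \t]+|[ \t]+$/, "", coord)   # trim leading/trailing whitespace
            print coord "\t" label
        }' "$1"
}
```

With the file from the answer above, `fasta_coords jeter.fa` would print one coordinate per header followed by a tab and "hello"; passing a second argument swaps in a different label.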

I would just use -F '[| ]' ...


Yes, it is quicker, but if there is a space in the ID, the command fails.
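To make that failure mode concrete, here is a small sketch comparing the two cases. The wrapper name coords_short and the "Homo sapiens" header are made up for illustration; only the -F '[| ]' separator comes from the thread.

```shell
# Shortcut from the thread: split on either '|' or a single space.
# coords_short is a hypothetical wrapper name used only for this demo.
coords_short() { awk -F '[| ]' '/^>/ { print $2 "\thello" }'; }

# Header as in the question: field 2 is the coordinate, with no trailing space.
printf '%s\n' '>Human|chr16:80372593-80373755 | element 4' | coords_short

# Hypothetical header with a space in the ID: the space splits the ID,
# so field 2 is "sapiens" instead of the coordinate.
printf '%s\n' '>Homo sapiens|chr16:80372593-80373755 | element 4' | coords_short
```

The first call prints the coordinate and "hello"; the second prints "sapiens" and "hello", because the coordinate has shifted to field 3. Splitting on '|' alone and trimming the field avoids that dependency on the ID format.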
