Hello everyone,
I am trying to write a bash script to extract information from the text output of a pairwise alignment (EMBOSS Needle). The output file contains a header as follows:
#=======================================
#
# Aligned_sequences: 2
# 1: RC0505_OB1
# 2: RC0505_OB2
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 866
# Identity: 804/866 (92.8%)
# Similarity: 804/866 (92.8%)
# Gaps: 53/866 ( 6.1%)
# Score: 3881.5
#
#
#=======================================
I want to create a one-line table containing the sequence ID, length, identity and gaps.
So for this example, I basically want a table like this:
RC0505 866 804 53
I managed to extract the full line containing the Identity info, for instance, using sed:
sed -n '/Identity/p' aln.txt
But then I can't figure out how to extract just the "804" part and put it in a table, and then do the same for the other parameters. I think grep, sed and awk could help, but I don't know how to use them efficiently.
Any ideas?
Many thanks
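One way to do this in a single pass is with awk. The following is only a sketch, assuming the header always looks like the example above and that the ID you want is the part of the first sequence name before the underscore. The function name needle_summary is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: pull ID, length, identity and gaps from an EMBOSS Needle header.
# Assumption: the header layout matches the example in the question.

# Hypothetical helper; takes the path to a needle output file.
needle_summary() {
    awk '
        # "# 1: RC0505_OB1" -> keep the part before the first underscore
        $1 == "#" && $2 == "1:"        { split($3, id, "_") }
        $1 == "#" && $2 == "Length:"   { len = $3 }
        # "804/866 (92.8%)" -> keep the number before the slash
        $1 == "#" && $2 == "Identity:" { split($3, idn, "/") }
        $1 == "#" && $2 == "Gaps:"     { split($3, gap, "/") }
        # Score is the last value in the header, so print the row here
        $1 == "#" && $2 == "Score:"    { print id[1], len, idn[1], gap[1] }
    ' "$1"
}
```

Calling needle_summary aln.txt on the example header prints RC0505 866 804 53. Because awk splits fields on any run of whitespace, extra spaces after the labels should not matter.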
Thank you very much! It also works well! Now I see better how to use the grep function coupled with cut. Have a nice day!
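For reference, the grep-plus-cut style mentioned here could look roughly like the following. This is a sketch rather than the exact commands from the thread: header_value and needle_row are hypothetical names, the header is assumed to match the example in the question, and tr -s is used to squeeze repeated spaces so that cut sees single-space delimiters:

```shell
#!/usr/bin/env bash
# Sketch of a grep + cut approach (assumes the header format shown in the question).

# Hypothetical helper: print the first value after "# <label>:" in a file.
# For values like "804/866" only the part before the slash is kept.
header_value() {
    grep "^# *$1:" "$2" | head -n 1 | tr -s ' ' | cut -d' ' -f3 | cut -d'/' -f1
}

# Hypothetical helper: build the one-line table row for a single file.
needle_row() {
    local file=$1 id
    # "RC0505_OB1" -> "RC0505" (assumes the ID is the part before the underscore)
    id=$(header_value 1 "$file" | cut -d'_' -f1)
    echo "$id $(header_value Length "$file") $(header_value Identity "$file") $(header_value Gaps "$file")"
}
```

Running needle_row aln.txt on the example header gives RC0505 866 804 53. Each value costs a separate pass over the file, which is fine for small outputs like this one.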
cut, grep, sed, awk, sort, uniq, cat, xargs, join .. (and more besides).. all super useful for some quick and dirty chopping up of text :)
Yes, I'll study them a lot, I guess I could save so much time with them. Anyway, thanks for your comment, it's my first step toward bash scripting and it helped me understand the way it works!