Awk or shell script needed
8.1 years ago
ThulasiS ▴ 90

Dear Forum Members, I have a job to finish. I know it can be done with awk, but I don't have much programming experience and I am still learning awk. The job is to extract lines in a series from a file. I have the following example input file (BLAST output):

NC_007622|123-456 NC_234 123 568
NC_007622|123-456 NC_546 126 563
NC_007622|123-456 NC_564 582 369
NC_007622|123-456 NC_985 548 367
NC_007622|123-456 NC_758 877 687
NC_007622|841-898 NC_234 456 785
NC_007622|841-898 NC_546 458 798

Required output

NC_007622|123-456
NC_234 123 568
NC_546 126 563
NC_564 582 369
NC_007622|841-898
NC_234 456 785

I need every 7th element of column 1, followed by columns 2, 3 and 4 of each line, and so on until the end of the file.

Any help is much appreciated. Thank you.

shell awk • 2.8k views

I am not giving you the exact answer. Instead I'm directing you to a resource. Just to let you know these problems can also be solved with google. Happy googling :)

How to print every nth line in a file in Linux?

or

extract every nth line from text file unix


I tried every way I could find by googling. Still, I wasn't able to write the exact script for my problem, so I posted here.

Thank you


The question is not clear, as you mixed the example with your explanation. Also, what do you mean by the 7th element, and what does the actual file look like? Awk and cut can be used for column-wise extraction; @venu has already pointed you in the right direction.


Before posting, my input and output looked normal, as they do in my file, but after posting they became unclear. Put simply, suppose the input looks like this: 1| 25| 368| 398 1| 26| 368| 375 1| 27| 367| 398 1|| 29| 398 347 2| 25 |754 982. The output I need is: 1| 25| 368| 398 26| 368 375 27 |367| 398 29| 398| 34 7 2| 25| 754| 982, and so on.

"|" represents a different row.


Why not just use any programming language, like Python or Perl?


I modified your question for readability.

It's good practice to show what you tried and what didn't work.


What I tried is something naive like this:

awk 'BEGIN {FS=OFS== " "} { 'NR%7==7{ print $1}'}' | awk 'NR%1==1{print $2,$3,$5}'

It prints all the required items from column 1, but that is not what I need.


You can do it with basic cut and sed, something like the command below, where you replace the delimiters (space, tab) and the columns (a, b) with your own. This is by no means meant to test you, but basic scripting questions can also be checked on Stack Overflow.

cat <(cut -d'space' -fa file | sort -u) <(sed 's/space/tab/' file | cut -d'tab' -fb)

It would be nice to show us what you tried and what didn't work while posting the question.

8.1 years ago
nterhoeven ▴ 120

I would use the following perl one-liner for this:

perl -ane 'BEGIN{$id="";} if($F[0] ne $id){$id=shift(@F); print $id,"\n",join(" ",@F),"\n";}else{shift(@F); print join(" ",@F),"\n";}' filename.txt

Explanation:

  • The file is read line-wise and each line is split at whitespace
  • The first column is checked (is it the same as before?)
  • If yes, the 2nd, 3rd and 4th columns are printed
  • If no, the 1st column is printed and stored, then the rest is printed on a new line
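
Since the question asked for awk, here is a minimal awk sketch of the same steps. It assumes that lines sharing an identifier are adjacent, as they are in the example input, and that fields are separated by whitespace. The names regroup and prev are only illustrative and do not come from the thread.

```shell
# Sketch: print column 1 whenever it changes, then columns 2-4 of every line.
# "regroup" and "prev" are illustrative names, not from the thread.
regroup() {
  awk 'NR == 1 || $1 != prev { print $1; prev = $1 }
       { print $2, $3, $4 }' "$@"
}

# Example input taken from the question.
regroup <<'EOF'
NC_007622|123-456 NC_234 123 568
NC_007622|123-456 NC_546 126 563
NC_007622|123-456 NC_564 582 369
NC_007622|123-456 NC_985 548 367
NC_007622|123-456 NC_758 877 687
NC_007622|841-898 NC_234 456 785
NC_007622|841-898 NC_546 458 798
EOF
```

The function also accepts a file name, e.g. regroup filename.txt. Nothing here depends on how many lines each identifier has.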

a little bit simpler:

perl -lane '$h1 = shift @F; $h1 ne $h2 and print $h1; print "@F"; $h2 = $h1' filename.txt

even simpler:

perl -ape 's/ /\n/; $h and s/\Q$h\E\n//; $h = $F[0]' filename.txt

I just learnt that \Q and \E can be used to tell the regex engine to treat a variable as a literal string (the | present in the titles is a regex special character). This is very convenient if you don't want to escape your variables yourself before using them inside regex functions.
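
The same pitfall shows up in awk when an identifier is used as a dynamic regex. As far as I know, awk has no direct counterpart to \Q...\E, so the usual way around it is a plain string comparison. The sketch below only illustrates the difference; is_match is a made-up helper name and the second identifier is an invented test value.

```shell
# Sketch: compare an identifier as a regex versus as a literal string.
# "is_match" is an illustrative helper, not part of any tool.
is_match() {
  # $1 = identifier, $2 = text to test
  awk -v id="$1" -v s="$2" 'BEGIN {
    # As a regex, "|" means alternation, so "^NC_007622" alone can match.
    print "regex:", ((s ~ ("^" id "$")) ? "match" : "no match")
    # A string comparison treats every character literally.
    print "literal:", ((s == id) ? "match" : "no match")
  }'
}

is_match 'NC_007622|123-456' 'NC_007622|999-999'
```

Here the regex test reports a match for a different identifier, while the literal comparison does not.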


Thank you so much, nterhoeven. The job was done in a jiffy.

8.1 years ago

AWK has arrays for storing groups of related strings or numbers. You can use one this way:

awk '{tab[$1]=tab[$1]"\n"$2" "$3" "$4} END {for (i in tab) {print i " " tab[i]} }' test.txt

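For each identifier in column 1, the command creates an entry in the array (tab) if it is absent, and otherwise appends columns 2 to 4 to the existing content. The "\n" prepended to each appended string puts every row on its own line in the output. Note that awk does not specify the order in which for (i in tab) visits the identifiers, so the groups may not come out in input order.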
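
If the groups should come out in the order they first appear, one option is to keep a second array that records that order. This is only a sketch of that idea, not the command above; group_in_order, order and n are names I made up, and the three input lines are invented to show an identifier that is not contiguous.

```shell
# Sketch: array-based grouping that keeps first-seen order.
# "group_in_order", "order" and "n" are illustrative names.
group_in_order() {
  awk '
    !($1 in tab) { order[++n] = $1 }              # remember first-seen order
    { tab[$1] = tab[$1] "\n" $2 " " $3 " " $4 }   # append columns 2-4
    END {
      for (k = 1; k <= n; k++)
        printf "%s%s\n", order[k], tab[order[k]]
    }
  ' "$@"
}

# Invented input where the identifier A|1 is split by another group.
printf '%s\n' 'A|1 x 1 2' 'B|2 y 3 4' 'A|1 z 5 6' | group_in_order
```

Unlike an approach that watches for a change in column 1, this collects all rows of an identifier even when they are scattered through the file, at the cost of holding the whole file in memory.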

8.1 years ago

It is simple in awk:

awk '{print $1, $2, $3, $4, $6, $7, $8, $10, $11, $12}' input_file>output_file

This answer is valid only for data laid out as it was initially posted, like:

NC_007622|123-456 NC_234 123 568 NC_007622|123-456 NC_546 126 563 NC_007622|123-456 NC_564 582 369

Then the output will be:

NC_007622|123-456 NC_234 123 568 NC_546 126 563 NC_564 582 369


Based on the posts of other people here I have the impression you are oversimplifying things and your code won't yield the desired result.

8.1 years ago
5heikki 11k

Something like this. Perhaps your field separator is something other than a space, though? You may also need to adjust the columns after the else.

awk 'BEGIN{FS=" "}{if(NR==1 || !(NR%7)){print $1}else{print $2,$3,$4}}' file.txt

For future reference: this command currently produces the following output with the example in the original post.

NC_007622|123-456
NC_546 126 563
NC_564 582 369
NC_985 548 367
NC_758 877 687
NC_234 456 785
NC_007622|841-898
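
One way to see why a line-number test such as NR%7 does not fit this example is to count how many consecutive lines each identifier has. The sketch below does that; group_sizes, prev and n are illustrative names, and it assumes identical identifiers are adjacent.

```shell
# Sketch: count consecutive lines per identifier in column 1.
# "group_sizes", "prev" and "n" are illustrative names.
group_sizes() {
  awk '
    NR > 1 && $1 != prev { print prev, n; n = 0 }   # group ended: report it
    { prev = $1; n++ }
    END { if (NR > 0) print prev, n }               # report the last group
  ' "$@"
}

# Example input taken from the original post.
group_sizes <<'EOF'
NC_007622|123-456 NC_234 123 568
NC_007622|123-456 NC_546 126 563
NC_007622|123-456 NC_564 582 369
NC_007622|123-456 NC_985 548 367
NC_007622|123-456 NC_758 877 687
NC_007622|841-898 NC_234 456 785
NC_007622|841-898 NC_546 458 798
EOF
```

For the posted example the groups have 5 and 2 lines, so a fixed step of 7 only lines up with a group boundary if every identifier has the same number of hits.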