How to extract specific information from files within a directory?
4.2 years ago

I have around 50 files named in the format ERR*.log (e.g. ERR23432.log, ERR12356.log, and so on). From each file I want to extract a specific value.

Within each file, the values I need are at the end of the lines containing final pair1 : Total reads after merging results from multiple database... and final pair2 : Total reads after merging results from multiple databases... These are lines 62 and 63 of the linked log file (Google Drive), and are also shown below:

09/06/2020 09:51:45 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_1.fastq ): 12818370.0
09/06/2020 09:51:52 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_2.fastq ): 12818370.0

Now I want a script that extracts these two values and adds them together to give a single value per file. It should then write an output file in which the first column is the file name without the extension (e.g. ERR45666 in the attached example) and the second column is the summed value.

Can anyone please help me out?

Here is the head of my example log file:

09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Running kneaddata v0.7.10
09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Output files will be written to: /folder/directory/Desktop/srr00823_ob/kneaddata_output
09/06/2020 09:35:12 PM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
verbose = False
input = /folder/directory/Desktop/srr00823_ob/ERR260136_1.fastq /folder/directory/Desktop/srr00823_ob/ERR260136_2.fastq
output_dir = /folder/directory/Desktop/srr00823_ob/kneaddata_output
reference_db = /home/deepchandaaws/kneaddata_db/hg37dec_v0.1
bypass_trim = True
output_prefix = ERR260136_1_kneaddata
threads = 8
python grep regex • 1.1k views
4.2 years ago
zx8754 12k

grep should work, something like:

grep 'final pair' ERR*.log

Thanks, sir. But to my knowledge, that will just print all the lines containing the term "final pair".


What is your expected output?


OP wants one line per file, e.g. ERR260136 25636740.0 (the two final pair values, 12818370.0 each, added together).
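
Since the question is tagged python, here is a minimal sketch of one way to get that output. It assumes every log has the two "final pair" lines in the format shown in the question, with the count as the last field on the line. The function names (total_final_reads, summarize) and the output file name used below are made up for illustration, not part of kneaddata.

```python
import re
from pathlib import Path

# Matches the "final pair1" / "final pair2" lines and captures the number at
# the end of the line (assumes the log format shown in the question).
FINAL_PAIR = re.compile(r"final pair[12]\s*:.*:\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def total_final_reads(text):
    """Sum the read counts from every 'final pair' line in one log's text."""
    total = 0.0
    for line in text.splitlines():
        match = FINAL_PAIR.search(line)
        if match:
            total += float(match.group(1))
    return total


def summarize(log_dir, out_path):
    """Write one 'name<TAB>total' row per ERR*.log file found in log_dir."""
    with open(out_path, "w") as out:
        for log in sorted(Path(log_dir).glob("ERR*.log")):
            # log.stem is the file name without the .log extension.
            out.write(f"{log.stem}\t{total_final_reads(log.read_text())}\n")
```

Calling summarize(".", "final_read_counts.tsv") from the directory holding the logs would write the table. For the two lines quoted in the question, the row's second column would be 25636740.0. A file with no matching lines gets 0.0 rather than an error, which may or may not be what you want.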
