I have around 50 files named in the format ERR*.log (e.g. ERR23432.log, ERR12356.log, and so on). From each file I want to extract a specific value.
Within each file, the values I need are at the end of two lines: the one containing final pair1 : Total reads after merging results from multiple database...
and the one containing final pair2 : Total reads after merging results from multiple databases...
These are lines 62 and 63 of the log file (linked on Google Drive), also shown below:
09/06/2020 09:51:45 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_1.fastq ): 12818370.0
09/06/2020 09:51:52 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_2.fastq ): 12818370.0
Now, I want a script that extracts these two values and adds them to get a single value for each file. It should then write an output file in which the first column is the name of the file without the extension (i.e. ERR45666 in the attached example) and the second column is the summed value.
Can anyone please help me out?
Here is the head of my example log file:
09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Running kneaddata v0.7.10
09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Output files will be written to: /folder/directory/Desktop/srr00823_ob/kneaddata_output
09/06/2020 09:35:12 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input = /folder/directory/Desktop/srr00823_ob/ERR260136_1.fastq /folder/directory/Desktop/srr00823_ob/ERR260136_2.fastq
output_dir = /folder/directory/Desktop/srr00823_ob/kneaddata_output
reference_db = /home/deepchandaaws/kneaddata_db/hg37dec_v0.1
bypass_trim = True
output_prefix = ERR260136_1_kneaddata
threads = 8
Thanks, sir. But to my knowledge, it will just print all the lines containing the term "final pair".
What is your expected output?
OP wants ERR260136 12818370.0.
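For what it's worth, the extraction described in the question could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: every log sits in one directory and matches ERR*.log, the count is always the last field on the "final pair1" / "final pair2" lines (as in the two sample lines), and a tab-separated output is acceptable. The function names and the output file name final_read_counts.tsv are made up for illustration.

```python
import re
import sys
from pathlib import Path

# Matches the "final pair1" / "final pair2" READ COUNT lines and
# captures the number at the very end of the line.
FINAL_PAIR = re.compile(
    r"READ COUNT: final pair[12]\b.*:\s*([0-9]+(?:\.[0-9]+)?)\s*$"
)


def total_final_reads(log_path):
    """Sum the final pair1 and final pair2 read counts found in one log file."""
    total = 0.0
    with open(log_path) as handle:
        for line in handle:
            match = FINAL_PAIR.search(line)
            if match:
                total += float(match.group(1))
    return total


def summarize(log_dir, out_path):
    """Write one 'name<TAB>sum' row per ERR*.log file found in log_dir."""
    with open(out_path, "w") as out:
        for log_file in sorted(Path(log_dir).glob("ERR*.log")):
            # .stem drops the ".log" extension, e.g. ERR260136.log -> ERR260136
            out.write(f"{log_file.stem}\t{total_final_reads(log_file)}\n")


# Hypothetical command-line use: python sum_final_pairs.py LOG_DIR OUT_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    summarize(sys.argv[1], sys.argv[2])
```

With the two sample lines from the question, the row for that file would hold the sum of both values (12818370.0 + 12818370.0 = 25636740.0) rather than a single count. If a log contains no matching line, this sketch writes 0.0 for it, which may or may not be what you want.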