grep value from html file
18 months ago
arshad1292 ▴ 110

I have 200 HTML files that contain information such as Filename, File type, Total Sequences, etc. Please see the attached screenshot.

I need to grep the Filename and Total Sequences from the Value column (in this screenshot, I need IGM17-B_S162_read_1.fastq and the value 9237623) and save them in a separate .txt file.

Maybe with the grep or cat command. Again, these are HTML files.

I would really appreciate help from anyone who is experienced in writing command-line scripts.

cat script commandline shell grep • 898 views

This can be done, but it seems that you want to aggregate FastQC reports and possibly other log files. So maybe you want to try MultiQC first, before coming up with your own solution?

18 months ago

The HTML produced by FastQC is XML-compatible HTML, so you can use an XPath expression to extract values.

$ xmllint --xpath '//tr[td[1]/text()="Filename"]/td[2]/text()'   fastqc_report.html
jeter.fastq.gz

$ xmllint --xpath '//tr[td[1]/text()="Total Sequences"]/td[2]/text()'  fastqc_report.html
147142898
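
To cover all 200 reports, the two XPath queries above could be wrapped in a small shell function and run in a loop. This is only a sketch: the function name fastqc_html_summary and the output file summary.txt are hypothetical, and it assumes every report parses with xmllint the same way as the example above.

```shell
# Hypothetical helper: print "filename<TAB>total sequences" for one FastQC HTML report.
# Assumes xmllint is installed and the report is XML-parsable, as in the commands above.
fastqc_html_summary() {
    report=$1
    # Same XPath expressions as in the answer, one per field.
    name=$(xmllint --xpath '//tr[td[1]/text()="Filename"]/td[2]/text()' "$report" 2>/dev/null)
    total=$(xmllint --xpath '//tr[td[1]/text()="Total Sequences"]/td[2]/text()' "$report" 2>/dev/null)
    printf '%s\t%s\n' "$name" "$total"
}

# Example usage (writes one line per report into a hypothetical summary.txt):
#   for f in *.html; do fastqc_html_summary "$f"; done > summary.txt
```

If a report fails to parse, its line will contain empty fields, so it is worth checking the output for blank values.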

FastQC also produces a text file, fastqc_data.txt, that contains the same values:

$ grep -E '(Filename|Total Sequences)'  fastqc_data.txt 
Filename    jeter.fastq.gz
Total Sequences 147142898
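
To get one line per sample in a single text file, the grep above could be replaced by a short awk call run over every fastqc_data.txt. This is a sketch under assumptions: the function name fastqc_txt_summary, the output file summary.tsv, and the one-folder-per-sample layout are hypothetical, and the fields in fastqc_data.txt are assumed to be tab-separated.

```shell
# Hypothetical helper: print "filename<TAB>total sequences" from one fastqc_data.txt.
# Assumes each line is "key<TAB>value", as in the grep output above.
fastqc_txt_summary() {
    awk -F '\t' '
        $1 == "Filename"        { name = $2 }    # remember the file name
        $1 == "Total Sequences" { total = $2 }   # remember the read count
        END { print name "\t" total }            # emit one summary line
    ' "$1"
}

# Example usage (assumes one unzipped FastQC folder per sample):
#   for f in */fastqc_data.txt; do fastqc_txt_summary "$f"; done > summary.tsv
```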