12 months ago
sata72
I have several output files and want to extract some information from them.
sample1.out
sample2.out
sample3.out
..
..
This information is present in each sample's file:
3259390 reads; of these:
3234126 (99.22%) were paired; of these:
----
292091 pairs aligned concordantly 0 times; of these:
----
59571 pairs aligned 0 times concordantly or discordantly; of these:
98.90% overall alignment rate
I want the output to contain this information for all samples, like this:
sample1   sample1   sample1             sample2   sample2   sample2
reads     paired    overall.alignment   reads     paired    overall.alignment
3259390   3234126   98.90               3259398   3234136   98.98
You're going to have to write your own code for this, especially since your desired output format is not a straightforward one-row-per-sample table either.
If the line numbers are consistent across files and you need the first word from the 1st, 3rd, and 8th lines, you can use awk to print the first word of each line whose number matches one of those three (use FNR rather than NR if you pass several files to a single awk call, since NR keeps counting across files). Then transpose the output so that entries are separated by tabs instead of newlines, and write the header manually.
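If awk is not your thing, the same line-position idea can be sketched in Python. This is only a rough sketch: the function names `first_words` and `wide_table` are made up for illustration, and the default line numbers (1, 3, 8) are simply the ones mentioned above, so check them against your real files before trusting the result.

```python
def first_words(lines, line_numbers=(1, 3, 8)):
    """Return the first word of each requested 1-based line, without a trailing '%'."""
    picked = []
    for n in line_numbers:
        words = lines[n - 1].split()
        picked.append(words[0].rstrip("%") if words else "")
    return picked


def wide_table(samples, line_numbers=(1, 3, 8),
               labels=("reads", "paired", "overall.alignment")):
    """Build the three-row, tab-separated layout asked for in the question.

    samples maps a sample name to the list of lines read from its file.
    """
    names, heads, values = [], [], []
    for name, lines in samples.items():
        vals = first_words(lines, line_numbers)
        names += [name] * len(vals)   # sample name repeated once per value
        heads += list(labels)         # manually supplied header row
        values += vals
    return "\n".join("\t".join(row) for row in (names, heads, values))


# Hypothetical usage, assuming the files sit in the current directory:
#   from pathlib import Path
#   samples = {p.stem: p.read_text().splitlines()
#              for p in sorted(Path(".").glob("sample*.out"))}
#   print(wide_table(samples))
```

Relying on line positions is fragile: if one file has an extra warning line at the top, every value shifts without any error being raised.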
Looks like this is bowtie* output? You may be able to run MultiQC on them to summarize.
For the sake of future reports, when you get 100 or maybe 1000 samples: flip the columns and rows, and never mix numerical values with labels such as "reads" and "paired" in the same column.
Sample names as row names, the specific values as columns.
It is trivial to sort by column values; it is far less so if you want to sort values in a row but only from selected columns. No need to use a shotgun for a foot self-amputation, methinks.
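To make the one-row-per-sample layout concrete, here is a minimal Python sketch. It assumes every file contains the three lines quoted in the question ("reads; of these:", "were paired", "overall alignment rate") and matches them by text rather than by line number. The names `PATTERNS`, `parse_summary` and `to_tsv` are invented for this example.

```python
import re

# One pattern per value, keyed by the column name used in the table.
PATTERNS = {
    "reads": re.compile(r"^\s*(\d+) reads; of these:"),
    "paired": re.compile(r"^\s*(\d+) \([\d.]+%\) were paired"),
    "overall_alignment": re.compile(r"^\s*([\d.]+)% overall alignment rate"),
}


def parse_summary(text):
    """Extract the three values from one alignment summary."""
    row = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                value = match.group(1)
                row[key] = float(value) if key == "overall_alignment" else int(value)
    return row


def to_tsv(rows, sort_by="overall_alignment"):
    """Samples as rows, values as columns, sorted ascending by one column."""
    header = "\t".join(["sample"] + list(PATTERNS))
    ordered = sorted(rows.items(), key=lambda item: item[1][sort_by])
    body = ["\t".join([name] + [str(row[col]) for col in PATTERNS])
            for name, row in ordered]
    return "\n".join([header] + body)
```

With samples as rows, sorting by any metric is just a matter of changing the `sort_by` argument, and each column holds only numbers.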
Good idea! Thanks.