Hi all,
I subsampled my Illumina FASTQ files twice with seqtk, using two different random seeds. I then mapped the reads from each subsample and recorded the number of mapped reads and the percentage mapped.
For each sample, I want to keep the subsample with the higher mapping percentage, because for the vast majority of my samples the seqtk subsample underestimates the actual mapping percentage.
My TSV file has the following format (sample, subsampled reads, mapped reads, percent mapped):
sample1 200000 120000 60%
sample1 200000 115000 57.5%
sample2 200000 180000 90%
sample2 200000 190000 95%
...
sampleX 200000 180000 90%
sampleX 200000 182000 91%
For each sample in column 1, I want to keep only the line with the higher value in column 3 or column 4 (either works). For the example above, the output would be:
sample1 200000 120000 60%
sample2 200000 190000 95%
...
sampleX 200000 182000 91%
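The accepted answer itself isn't shown in this thread, so here is a minimal sketch of the grouping described above (not necessarily what was suggested). It assumes whitespace-separated columns and that column 3 holds the mapped-read count as an integer. Because every subsample has the same size (200000 here), the higher mapped count also means the higher percentage. The script name `best_map.py` and the input file name are hypothetical.

```python
import sys
from typing import Iterable, List


def best_per_sample(lines: Iterable[str]) -> List[str]:
    """For each sample (column 1), return the line with the most mapped reads (column 3).

    Samples keep the order in which they first appear; on a tie, the first line wins.
    Assumes whitespace-separated columns and an integer read count in column 3.
    """
    best = {}  # sample -> (mapped_reads, line); dicts preserve insertion order
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        sample, mapped = fields[0], int(fields[2])
        if sample not in best or mapped > best[sample][0]:
            best[sample] = (mapped, line.rstrip("\n"))
    return [kept for _, kept in best.values()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python best_map.py mapping_stats.tsv > best.tsv
    with open(sys.argv[1]) as handle:
        for kept in best_per_sample(handle):
            print(kept)
```

If the subsample sizes ever differ between seeds, compare column 4 instead, for example with `float(fields[3].rstrip("%"))`.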
Looking forward to hearing your thoughts! Thanks
Worked like a charm! Thanks for the help.
You're very welcome!