22 months ago
kani
Hi, I would like to split a YAML file into multiple files based on a key (using 'yq' or another method), with each output file named after its key. Note that the entries under readunits are not in the same order as the samples. I'd appreciate any help! Thanks
Here's my input file, test.yaml:
samples:
  WHH525:
    - 3c657174
    - 5e3d9b37
  WHH527:
    - 33b5d3ed
    - 81f663d4
  WHH528:
    - 93f3bda0
    - befe7e0e
readunits:
  3c657174:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH525
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
  81f663d4:
    flowcell_id: null
    fq1: WHH527/WHH527_HS007-PE-R00094_L002_R1.fastq.gz
    fq2: WHH527/WHH527_HS007-PE-R00094_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH527
    rg_id: HS007-PE-R00094.2
    run_id: HS007-PE-R00094
  93f3bda0:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00100_L008_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00100_L008_R2.fastq.gz
    lane_id: '8'
    library_id: WHH528
    rg_id: HS006-PE-R00100.8
    run_id: HS006-PE-R00100
  5e3d9b37:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00021_L001_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00021_L001_R2.fastq.gz
    lane_id: '1'
    library_id: WHH525
    rg_id: HS006-PE-R00021.1
    run_id: HS006-PE-R00021
  33b5d3ed:
    flowcell_id: null
    fq1: WHH527/WHH527_HS006-PE-R00097_L004_R1.fastq.gz
    fq2: WHH527/WHH527_HS006-PE-R00097_L004_R2.fastq.gz
    lane_id: '4'
    library_id: WHH527
    rg_id: HS006-PE-R00097.4
    run_id: HS006-PE-R00097
  befe7e0e:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH528
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
My expected output file names are WHH525.yaml, WHH527.yaml and WHH528.yaml, and their contents are:
WHH525.yaml
samples:
  WHH525:
    - 3c657174
    - 5e3d9b37
readunits:
  3c657174:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH525
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
  5e3d9b37:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00021_L001_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00021_L001_R2.fastq.gz
    lane_id: '1'
    library_id: WHH525
    rg_id: HS006-PE-R00021.1
    run_id: HS006-PE-R00021
WHH527.yaml
samples:
  WHH527:
    - 33b5d3ed
    - 81f663d4
readunits:
  33b5d3ed:
    flowcell_id: null
    fq1: WHH527/WHH527_HS006-PE-R00097_L004_R1.fastq.gz
    fq2: WHH527/WHH527_HS006-PE-R00097_L004_R2.fastq.gz
    lane_id: '4'
    library_id: WHH527
    rg_id: HS006-PE-R00097.4
    run_id: HS006-PE-R00097
  81f663d4:
    flowcell_id: null
    fq1: WHH527/WHH527_HS007-PE-R00094_L002_R1.fastq.gz
    fq2: WHH527/WHH527_HS007-PE-R00094_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH527
    rg_id: HS007-PE-R00094.2
    run_id: HS007-PE-R00094
WHH528.yaml
samples:
  WHH528:
    - 93f3bda0
    - befe7e0e
readunits:
  93f3bda0:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00100_L008_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00100_L008_R2.fastq.gz
    lane_id: '8'
    library_id: WHH528
    rg_id: HS006-PE-R00100.8
    run_id: HS006-PE-R00100
  befe7e0e:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH528
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
Use a YAML API, e.g. PyYAML: https://pyyaml.org/wiki/PyYAMLDocumentation
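As a rough illustration of that suggestion, here is a minimal Python sketch using PyYAML. It is not taken from the thread: the function names `split_by_sample` and `main` are made up for this example, and it assumes the input has exactly the two top-level keys shown in the question (`samples` and `readunits`) and that every ID listed under a sample also exists under `readunits`. Read units are looked up by ID, so their order in the input file should not matter. The `sort_keys=False` argument needs PyYAML 5.1 or later.

```python
from pathlib import Path


def split_by_sample(config):
    """Return {sample_name: per-sample mapping} built from the combined mapping.

    Read units are looked up by ID, so their order in the input is irrelevant.
    A KeyError is raised if a sample lists an ID missing from "readunits".
    """
    readunits = config["readunits"]
    per_sample = {}
    for sample, unit_ids in config["samples"].items():
        per_sample[sample] = {
            "samples": {sample: list(unit_ids)},
            "readunits": {uid: readunits[uid] for uid in unit_ids},
        }
    return per_sample


def main(input_path="test.yaml", output_dir="."):
    # PyYAML is a third-party package (pip install pyyaml). It is imported
    # here so split_by_sample() stays usable on already-parsed data.
    import yaml

    with open(input_path) as handle:
        config = yaml.safe_load(handle)

    for sample, content in split_by_sample(config).items():
        out_path = Path(output_dir) / f"{sample}.yaml"
        with open(out_path, "w") as handle:
            # sort_keys=False keeps "samples" before "readunits" and writes
            # the read units in the order they are listed under the sample.
            yaml.safe_dump(content, handle, default_flow_style=False, sort_keys=False)


if __name__ == "__main__":
    main()
```

Run from the directory containing test.yaml, this should write one file per key under `samples` (WHH525.yaml, WHH527.yaml, WHH528.yaml for the input above). The exact indentation of list items in the output is chosen by PyYAML and may differ slightly from the hand-written example.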
Thanks, Pierre Lindenbaum