Hello.
I wish to know if one can find the following information in CRAM files' headers:
1) Whether the sequencing data in a CRAM file is from WGS or WES, and if so, where this is recorded?
and
2) If one file can contain data from multiple genomes (for instance, from multiple patients), can that information be found in the CRAM header? For example, in lines such as:
@PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-517\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_517_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_517_1.00_2.fq.gz
@PG ID:bwa.1 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-518\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_518_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_518_1.00_2.fq.gz
@PG ID:bwa.2 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-519\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_519_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_519_1.00_2.fq.gz
@PG ID:bwa.3 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-520\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_520_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_520_1.00_2.fq.gz
@PG ID:bwa.4 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-521\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_521_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_521_1.00_2.fq.gz
@PG ID:bwa.5 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-522\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_522_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_522_1.00_2.fq.gz
@PG ID:bwa.6 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-523\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_523_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_523_1.00_2.fq.gz
@PG ID:bwa.7 PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 16 -M -Y -R @RG\tID:180213_I006_CL100063425_L1_PL1802120047-524\tPL:ILLUMINA\tPU:CL100063425_L1\tLB:PL1802120047\tSM:1000000047 Homo_sapiens_assembly38/Homo_sapiens_assembly38.fa /l3bioinfo/CL100063425_L01_524_1.00_1.fq.gz /l3bioinfo/CL100063425_L01_524_1.00_2.fq.gz
I'll provide more headers if needed.
Thanks.
Thanks @GenoMax.
Will providing relevant headers from a CRAM file allow giving a specific answer?
Thinking about this again, it may be safer to look at coverage rather than guess based on file names.
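To make the coverage idea concrete, here is a minimal sketch, not a definitive method. It assumes you have already produced per-position depth in the three-column layout (chromosome, position, depth) that `samtools depth` prints for a single input file. The function name `breadth_of_coverage`, the `genome_length` argument and the `min_depth` cutoff are all made up for illustration. The idea is that WES data should cover only a small fraction of the genome at a useful depth (the exome is roughly 1-2% of the human genome), while WGS data should cover most of it.

```python
def breadth_of_coverage(depth_lines, genome_length, min_depth=1):
    """Return the fraction of genome_length positions with depth >= min_depth.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' strings.
    genome_length: total number of reference positions considered (assumed
                   to be supplied by the caller, e.g. summed from @SQ LN tags).
    """
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")
        if int(depth) >= min_depth:
            covered += 1
    # Positions absent from the input are treated as uncovered.
    return covered / genome_length
```

Because off-target reads in exome data can still give shallow coverage outside the capture regions, a `min_depth` well above 1 is probably more informative; the right cutoff depends on your data and is not something this sketch can decide for you.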
Based on the @PG lines you included above, the alignment appears to have been done against the entire genome, but that does not tell us whether the data is WGS or WES. The sample file names don't mean anything to us (unless they do to you).
I added these @PG lines in order to show that the CRAM file includes results from 8 separate BWA runs, which made me think that the file might include data for 8 separate genomes.
The only part that differs between the names is the 5** number, e.g. 524 in CL100063425_L01_524. If you think that indicates a different sample, then perhaps. Otherwise these may simply be data from multiple lanes for the same library. Beyond that, there is no way to conclude anything from the information you have at hand.
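One thing that can be checked directly in the header is the SM (sample) tag of each read group. In the @PG lines posted above, every embedded @RG string carries the same SM:1000000047 and the same LB value, which would be consistent with one sample split over several runs. Below is a small sketch for collecting the distinct SM values from header text such as the output of `samtools view -H file.cram`. The function name `sample_names` is hypothetical, and the regular expression assumes SM values contain no whitespace or backslashes.

```python
import re

# Matches the value after "SM:" up to whitespace or a literal backslash,
# so it works for real tab-separated @RG lines and for @RG strings
# embedded in a @PG command line with a literal "\t".
SM_PATTERN = re.compile(r"SM:([^\s\\]+)")


def sample_names(header_text):
    """Return the set of SM values seen on @RG and @PG header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith(("@RG", "@PG")):
            names.update(SM_PATTERN.findall(line))
    return names
```

If this returns a single name, the header gives no indication of more than one sample. More than one name suggests several samples, although it only reflects what was written into the read groups at alignment time.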