Hi Everyone, I currently have a scRNA-seq BAM file of 6 coral individuals mixed together with the genotype (SNP) of each coral. Coral individuals are labelled (i.e. sample IDs) 01, 02, 03, 04, 05, and 06, respectively. I merged the 6 genotype vcf files into one, and some of the results are as follows (I omitted the header starting with '##'):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 01 02 03 04 05 06
scahrs1_100 474321 . A G 35.64 PASS BaseQRankSum=1.981;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=5.09;ReadPosRankSum=-0.712;SOR=0.223;DP=7;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL ./.:.:.:.:. 0/1:5,2:7:43:43,0,122 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
scahrs1_100 474331 . C G 35.64 PASS BaseQRankSum=1.981;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=5.09;ReadPosRankSum=-0.566;SOR=0.223;DP=7;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL ./.:.:.:.:. 0/1:5,2:7:43:43,0,122 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
scahrs1_100 1349979 . T C 60.64 PASS BaseQRankSum=-4.543;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=2.76;ReadPosRankSum=0.033;SOR=1.085;DP=22;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:9,13:22:30:68,0,30
scahrs1_100 1399140 . A C 63.64 PASS BaseQRankSum=0;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=7.95;ReadPosRankSum=-1.611;SOR=1.179;DP=8;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:3,5:8:33:71,0,33
scahrs1_100 1742935 . G T 30.64 PASS BaseQRankSum=4.912;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=1.18;ReadPosRankSum=0.363;SOR=0.446;DP=26;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:15,11:26:38:38,0,76
scahrs1_100 1945135 . C T 98.64 PASS BaseQRankSum=4.216;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=4.48;ReadPosRankSum=-0.1;SOR=1.214;DP=24;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL 0/1:13,9:22:99:106,0,182 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
scahrs1_100 1945147 . C T 98.64 PASS BaseQRankSum=4.216;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=4.48;ReadPosRankSum=0.735;SOR=1.214;DP=22;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL 0/1:13,9:22:99:106,0,182 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
scahrs1_100 7089066 . C A 67.64 PASS BaseQRankSum=0.674;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=13.53;ReadPosRankSum=1.036;SOR=0.446;DP=6;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL 0/1:3,2:5:75:75,0,102 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
scahrs1_100 7089067 . G A 67.64 PASS BaseQRankSum=1.383;ExcessHet=0;FS=0;MQ=60;MQRankSum=0;QD=13.53;ReadPosRankSum=0.674;SOR=0.446;DP=6;AF=0.5;MLEAC=1;MLEAF=0.5;AN=2;AC=1 GT:AD:DP:GQ:PL 0/1:3,2:5:75:75,0,102 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:.
When I try to demultiplex using demuxlet, any TWO sample IDs (e.g. 01 & 02, or 02 & 03) can run the pipeline completely, and the results are as expected. For example:
demuxlet --sam ~/tmp.storage/Sca_GEX_1_2023-9.bam --vcf ~/tmp.storage/all.coral.ID.vcf --sm 01 --sm 02 --out ~/demuxlet.result.di/Sca_GEX_1.demuxlet --group-list ~/tmp.storage/Group1_0min.tsv --field GT
will output 3 files:
And the result summary of samples 01 & 02 shows that many single cells can be identified (below, 318 cells and 976 cells). This indicates that the source file and pipeline run well.
(Demultiplex) u67@hoff:~$ singularity exec Demuxafy.sif bash Demuxlet_summary.sh ~/demuxlet.result.di/Sca_GEX_1.demuxlet.best
Classification Assignment N
AMB-01-02-01/02 511
AMB-01-02-02/01 47
AMB-02-01-01/02 558
AMB-02-01-02/01 85
DBL-01-02-0.500 2
DBL-02-01-0.500 1
SNG-01 318
SNG-02 976
But whenever I expand the sample ID to Three or more, the following error occurs:
(Demultiplex) u67@hoff:~$ demuxlet --sam ~/tmp.storage/Sca_GEX_1_2023-9.bam --vcf ~/tmp.storage/all.coral.ID.vcf --sm 01 --sm 02 --sm 03 --out ~/demuxlet.result.di/Sca_GEX_1.demuxlet --group-list ~/tmp.storage/Group1_0min.tsv --field GT
Available Options
The following parameters are available. Ones with "[]" are in effect:
Options for input SAM/BAM/CRAM : --sam [/mnt/data/dayhoff/home/u6798856/tmp.storage/Sca_GEX_1_2023-9.bam],
--tag-group [CB], --tag-UMI [UB]
Options for input VCF/BCF : --vcf [/mnt/data/dayhoff/home/u6798856/tmp.storage/all.coral.ID.vcf],
--field [GT], --geno-error [0.01],
--min-mac [1], --min-callrate [0.50],
--sm [01, 02, 03], --sm-list
Output Options : --out [/mnt/data/dayhoff/home/u6798856/demuxlet.result.di/Sca_GEX_1.demuxlet],
--alpha, --write-pair,
--doublet-prior [0.50],
--sam-verbose [1000000],
--vcf-verbose [10000]
Read filtering Options : --cap-BQ [40], --min-BQ [13],
--min-MQ [20], --min-TD,
--excl-flag [3844]
Cell/droplet filtering options : --group-list [/mnt/data/dayhoff/home/u6798856/tmp.storage/Group1_0min.tsv],
--min-total, --min-uniq, --min-snp
Run with --help for more detailed help messages of each argument.
NOTICE [2024/08/21 18:40:50] - Finished loading 2499 droplet/cell barcodes to consider
NOTICE [2024/08/21 18:40:50] - Finished identifying 3 samples to load from VCF/BCF
FATAL ERROR -
[E:int32_t main(int32_t, char**) Cannot read any single variant from /mnt/data/dayhoff/home/u6798856/tmp.storage/all.coral.ID.vcf]
terminate called after throwing an instance of 'pException'
what(): Exception was thrown
Aborted (core dumped)
The error says that it cannot read any variant in the vcf file. I also tried replacing --sm
with --sm-list
: any two sample IDs work, but more than two will produce the same error as above.
The package's sample dataset, which has 13 sample IDs, works fine on my server. I don't understand why this error occurs in my dataset and how to avoid it. If you know where the problem is, please let me know. Thanks.