When using Pisces, I get the following error.
System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chr7': Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 ... 150S ---> System.Exception: Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 ... 150S
at Pisces.Domain.Logic.CandidateVariantFinder.ProcessCigarOps(Read alignment, String refChromosome, Int32 readStartPosition, String chromosomeName)
at Pisces.Logic.SomaticVariantCaller.Execute()
at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName)
--- End of inner exception stack trace ---.
8/19/23 7:21 PM 1 Exception reported:
System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chrX': Failed to process variants for MN01972:49:000H5KYMY:1:22101:7666:3361 ... 151S ---> System.Exception: Failed to process variants for MN01972:49:000H5KYMY:1:22101:7666:3361 ... 151S
at Pisces.Domain.Logic.CandidateVariantFinder.ProcessCigarOps(Read alignment, String refChromosome, Int32 readStartPosition, String chromosomeName)
at Pisces.Logic.SomaticVariantCaller.Execute()
at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName)
--- End of inner exception stack trace ---.
8/19/23 7:21 PM 1 Exception reported:
System.Exception: Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 ... 150S
at Pisces.Processing.Logic.BaseProcessor.Execute(Int32 maxThreads)
at Pisces.Program.ProgramExecution()
at CommandLine.Application.BaseApplication`1.Execute().
I have investigated this with this
samtools view your_file.bam | grep "MN01972:49:000H5KYMY:1:11102:26356:2968"
and I got this
MN01972:49:000H5KYMY:1:22104:2296:19556 16 chr7 148817421 60 151S * 00 TTCAAATTTACTGCCAATGAAGAACCAAAAACATGTCGGCTCATCACCACTTCACAACACAATCCCAAGCCGTTGAAAAATTGGCCTCAACATTTATTGTGTTTGACAACGTTTAATTGGATCTACAAAACCAAATGTAAGCACTGGTCAA A//6///A/F///=FF/////F//////FF///F///F//F////F/F=////////FF///A///F6F//F/F/FF///FF/F//F/////FA//F/F///FF///A/F////F/F////F///FFFFFFFFFFF6FFF/FFFF/AAFFF RG:Z:230817_MN01972_0049_A000H5KYMY_RACP2-6poolv4_P5-BAS:i:30 XS:i:19
Here is the explanation:
MN01972:49:000H5KYMY:1:22104:2296:19556: read name
16: bitwise FLAG
chr7: reference chromosome
148817421: start position of the read on the reference chromosome
60: mapping quality
151S: CIGAR string
*: reference name of the mate/next read
0: position of the mate/next read
0: observed template length
TTCAAATTT...GTCAA: sequence
A//6///A.../AAFFF: quality
The issue might be with the CIGAR string, which is 151S. The CIGAR string is used to represent alignment characteristics. The S in the CIGAR string represents a soft-clipped alignment, where the bases are present in the read but not used in the alignment. Here, it is saying that all 151 bases of the read are soft-clipped, meaning none of the bases are actually aligned to the reference genome.
This happens because I use the primers to map the reads and then I use Softclipp (Samtools) to mark the primer. So, in reads with a soft-clipp in the middle plus the soft-clipp I stamp completely covers the read.
I am developing an NGS pipeline with BWA MEM, Marrk duplicates and Pisces. I am wondering (and this is the reason of this post) who could I manage these reads in a secure way?? The code I am using for Pisces so far is the basic
apptainer exec --bind "$ref_folder":"$ref_folder" \
"$pisces" dotnet /app/Pisces_5.2.9.122/Pisces.dll \
-bam "${Sample_ID}"_FINAL_SORTED.bam \
-g "$piscesrefgenomefolder" \
-o ./ \
-i ../amplicons_coo_v1.0.bed \
--gvcf false
The quality of this reads (at least in this example) is very high so excluding these reads by quality is not going to work. I need some advice. I think, the best way of doing this is to deleting reads in advance in which the whole read is soft-clipped. Any advice??