Hi,
I am running LOHHLA pipeline on some cancer samples. I have successfully set up the pipeline at my end with all the dependencies installed. I was also able to run the pipeline using the example dataset.
Now for my data, I have prepared all the required files. But after running the pipeline, I get an error that relates to count.events function in the code. Here is the error line from the terminal.
Error in editDistance - indelTotals :
non-numeric argument to binary operator
Calls: count.events
Execution halted
Here is the snippet from the code which belongs to this count.events function -
count.events <- function(BAMfile, n){
x <- scanBam(BAMfile, index = BAMfile, param=ScanBamParam(what = scanBamWhat(), tag = 'NM'))
readIDs <- x[[1]][['qname']]
cigar <- x[[1]][['cigar']]
editDistance <- unlist(x[[1]][['tag']])
insertionCount <- sapply(cigar, FUN = function(boop) {return(length(grep(pattern = 'I', x = unlist(strsplit(boop, split = '')))))} )
deletionCount <- sapply(cigar, FUN = function(boop) {return(length(grep(pattern = 'D', x = unlist(strsplit(boop, split = '')))))} )
indelTotals <- sapply(cigar, FUN = function(boop) {
tmp <- unlist(strsplit( gsub("([0-9]+)","~\\1~",boop), "~" ))
Is <- grep(pattern = 'I', x = tmp)
Ds <- grep(pattern = 'D', x = tmp)
total <- sum(as.numeric(tmp[(Is-1)])) + sum(as.numeric(tmp[Ds-1]))
return(total)
})
misMatchCount <- editDistance - indelTotals
eventCount <- misMatchCount + insertionCount + deletionCount
names(eventCount) <- 1:length(eventCount)
passed <- eventCount[which(eventCount <= n)]
y <- readIDs[as.numeric(names(passed))]
y <- names(table(y)[which(table(y) == 2)])
return(y)
}
I tried to debug this error and checked the BAM file for the presence of NM tags which are there in the file. I noticed that the editDistance variable is NULL and indelTotals variable has List class.
Can anyone please help me with this issue?
Akshay Zawar were you able to resolve the issue ? Can you check once if your BAM file is not truncated ?
*
samtools quickcheck -v -v -v -v <your.bam>
* Requires latest samtools version
Hi Vijay, How are you? Yes, I found the solution. It was the genome build causing this issue.
Hi Akshay, I encountered the same problems. But looks like it was due to low coverage of my bam files on chr6. Could you talk a little bit more of how you solved the issue?
Hi MaxXinE, the LOHHLA script by default uses hg19 coordinates for HLA region. My data was generated on hg38. I updated the HLA coordinates in the script and that resolved the issue. What is the genome build of your data?
Hey Akshay, thanks so much for the suggestion about hg38 and that solved my problem! Didn't realize that lohhla uses hg19 coordinates. I am using TCGA bam files, which contain some unpaired-end reads. Though lohhla uses VALIDATION_STRINGENCY=SILENT, the unpaired-ended reads end up being in the fastq file. So novoalign -d patient.hlaFasta.nix complains about the unpaired reads and gives errors. Have you ever encountered this issue?
Hi Akshay, I'd like to know how you updated the HLA coordinates in the script. My data is also generated on hg38 and the LOHHLA doesn't work. If possible, could you show me your script?
Hi, you will need to update the regions under "# chr 6 and contigs" section. You can get the hg38 regions for HLA from UCSC genome browser.
Below lines are updated to hg38. You can replace this in the R script to call HLAs on hg38.
Thanks so much for the script and it helps me a lot! I'll try this!