Need help with velocyto analysis. Error: MemoryError: bam file #0 could not be sorted by cells.
19 months ago
Anna ▴ 20

Good afternoon, community! I am new to the field and need your help with an ongoing issue I have with RNA velocity analysis. I am running velocyto on an HPC cluster using the run command. This is the full script I am using:

module load Anaconda3
source activate tutorial
export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
velocyto run -vvv \
--bcfile /home/agera4/RNAvelocityHCI003GM/outs/filtered_feature_bc_matrix/barcodes.tsv.gz \
--outputfolder /home/agera4/RNAvelocityHCI003GM/outs \
--samtools-threads 18 \
--samtools-memory 8000 \
/home/agera4/RNAvelocityHCI003GM/outs/possorted_genome_bam.bam \
/home/agera4/genes.gtf

I downloaded my GTF file from the 10x Genomics portal. Below is the error message I am getting. I attempted to sort the BAM file manually using the samtools sort -b command, but it never returned anything. Has anyone had the same issue? What could it be related to?

2023-04-04 20:37:47,373 - DEBUG - Marking up chromosome KI270589.1
2023-04-04 20:37:47,373 - WARNING - The .bam file refers to a chromosome 'KI270589.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,373 - WARNING - The .bam file refers to a chromosome 'KI270589.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,383 - DEBUG - Marking up chromosome KI270726.1
2023-04-04 20:37:47,383 - WARNING - The .bam file refers to a chromosome 'KI270726.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,383 - WARNING - The .bam file refers to a chromosome 'KI270726.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,383 - DEBUG - Marking up chromosome KI270735.1
2023-04-04 20:37:47,383 - WARNING - The .bam file refers to a chromosome 'KI270735.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,383 - WARNING - The .bam file refers to a chromosome 'KI270735.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,384 - DEBUG - Marking up chromosome KI270711.1
2023-04-04 20:37:47,384 - WARNING - The .bam file refers to a chromosome 'KI270711.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,384 - WARNING - The .bam file refers to a chromosome 'KI270711.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,391 - DEBUG - Marking up chromosome KI270745.1
2023-04-04 20:37:47,391 - WARNING - The .bam file refers to a chromosome 'KI270745.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,391 - WARNING - The .bam file refers to a chromosome 'KI270745.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,402 - DEBUG - Marking up chromosome KI270714.1
2023-04-04 20:37:47,402 - WARNING - The .bam file refers to a chromosome 'KI270714.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,402 - WARNING - The .bam file refers to a chromosome 'KI270714.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,403 - DEBUG - Marking up chromosome KI270732.1
2023-04-04 20:37:47,403 - WARNING - The .bam file refers to a chromosome 'KI270732.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,403 - WARNING - The .bam file refers to a chromosome 'KI270732.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,403 - DEBUG - Marking up chromosome KI270713.1
2023-04-04 20:37:47,403 - WARNING - The .bam file refers to a chromosome 'KI270713.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,403 - WARNING - The .bam file refers to a chromosome 'KI270713.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,406 - DEBUG - Marking up chromosome KI270754.1
2023-04-04 20:37:47,406 - WARNING - The .bam file refers to a chromosome 'KI270754.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,406 - WARNING - The .bam file refers to a chromosome 'KI270754.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,419 - DEBUG - Marking up chromosome KI270710.1
2023-04-04 20:37:47,419 - WARNING - The .bam file refers to a chromosome 'KI270710.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,419 - WARNING - The .bam file refers to a chromosome 'KI270710.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,421 - DEBUG - Marking up chromosome KI270717.1
2023-04-04 20:37:47,421 - WARNING - The .bam file refers to a chromosome 'KI270717.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,421 - WARNING - The .bam file refers to a chromosome 'KI270717.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,421 - DEBUG - Marking up chromosome KI270720.1
2023-04-04 20:37:47,421 - WARNING - The .bam file refers to a chromosome 'KI270720.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,421 - WARNING - The .bam file refers to a chromosome 'KI270720.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,472 - DEBUG - Marking up chromosome KI270718.1
2023-04-04 20:37:47,472 - WARNING - The .bam file refers to a chromosome 'KI270718.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,472 - WARNING - The .bam file refers to a chromosome 'KI270718.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,486 - DEBUG - Marking up chromosome KI270755.1
2023-04-04 20:37:47,486 - WARNING - The .bam file refers to a chromosome 'KI270755.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,486 - WARNING - The .bam file refers to a chromosome 'KI270755.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,486 - DEBUG - Marking up chromosome KI270707.1
2023-04-04 20:37:47,486 - WARNING - The .bam file refers to a chromosome 'KI270707.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,486 - WARNING - The .bam file refers to a chromosome 'KI270707.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,487 - DEBUG - Marking up chromosome KI270366.1
2023-04-04 20:37:47,487 - WARNING - The .bam file refers to a chromosome 'KI270366.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,487 - WARNING - The .bam file refers to a chromosome 'KI270366.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,487 - DEBUG - Marking up chromosome KI270467.1
2023-04-04 20:37:47,487 - WARNING - The .bam file refers to a chromosome 'KI270467.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,487 - WARNING - The .bam file refers to a chromosome 'KI270467.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,659 - DEBUG - Marking up chromosome KI270528.1
2023-04-04 20:37:47,659 - WARNING - The .bam file refers to a chromosome 'KI270528.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,659 - WARNING - The .bam file refers to a chromosome 'KI270528.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,659 - DEBUG - Marking up chromosome KI270333.1
2023-04-04 20:37:47,659 - WARNING - The .bam file refers to a chromosome 'KI270333.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,659 - WARNING - The .bam file refers to a chromosome 'KI270333.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,660 - DEBUG - Marking up chromosome KI270330.1
2023-04-04 20:37:47,660 - WARNING - The .bam file refers to a chromosome 'KI270330.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,660 - WARNING - The .bam file refers to a chromosome 'KI270330.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,661 - DEBUG - Marking up chromosome KI270466.1
2023-04-04 20:37:47,661 - WARNING - The .bam file refers to a chromosome 'KI270466.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,662 - WARNING - The .bam file refers to a chromosome 'KI270466.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,743 - DEBUG - Marking up chromosome KI270337.1
2023-04-04 20:37:47,743 - WARNING - The .bam file refers to a chromosome 'KI270337.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,743 - WARNING - The .bam file refers to a chromosome 'KI270337.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:47,744 - DEBUG - Marking up chromosome KI270336.1
2023-04-04 20:37:47,744 - WARNING - The .bam file refers to a chromosome 'KI270336.1+' not present in the annotation (.gtf) file
2023-04-04 20:37:47,744 - WARNING - The .bam file refers to a chromosome 'KI270336.1-' not present in the annotation (.gtf) file
2023-04-04 20:37:52,626 - DEBUG - Read first 170 million reads
2023-04-04 20:38:03,141 - DEBUG - End of file. Reset index: start scanning from initial position.
2023-04-04 20:38:03,141 - DEBUG - 2901308 reads were skipped because no apropiate cell or umi barcode was found
2023-04-04 20:38:03,142 - INFO - Now just waiting that the bam sorting process terminates
Traceback (most recent call last):
  File "/home/agera4/.conda/envs/tutorial/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/velocyto/commands/run.py", line 116, in run
    samtools_memory=samtools_memory, dump=dump, loom_numeric_dtype=dtype, verbose=verbose, additional_ca=additional_ca)
  File "/home/agera4/.conda/envs/tutorial/lib/python3.6/site-packages/velocyto/commands/_run.py", line 225, in _run
    Otherwise sort manually by samtools ``sort -l [compression] -m [mb_to_use]M -t [tagname] -O BAM -@ [threads_to_use] -o cellsorted_[bamfile] [bamfile]``")
MemoryError: bam file #0 could not be sorted by cells.
                This is probably related to an old version of samtools, please install samtools >= 1.6.                In alternative this could be a memory error, try to set the --samtools_memory option to a value compatible with your system.                 Otherwise sort manually by samtools ``sort -l [compression] -m [mb_to_use]M -t [tagname] -O BAM -@ [threads_to_use] -o cellsorted_[bamfile] [bamfile]``
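The last suggestion in the message can also be done by hand. A minimal sketch, assuming samtools >= 1.6 is on PATH, that the cell barcode sits in the CB tag (as in Cell Ranger BAMs), and that velocyto will pick up a pre-sorted file named cellsorted_<bam name> in the same folder as the input BAM, as the message's "-o cellsorted_[bamfile]" suggests. The helper names are hypothetical. Keep in mind that cell-sorting a BAM of ~170 million reads can run for a long time without printing anything, so a samtools sort that seems to hang may simply still be working.

```shell
#!/usr/bin/env bash
# Sketch: sort a Cell Ranger BAM by cell barcode before running velocyto.
# Assumptions: samtools >= 1.6 on PATH, barcodes stored in the CB tag.
set -euo pipefail

# Path of the cell-sorted BAM, placed next to the input as cellsorted_<name>.
cellsorted_path() {
  local bam=$1
  printf '%s/cellsorted_%s\n' "$(dirname "$bam")" "$(basename "$bam")"
}

# Hypothetical helper: sort by the CB tag with a bounded memory budget.
# samtools' -m is per thread, so total memory is roughly threads x mem_mb.
sort_by_cell() {
  local bam=$1 threads=${2:-8} mem_mb=${3:-1000}
  samtools sort -t CB -O BAM -@ "$threads" -m "${mem_mb}M" \
    -o "$(cellsorted_path "$bam")" "$bam"
}

# Runs only when given arguments, e.g.:
#   bash sort_by_cell.sh /path/to/possorted_genome_bam.bam 8 1000
if [[ $# -gt 0 ]]; then
  sort_by_cell "$@"
fi
```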

Have you tried the suggestions the error message provided?

MemoryError: bam file #0 could not be sorted by cells. This is probably related to an old version of samtools, please install samtools >= 1.6. In alternative this could be a memory error, try to set the --samtools_memory option to a value compatible with your system
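One quick way to act on the first suggestion is to check which samtools the activated environment actually resolves, in case an older copy elsewhere on PATH is being picked up. A sketch, assuming GNU sort (for -V) and that the first line of samtools --version reads "samtools X.Y"; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if version $1 >= version $2 (GNU sort -V does the comparison).
version_ge() {
  [[ "$(printf '%s\n' "$2" "$1" | sort -V | sed -n '1p')" == "$2" ]]
}

# Extract "X.Y" from the first line of `samtools --version`.
samtools_version() {
  samtools --version | awk 'NR == 1 {print $2}'
}

# Report whether the samtools on PATH meets the required version.
check_samtools() {
  local required=${1:-1.6} found
  found=$(samtools_version)
  if version_ge "$found" "$required"; then
    echo "samtools $found OK (>= $required)"
  else
    echo "samtools $found is older than $required" >&2
    return 1
  fi
}
```

Running check_samtools after source activate tutorial shows whether the environment's samtools satisfies velocyto's requirement.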

As a related suggestion: velocyto is dated and takes a long time to run. If you can, it's better to use STARsolo, alevin-fry, or kallisto | bustools to get your spliced/unspliced counts.
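For the STARsolo route, a rough sketch assuming 10x Chromium v3 chemistry (16 bp cell barcode plus 12 bp UMI in read 1), an existing STAR genome index, and the matching 10x barcode whitelist. The wrapper name and every path are hypothetical placeholders; check the options against the STAR manual for your STAR version. --soloFeatures Gene Velocyto asks STAR to produce spliced, unspliced and ambiguous count matrices alongside the gene counts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper; all arguments are placeholders for your own files.
# Assumes 10x v3 chemistry: 16 bp cell barcode + 12 bp UMI in read 1.
run_starsolo_velocyto() {
  local genome_dir=$1 r1=$2 r2=$3 whitelist=$4 threads=${5:-8}
  # STARsolo expects the cDNA read (R2) first and the barcode read (R1) second.
  STAR --runThreadN "$threads" \
    --genomeDir "$genome_dir" \
    --readFilesIn "$r2" "$r1" \
    --readFilesCommand zcat \
    --soloType CB_UMI_Simple \
    --soloCBwhitelist "$whitelist" \
    --soloUMIlen 12 \
    --soloFeatures Gene Velocyto
}

# Runs only when given arguments: genome_dir R1 R2 whitelist [threads]
if [[ $# -gt 0 ]]; then
  run_starsolo_velocyto "$@"
fi
```

With this setup the velocity matrices should end up under Solo.out/Velocyto/ in STAR's output directory.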


So, yesterday I installed samtools 1.11 from conda-forge and indeed, it worked! My only concern is that it printed these messages before generating the loom file:

2023-04-06 01:58:23,765 - DEBUG - Counting for batch 53, containing 100 cells and 2113146 reads
2023-04-06 01:59:13,998 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 01:59:47,372 - DEBUG - Counting for batch 54, containing 100 cells and 2067713 reads
2023-04-06 02:00:34,551 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:01:10,871 - DEBUG - Counting for batch 55, containing 100 cells and 2198696 reads
2023-04-06 02:02:02,179 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:02:11,055 - DEBUG - Read first 140 million reads
2023-04-06 02:02:36,163 - DEBUG - Counting for batch 56, containing 100 cells and 2122480 reads
2023-04-06 02:03:24,093 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:03:58,168 - DEBUG - Counting for batch 57, containing 100 cells and 2038967 reads
2023-04-06 02:04:44,790 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:05:14,377 - DEBUG - Counting for batch 58, containing 100 cells and 1815443 reads
2023-04-06 02:05:56,005 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:06:29,146 - DEBUG - Counting for batch 59, containing 100 cells and 1983653 reads
2023-04-06 02:07:15,678 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:07:29,802 - DEBUG - Read first 150 million reads
2023-04-06 02:07:49,450 - DEBUG - Counting for batch 60, containing 100 cells and 1987572 reads
2023-04-06 02:08:34,875 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:09:08,640 - DEBUG - Counting for batch 61, containing 100 cells and 1984640 reads
2023-04-06 02:09:53,723 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:10:23,488 - DEBUG - Counting for batch 62, containing 100 cells and 1759512 reads
2023-04-06 02:11:03,700 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:11:35,719 - DEBUG - Counting for batch 63, containing 100 cells and 1949645 reads
2023-04-06 02:12:22,034 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:12:39,527 - DEBUG - Read first 160 million reads
2023-04-06 02:12:51,977 - DEBUG - Counting for batch 64, containing 100 cells and 1832995 reads
2023-04-06 02:13:34,045 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:14:10,314 - DEBUG - Counting for batch 65, containing 100 cells and 2332438 reads
2023-04-06 02:15:04,930 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:15:37,596 - DEBUG - Counting for batch 66, containing 100 cells and 2105230 reads
2023-04-06 02:16:26,975 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:16:59,855 - DEBUG - Counting for batch 67, containing 100 cells and 2045972 reads
2023-04-06 02:17:46,628 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:18:05,560 - DEBUG - Read first 170 million reads
2023-04-06 02:18:19,976 - DEBUG - Counting for batch 68, containing 100 cells and 1927368 reads
2023-04-06 02:19:04,940 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:19:33,139 - DEBUG - Counting for batch 69, containing 78 cells and 1720895 reads
2023-04-06 02:20:12,860 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-04-06 02:20:18,265 - DEBUG - 2901308 reads were skipped because no apropiate cell or umi barcode was found
2023-04-06 02:20:18,266 - DEBUG - Counting done!
2023-04-06 02:20:18,278 - DEBUG - Generating output file /home/agera4/RNAvelocityHCI003GM/outs/possorted_genome_bam_CE8EQ.loom
2023-04-06 02:20:18,278 - DEBUG - Collecting row attributes
2023-04-06 02:20:18,364 - DEBUG - Generating data table
2023-04-06 02:20:20,515 - DEBUG - Writing loom file
2023-04-06 02:20:34,314 - DEBUG - Terminated Succesfully!
Epilog:  nodes=1:ppn=1

You butchered the code segment that you pasted by formatting it as quoted text instead of using the code option. I've tried to clean it up as best as I can but it could still be wrong. If you can access the original error message, please edit your post and copy-paste it again, this time using the proper 101010 button and NOT THE double quote button.



Thank you for letting me know how to do it!

19 months ago
ATpoint 85k

What is your samtools version? Also, samtools' total memory consumption is roughly samtools-memory * samtools-threads, which here is 8 GB * 18 = 144 GB. Does the machine/node have that much? Try reducing it; you might simply be running out of memory, which causes this crash. Even on a proper HPC node I use "only" 8 cores and 1000 MB (1 GB) of memory; beyond that, speed gains are limited and do not justify the extra resources.
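The memory arithmetic above can be checked before submitting a job. A sketch, assuming Linux (it reads MemAvailable from /proc/meminfo) and treating --samtools-memory as megabytes per sorting thread, which matches the "-m [mb_to_use]M" and "-@ [threads_to_use]" template in the velocyto error message; the helper names are hypothetical. On a shared cluster, the memory the scheduler grants your job, rather than the node's free memory, is the limit that actually matters.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Approximate peak memory (MB) of samtools sort: -m applies per thread.
samtools_total_mb() {
  local threads=$1 mem_per_thread_mb=$2
  echo $(( threads * mem_per_thread_mb ))
}

# MemAvailable from /proc/meminfo, converted from kB to MB (Linux only).
available_mb() {
  awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}

# Fail if the requested sort budget exceeds what this machine reports free.
check_budget() {
  local need have
  need=$(samtools_total_mb "$1" "$2")
  have=$(available_mb)
  if (( need > have )); then
    echo "samtools would need ~${need} MB but only ${have} MB is available" >&2
    return 1
  fi
  echo "OK: ~${need} MB requested, ${have} MB available"
}
```

For example, check_budget 18 8000 asks for about 144,000 MB, while check_budget 8 4000 asks for about 32,000 MB.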


Yes, you were right. I changed these two settings and it generated a loom file:

--samtools-threads 8 \
--samtools-memory 4000 \
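For completeness, the original velocyto call with these two settings swapped in might look like the sketch below; the module load, source activate and locale exports from the original script would still go before it. The wrapper function is hypothetical, and if the job runs under PBS/Torque (the "Epilog: nodes=1:ppn=1" line in the log hints at this), it is worth requesting a matching number of cores and comfortably more than 8 * 4000 MB of memory for the job.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the command from the question, with the
# reduced samtools settings (8 threads x 4000 MB each, ~32 GB in total).
run_velocyto() {
  local outs=$1 gtf=$2 threads=${3:-8} mem_mb=${4:-4000}
  velocyto run -vvv \
    --bcfile "$outs/filtered_feature_bc_matrix/barcodes.tsv.gz" \
    --outputfolder "$outs" \
    --samtools-threads "$threads" \
    --samtools-memory "$mem_mb" \
    "$outs/possorted_genome_bam.bam" \
    "$gtf"
}

# Example call with the paths from the question:
#   run_velocyto /home/agera4/RNAvelocityHCI003GM/outs /home/agera4/genes.gtf
if [[ $# -gt 0 ]]; then
  run_velocyto "$@"
fi
```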