I'm using UMI-tools to get a count matrix from scRNA-seq data. I used STAR to map reads to the reference, then passed the sorted BAM to umi_tools dedup, and got an error. Here are my command and the error:
umi_tools dedup -I test_B10-1.Aligned.sortedByCoord.out.bam --paired -S test_B10-1.deduplicated.bam
2020-03-04 01:41:21,485 WARNING Chimeric read pairs are being used. Some read pair UMIs may be grouped/deduplicated using just the mapping coordinates from read1.This may also increase the run time and memory usage. Consider --chimeric-pairs==discard to discard these reads or --chimeric-pairs==output (group command only) to output them without grouping
2020-03-04 01:41:21,485 WARNING Unpaired read pairs are being used. Some read pair UMIs may be grouped/deduplicated using just the mapping coordinates from read1.This may also increase the run time and memory usage. Consider --unpared-reads==discard to discard these reads or --unpared-reads==output (group command only) to output them without grouping
2020-03-04 01:41:21,485 INFO command: dedup -I test_B10-1.Aligned.sortedByCoord.out.bam --paired -S test_B10-1.deduplicated.bam
Traceback (most recent call last):
  File "/public/home/syli/software/miniconda3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/dedup.py", line 262, in main
    for bundle, key, status in bundle_iterator(inreads):
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/sam_methods.py", line 375, in __call__
    read.reference_name != read.next_reference_name):
  File "pysam/libcalignedsegment.pyx", line 965, in pysam.libcalignedsegment.AlignedSegment.next_reference_name.__get__ (pysam/libcalignedsegment.c:12545)
  File "pysam/libcalignmentfile.pyx", line 1609, in pysam.libcalignmentfile.AlignmentFile.getrname
  File "pysam/libcalignmentfile.pyx", line 672, in pysam.libcalignmentfile.AlignmentFile.get_reference_name
ValueError: reference_id -1 out of range 0<=tid<359
I've also tried using Bowtie2 to map the same FASTQ files to the same reference, and UMI-tools dedup ran fine on that BAM file. However, I prefer STAR.
Bowtie2 mapping:
bowtie2 -q --phred33 --very-fast --end-to-end -p 8 -x genome_ref -1 B10-1.bbmap.1.fastq.gz -2 B10-1.bbmap.2.fastq.gz | samtools view -@ 8 -Sb - > B10-1.b73.fast.bam
STAR mapping:
~/software/STAR-2.6.1b/bin/Linux_x86_64/STAR --runThreadN 10 --genomeDir ~/genome/ --readFilesIn B10-1.bbmap.1.fastq.gz B10-1.bbmap.2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outBAMsortingThreadN 10 --outFilterMultimapNmax 1 --outFileNamePrefix test_B10-1.
I've checked the reference names in the BAM file, and all of them are present in the @SQ header lines. I'd appreciate any suggestions!
This is my first question on Biostars, and I wasn't sure how to post the code and the error, so sorry for the messy question. Can somebody help me?
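The traceback fails inside next_reference_name with reference_id -1. In BAM, a mate reference id of -1 is what SAM text shows as * in the RNEXT column, so checking that every RNAME appears in the @SQ lines would not catch it. A minimal sketch for counting such records, assuming the BAM is first converted to SAM text with samtools view; the script name mate_ref_summary.py and the category labels are hypothetical, and the "chimeric" test only mirrors the read.reference_name != read.next_reference_name comparison seen in the traceback, not an official umi_tools definition:

```python
import sys
from collections import Counter


def classify_record(line):
    """Classify one SAM record by how its mate reference (RNEXT, column 7) is set."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    if not flag & 0x1:
        return "single-end"  # FLAG 0x1 unset: read is not part of a pair
    if rnext == "*":
        return "mate-reference-unset"  # stored as next_reference_id == -1 in BAM
    if rnext != "=" and rnext != rname:
        return "chimeric"  # mate is on a different reference sequence
    return "same-reference"


def summarize(lines):
    """Count records per category, skipping header lines and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[classify_record(line)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: pass a SAM text file, or "-" to read from stdin.
    stream = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with stream:
        for category, n in sorted(summarize(stream).items()):
            print(f"{category}\t{n}")
```

Possible usage: samtools view test_B10-1.Aligned.sortedByCoord.out.bam | python mate_ref_summary.py -
A non-zero mate-reference-unset count would be consistent with the "reference_id -1" error, though that alone does not prove it is the cause.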
Did you try --chimeric-reads==discard? I assume you checked to make sure that your two fastqs have the same number of reads?
Thanks for the quick answer! By default, STAR writes chimeric alignments to Chimeric.out.junction rather than to the main aligned BAM file (this is controlled by --chimOutType), and I cannot find --chimeric-reads==discard in STAR-2.6.1b. As for the number of reads, I checked my FASTQ files:
29764216 B10-1.bbmap.1.fastq.gz
64723195 B10-1.bbmap.2.fastq.gz
I got these two files through the code below:
It's umi_tools telling you to address chimeras, not STAR.
Something is really wrong if your two fastqs have different numbers of reads.
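Counting lines only compares totals. A slightly stronger check also confirms that the read names pair up in the same order in both files, which is what paired-end aligners expect from R1/R2 inputs. This is a minimal sketch assuming gzipped FASTQ with strict 4-line records; the helper names are hypothetical, and read_id strips anything after the first whitespace plus a trailing /1 or /2, so names that umi_tools extract suffixed with barcode/UMI should still match as long as both files got the same suffix:

```python
import gzip
import sys
from itertools import zip_longest


def fastq_headers(path):
    """Yield the header line of each record, assuming strict 4-line FASTQ records."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield line.rstrip("\n")


def read_id(header):
    """Strip '@', any comment after the first whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def compare_pair(path1, path2):
    """Return (reads in file 1, reads in file 2, index of first unmatched pair or None)."""
    n1 = n2 = 0
    first_mismatch = None
    pairs = zip_longest(fastq_headers(path1), fastq_headers(path2))
    for idx, (h1, h2) in enumerate(pairs):
        n1 += (h1 is not None)
        n2 += (h2 is not None)
        if first_mismatch is None and (
            h1 is None or h2 is None or read_id(h1) != read_id(h2)
        ):
            first_mismatch = idx
    return n1, n2, first_mismatch


if __name__ == "__main__" and len(sys.argv) == 3:
    print(compare_pair(sys.argv[1], sys.argv[2]))
```

Possible usage: python check_pairs.py B10-1.bbmap.1.fastq.gz B10-1.bbmap.2.fastq.gz
Equal counts with None as the third value mean the two files hold the same number of records and every pair of names matches.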
I checked the file sizes after every step. Before and after trimming, read1.fastq.gz and read2.fastq.gz have the same number of reads and almost equal file sizes. After umi_tools extract, although the file sizes differ a lot, they still contain the same number of reads (I confirmed this by counting the lines in each FASTQ file), and the same is true after BBMap. So before mapping, read1.fastq.gz and read2.fastq.gz have the same number of reads. I've also tried --chimeric-reads==discard, but it still reports the same error. Is there something else I can try?