Entering edit mode
5.7 years ago
zhao03
▴
70
Hi,
The purpose is to assemble transcripts with unique mapping read.
I used samtools
to extract unique read of RNA-seq with following steps
samtools view -H CF1_mapped_reads.bam > header.sam
samtools view -F 4 CF1_mapped_reads.bam | grep "\bNH:i:1\b" | cat header.sam - | samtools view -b - > unique.bam
then, I used StringTie to assemble transcript,
stringtie unique.bam -f 0.1 -o uiq-stringtie_assembly.gtf -p 4 -G gencode.v29.annotation.gtf
However, I got error messages as following:
Error at GBitVec: index 35 out of bounds (size 35)
Thanks
What is your definition of uniquely mapped reads, which itself could be a fuzzy concept if we adopt a probability framework? Usually, we can work with the reads that only map to one genome location with a certain number of mismatches (such as 1 or 2). I think the correct way to extract those reads are based on MapQ and SA tags in BAMs. samtools could take care of those.
Also, I suspect that the Stringtie error you see here is not related to whether reads are uniquely mapped, or whether you extracted those reads successfully. It sounds more like a read alignment issue at some specific places. See this:
https://github.com/gpertea/stringtie/issues/87