Hi,
I am trying to align PacBio transcriptome reads against the genome to count the gene number. For pair end read i used the following workflow:
# convert gff to gtf
/home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf
# build index
/home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon
/home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py xxx.gtf > xxx.ss
/home/software/hisat2-2.2.1/hisat2-build -p 20 xxx.fa --ss xxx.ss --exon xxx.exon xxx_tran
# align reads
/home/software/hisat2-2.2.1/hisat2 -p 80 -x ./xxx_tran -1 R1.fq -2 R2.fq -S gonad.sam
# convert to bam and sort by read name
samtools view -bS gonad.sam -o gonad.bam
samtools sort -@ 80 gonad.bam -o gonad.sorted.bam
# count gene
htseq-count -f bam --strand=no gonad.sorted.bam xxx.all.gtf > gonad-counts.txt
But in case of Long read i am not sure what i need to during the alignment stage. I have tried with minimap like as following:
# align reads
/home/software/minimap2-2.17_x64-linux/minimap2 -ax splice:hq -uf yyy.fa gill.fq > gill.sam
in following, I got stuck in counting stage:
# count gene
htseq-count -f bam --strand=no gill.sorted.bam yyy.gtf > gill-counts.txt
My question is that during counting it requires sorted bam and genome gtf file that was generated succesfully during previous staps, why it is getting failed even though the same pipeline is absolutely fine for pair end reads? Hisat2 can align long read? Is there any other way do that?
As a novice i will be always grateful to you. Thanks.