Question

hisat2 compatibility for long read

1

Entering edit mode

3.4 years ago

kishorssf91 ▴ 20

Hi,

I am trying to align PacBio transcriptome reads against the genome to count the gene number. For pair end read i used the following workflow:

# convert gff to gtf
/home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf

# build index
/home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon

/home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py xxx.gtf > xxx.ss

/home/software/hisat2-2.2.1/hisat2-build -p 20 xxx.fa --ss xxx.ss --exon xxx.exon xxx_tran

# align reads

/home/software/hisat2-2.2.1/hisat2 -p 80 -x ./xxx_tran -1 R1.fq -2 R2.fq -S gonad.sam

# convert to bam and sort by read name
samtools view -bS gonad.sam -o gonad.bam 

samtools sort -@ 80 gonad.bam -o gonad.sorted.bam

# count gene
htseq-count -f bam --strand=no gonad.sorted.bam xxx.all.gtf > gonad-counts.txt

But in case of Long read i am not sure what i need to during the alignment stage. I have tried with minimap like as following:

# align reads
/home/software/minimap2-2.17_x64-linux/minimap2 -ax splice:hq -uf yyy.fa gill.fq > gill.sam

in following, I got stuck in counting stage:

# count gene
htseq-count -f bam --strand=no gill.sorted.bam yyy.gtf > gill-counts.txt

My question is that during counting it requires sorted bam and genome gtf file that was generated succesfully during previous staps, why it is getting failed even though the same pipeline is absolutely fine for pair end reads? Hisat2 can align long read? Is there any other way do that?

As a novice i will be always grateful to you. Thanks.

Long-read hisat2 • 1.0k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 3.4 years ago by kishorssf91 ▴ 20