Hi there :)
I'm working with D. melanogaster and trying to align Illumina paired-end reads to the reference genome using Hisat2. My ultimate goal is to quantify the detected isoforms. However, I have a problem with the output: the resulting *.sam file contains no alignments (only header lines starting with @HD and @SQ).
The reference genome I use is Drosophila_melanogaster.BDGP6.31.dna.genome.fa (from Ensembl). The annotation I use is Drosophila_melanogaster.BDGP6.84.gtf (also from Ensembl). For the hisat2 manual, I'm using: https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2 .
The sequencing center provided me with multiple files (70 x 2) in .fastq format. I've renamed them as 001_R1.fastq, 001_R2.fastq, 002_R1.fastq, 002_R2.fastq, etc. 001_R1.fastq and 001_R2.fastq are thus paired. Reads were trimmed with cutadapt.
I first indexed the reference: hisat2-build Drosophila_melanogaster.BDGP6.31.dna.genome.fa . This worked fine; I have 8 *.ht2 files in my directory.
Then I extracted the splice sites from the annotation: extract_splice_sites.py Drosophila_melanogaster.BDGP6.84.gtf . This also worked fine; I have a *.splices.txt file in my directory.
Then, and here comes the tricky part, I'd like to run hisat2 on all my *.fastq files, assigning each one to mate 1 (-1 parameter) or mate 2 (-2 parameter). As hisat2 takes input files as a comma-delimited list (from manual > Command-Line > Usage), I tried to run the job like this:
hisat2 -x bt2_index.idx -1 `ls *_R1* | tr '\n' ','` -2 `ls *_R2* | tr '\n' ','` | samtools view -bS > Dmel_hisat.bam
Anyway, this does not seem to be correctly interpreted by hisat2. I don't get any error message, but my *.sam contains no alignments.
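One thing I noticed is that `tr '\n' ','` leaves a trailing comma at the end of each list. I don't know whether that is what causes the empty output, but to rule it out, here is a minimal sketch of how the two lists could be built without it. The helper names `join_by_comma` and `print_hisat2_cmd` are made up for illustration, `my_index` is a placeholder for whatever basename the index actually has, and the sketch assumes file names without spaces or commas. It only prints the hisat2 command; it does not run it.

```shell
# Join all arguments with commas, without a trailing comma.
# The subshell keeps the IFS change from leaking into the caller.
join_by_comma() {
    ( IFS=','; printf '%s\n' "$*" )
}

# Print (do not run) a hisat2 command covering every *_R1.fastq / *_R2.fastq
# pair in a directory. The glob expands in sorted order, so 001_R1 lines up
# with 001_R2 as long as every file has its mate.
#   $1: index basename (hypothetical placeholder, not a real index name)
#   $2: directory holding the fastq files (defaults to the current one)
print_hisat2_cmd() (
    index_base=$1
    dir=${2:-.}
    mates1=$(join_by_comma "$dir"/*_R1.fastq)
    mates2=$(join_by_comma "$dir"/*_R2.fastq)
    printf 'hisat2 -x %s -1 %s -2 %s\n' "$index_base" "$mates1" "$mates2"
)
```

Calling `print_hisat2_cmd my_index .` in the directory with the reads would let me inspect the exact `-1` and `-2` arguments before piping anything into samtools.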
So, how do you proceed when you have multiple paired *.fastq files as input?
Many thanks for your help.