Multiple paired-end gzipped fastq files as input into hisat2 error - fastq format not recognised
8.2 years ago
nwon ▴ 60

Hi fellow gurus,

I am trying to feed multiple gzipped, paired-end fastq files into hisat2 for alignment. One method of input is to generate a comma-separated list of the files in a directory and supply that as input.

I have generated this text file; however, hisat2 does not recognise the inputs as fastq files. They are still gzipped, and it is not clear whether hisat2 detects this and pipes them through zcat. Decompressing first and then running hisat2 works.

Is there an option to make hisat2 aware of gzipped files in "paired-end, multi-file" mode?

${HISAT2}/hisat2 -x $REFO -S hisat2/${k}_o.sam --dta-cufflinks -p $MAXCPU -1 R1.txt -2 R2.txt

Please excuse the bad coding, this is a snippet.

Cheers

Nick

RNA-Seq hisat2 fastq gzip list input • 8.1k views

You need to specify the files (for one sample) on the command line, in identical order for the -1 and -2 options. From the HISAT2 manual: -1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq. Most aligners nowadays accept compressed files, so there is no need to uncompress them first.

Note: You can't align multiple samples in one command line/together.
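As a minimal sketch of the "identical order" requirement: one way to keep the two lists in step is to write the mate-1 list once and derive the mate-2 list from it. The file names and the _R1/_R2 naming convention below are assumptions for illustration, not taken from the original post, and the hisat2 command is only printed, not run.

```shell
#!/usr/bin/env bash
# Hypothetical mate-1 files for ONE sample, comma-separated with no spaces.
r1_list="laneA_R1.fastq.gz,laneB_R1.fastq.gz"

# Derive the mate-2 list by swapping the _R1 tag for _R2 (assumed naming
# convention), so both lists are guaranteed to be in the same order.
r2_list=${r1_list//_R1./_R2.}

# Print the command instead of running it, so the sketch works without hisat2.
echo "hisat2 -x \"\$REFO\" -1 $r1_list -2 $r2_list -S out.sam"
```

The lists go directly on the command line after -1 and -2, as in the manual excerpt above.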


Thanks @genomx2,

Can the fastq files be gzipped? Or do they have to be uncompressed fastq?

Cheers

Nick


gzipped files should be fine.


Hmmmm, I am still getting errors:

Error: reads file does not look like a FASTQ file

terminate called after throwing an instance of 'int'

(ERR): hisat2-align died with signal 6 (ABRT)

I am wondering whether the fact that they are symbolic links is an issue?

I ran them through zcat | less and they look like fastq files to me! #stumped
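A rough way to script both checks mentioned here (symlink status and whether the decompressed content starts like a FASTQ record) is sketched below. The function name inspect_reads is made up for this example, and the test is deliberately shallow: it only looks at the first four decompressed lines.

```shell
#!/usr/bin/env bash
# Describe one path: is it a symlink, does its target exist, and does the
# first decompressed record have the '@' and '+' lines a FASTQ record needs?
inspect_reads() {
    local f=$1
    if [ -L "$f" ]; then echo "symlink: yes"; else echo "symlink: no"; fi
    # -e follows symlinks, so a dangling link is reported as missing.
    if [ ! -e "$f" ]; then
        echo "target: missing"
        return 1
    fi
    if gzip -dc -- "$f" 2>/dev/null | awk '
        NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        NR == 4 { exit }
        END { if (bad || NR < 4) exit 1 }
    '; then
        echo "first record: FASTQ-like"
    else
        echo "first record: not FASTQ-like"
        return 1
    fi
}

# Usage (hypothetical file name):
# inspect_reads sampleA_R1.fastq.gz
```

This does not show what hisat2 itself does with a symlink; it only confirms that the link resolves and that the data behind it decompresses to something FASTQ-shaped.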


test them with gunzip -t filename.gz
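With many lane files, the same integrity test can be looped over all of them. This is a small sketch; check_gz is a made-up helper name and the file names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Run gzip's integrity test on every file given and report OK or FAIL.
# Returns non-zero if any file fails.
check_gz() {
    local f status=0
    for f in "$@"; do
        if gunzip -t -- "$f" 2>/dev/null; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            status=1
        fi
    done
    return "$status"
}

# Usage (hypothetical file names):
# check_gz sampleA_L001_R1.fastq.gz sampleA_L001_R2.fastq.gz
```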


Unzipping them is fine and I am sure hisat2 is gz aware.


On the off chance that you're actually naming things R1.txt and R2.txt, rename them to R1.fq.gz and R2.fq.gz.


I was writing the R1.txt and R2.txt files from the output of ls -m <sample>_R1.fastq.gz and ls -m <sample>_R2.fastq.gz, respectively.

So each one was technically a file containing a comma-separated list. I have resorted to decompressing and processing accordingly (this is working).

I have folders by lane and was hoping to combine them all, as the library was sequenced over many lanes. The other way is to cat them into one file. I was hoping this would work, but it hasn't so far.


A file of comma separated lists? That won't ever work. Just use a comma separated list.


Hmmm. Thanks @Devon Ryan, I will need to come back to this at some stage. Setting the ls -m output to a variable and using that is still throwing the same error.
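One possible contributor, offered as a guess rather than a diagnosis: ls -m separates names with a comma followed by a space (and can wrap onto several lines), so its output is not the space-free list shown in the manual example. A sketch that builds the list from a shell glob instead is below. The helper names join_by_comma and list_reads and the directory name in the usage comment are made up for illustration; the *_R1.fastq.gz pattern follows the file names mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# Join all arguments with commas and no spaces: a b c -> a,b,c
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# List every <anything>_<mate>.fastq.gz file in a directory as one
# comma-separated string. $1 = directory, $2 = mate tag such as R1 or R2.
list_reads() {
    local dir=$1 mate=$2
    local files=("$dir"/*_"$mate".fastq.gz)
    # An unmatched glob stays literal, so check that the first entry exists.
    [ -e "${files[0]}" ] || return 1
    join_by_comma "${files[@]}"
}

# Usage (lane_dir is a hypothetical directory holding one sample's files):
# hisat2 -x "$REFO" -1 "$(list_reads lane_dir R1)" -2 "$(list_reads lane_dir R2)" -S out.sam
```

Because both lists come from the same sorted glob, files that differ only in the R1/R2 tag end up in the same position in each list.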

I have resorted to aligning the samples by lane and will merge the BAM files at the end. Another way to skin the cat, I suppose.
