So, there's a lot going on here. Let me preface this by saying that I am not a biologist in any way, shape, or form; I'm a compiler writer who found this site by accident. Keep in mind that I'm commenting only on what the documentation says the program is supposed to do and on the code you typed. I'm making no assumptions about what you meant to do, since I have no idea what gene alignments are.
First, you specified both -q and -r. Those two are mutually exclusive: -q says the reads are FASTQ files, while -r says they are raw sequences, one per line. I'm also not sure you actually meant to pass your output variable to --rg. That option sets a label on the output; it is not an input. What's even weirder is that they all appear to be the same file, so I'm not sure what the point of that is.
The option I think you should have used is -q, like you did at first, because you have fastq files. They're compressed, though. I saw the documentation say something about compressed input, but honestly, there's no need to make your life more complicated than it already is, in my opinion.
gzip -d ./*.gz
That will decompress every single .gz file in the directory so that you're only dealing with plain fastq files, and then you can simply pass them in. Speaking of which, it looks like you either pass in a pair like this:
-1 file1.fastq -2 file2.fastq
Or you go all out with the other option:
-U file1.fastq,file2.fastq,file3.fastq,etc.,file100.fastq
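If you have a lot of unpaired files, typing that comma-separated list by hand gets old fast. Here's a minimal sketch of building it in the shell. The file names and the genome_index basename are placeholders I made up, and the script only prints the hisat2 command rather than running it, so treat it as an illustration and not as your actual invocation.

```shell
# Join any number of file names with commas, the form that -U expects.
# The subshell body keeps the IFS change from leaking into the caller.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Hypothetical file names, for illustration only.
unpaired=$(join_by_comma reads_a.fastq reads_b.fastq reads_c.fastq)

# Dry run: print the command instead of executing it.
# "genome_index" is a placeholder for your HISAT2 index basename.
echo "hisat2 -q -x genome_index -U ${unpaired}"
```

Once the printed command looks right, you can drop the echo and the quotes around it to actually run it.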
Now, like I said earlier, the -r is redundant, since it contradicts -q. Moreover, the documentation clearly shows that either you specify a filename with -S, and that becomes the output file, or you don't, and the result gets printed to standard out. The second scenario is obviously the one you want, since otherwise there would be nothing to pipe to the next program. Exit code 1 on Unix conventionally means a general failure (EXIT_FAILURE). From the log at the beginning you can see that the program is printing to stdout, so clearly it's not interpreting the last argument ($r1) as an output file. It doesn't really matter, though: whatever hisat2 takes that argument to be, it treats it as an error grave enough to terminate immediately. That's why you're getting that error from samblaster. Since hisat2 quit without printing anything to stdout, samblaster essentially got called with no input.
In case you're curious why there was output in the console but samblaster never got to see it: errors print to stderr, not stdout. A pipe only carries stdout, so unless you explicitly redirect stderr to wherever stdout is going, as far as samblaster is concerned there was no input. It's a completely different stream.
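You can see the two streams behave differently with a tiny experiment. This is only a sketch: fake_aligner is a stand-in I made up that mimics a program failing the way hisat2 did (message on stderr, nothing on stdout, exit code 1), and wc -c plays the role of samblaster by counting what arrives through the pipe.

```shell
# Stand-in for a failing aligner: writes only to stderr and exits 1.
fake_aligner() {
    echo "Error: bad arguments" >&2
    return 1
}

# Only stdout crosses the pipe, so the downstream command receives nothing.
piped_bytes=$(fake_aligner 2>/dev/null | wc -c | tr -d '[:space:]')

# With 2>&1, stderr is merged into stdout and does reach the next command.
merged_bytes=$(fake_aligner 2>&1 | wc -c | tr -d '[:space:]')

echo "stdout only: ${piped_bytes} bytes; with 2>&1: ${merged_bytes} bytes"
```

Note that merging the streams is only useful for this demonstration. In a real alignment pipe you would not want error messages mixed into the SAM data that the next tool is parsing.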
Anyways, I hope that was somewhat helpful. Unfortunately I would need to see your input files and all your code to be able to really diagnose the problem, but I'll be around, let me know if there's anything I can help out with. Good luck, dude.
Please show the entire function that you used (so the part where you declared the variables). What is --${4}? The option parameter is missing. Is it the splice site file? I also think it should be --rg-id ${output} without the =.
As jflopezfernandez said, use -U or -1/-2 depending on your data being single- or paired-end. If it is single-end, you'll need --ignoreUnmated to make samblaster accept non-paired data. Is this RNA-seq? If so, please use the search function on RNA-seq deduplication and then decide if it is indeed a good thing to do.
Beyond that, you could optimize your pipe by printing the output of view as SAM rather than BAM (only the -h option instead of -bSh), as fixmate can read SAM, saving the compression time from SAM to BAM. sort will then take care of the BAM conversion. Be sure to have your samtools at the current version (1.9).
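Put together, the suggestions above might look roughly like the sketch below for paired-end data. I'm guessing at the order of the stages from the description, since the original function wasn't posted, and the four arguments (index basename, the two read files, and an output prefix) are my own placeholders. Any filtering flags from your original view call would stay on that step.

```shell
# Sketch of the pipe with SAM kept between stages; only sort writes BAM.
# $1 = HISAT2 index basename, $2/$3 = paired read files, $4 = output prefix
# (all four are hypothetical placeholders, not taken from the original post).
# The function body is a subshell, so nothing leaks into the caller.
align_and_sort() (
    hisat2 -q -x "$1" -1 "$2" -2 "$3" --rg-id "$4" \
        | samblaster \
        | samtools view -h - \
        | samtools fixmate - - \
        | samtools sort -o "$4.bam" -
)

# Run only when all four arguments are supplied on the command line.
if [ "$#" -eq 4 ]; then
    align_and_sort "$@"
fi
```

For single-end data you would swap -1/-2 for -U and add --ignoreUnmated to the samblaster step, as mentioned above.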