7.5 years ago
veronicaschroeder78
I am a newbie in NGS data analysis and am trying to run hisat2 after successfully building an index, using the following bash script:
trimmedfiles=... (list of trimmed files from cutadapt)
pairedfiles=... (list of trimmed files from cutadapt)
hisat2 -t \
-p 8 \
-x my_genome_idx_prefix \
-1 $trimmedfiles \
-2 $pairedfiles \
--dta-cufflinks \
-S aligned.sam | tee log.txt
But even if I redirect every possible output with &> and tee, I receive thousands of annoying messages on my terminal with these warnings:
...
Warning: skipping mate #2 of read 'X01292:256:HJKTSJDD:2:1199:1356 2:N:0:GCCAAT' because length (1) <= seed mismatches (0)
Warning: skipping mate #2 of read 'X01292:256:HJKTSJDD:2:1199:1356 2:N:0:GCCAAT' because it was < 2 characters long
...
I have no idea what this could possibly mean. Should I tweak my parameters? Any suggestion?
It looks like your reads were trimmed by a program that did not discard reads too short to align: the warnings say that mate #2 of that read is only 1 base long. Such reads are not particularly useful and, as you have seen, cause problems with some programs. You should obtain the raw reads and trim them yourself (discarding reads too short to be useful), or ignore the warnings, or use a different aligner. My suggestion would be all three.
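For the trimming option, cutadapt's `-m` / `--minimum-length` option drops reads below a length cutoff at trimming time. If re-trimming is inconvenient, the already-trimmed pairs can also be filtered directly. Below is a minimal sketch of that idea, not a tested pipeline. It assumes uncompressed FASTQ with exactly four lines per record and mates in the same order in both files. The function name `filter_short_pairs`, the default cutoff of 20, and the file names are all hypothetical:

```shell
# Hypothetical helper: keep a read pair only if BOTH mates are at least MIN_LEN long.
# Usage: filter_short_pairs R1.fastq R2.fastq OUT1.fastq OUT2.fastq [MIN_LEN]
# Assumes uncompressed FASTQ, four lines per record, mates in the same order.
filter_short_pairs() {
    awk -v r2="$2" -v o1="$3" -v o2="$4" -v min="${5:-20}" '
        BEGIN {
            # Create (or truncate) both outputs even if no pair survives.
            printf "" > o1
            printf "" > o2
        }
        {
            # One four-line record from R1 (the main input) ...
            h1 = $0; getline s1; getline p1; getline q1
            # ... and the matching record from R2.
            getline h2 < r2; getline s2 < r2; getline p2 < r2; getline q2 < r2
            # Write the pair only when both sequences pass the cutoff,
            # so the two output files stay in sync.
            if (length(s1) >= min && length(s2) >= min) {
                print h1 "\n" s1 "\n" p1 "\n" q1 > o1
                print h2 "\n" s2 "\n" p2 "\n" q2 > o2
            }
        }' "$1"
}
```

For example, `filter_short_pairs trimmed_1.fastq trimmed_2.fastq kept_1.fastq kept_2.fastq 20` would write only the pairs in which both mates are at least 20 bases long. Dropping whole pairs rather than single mates matters because hisat2 expects the `-1` and `-2` files to list mates in the same order.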
What happens if you grep "X01292:256:HJKTSJDD:2:1199:1356 2:N:0:GCCAAT" your_fastq?
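One way to run that check, and to see how many reads are affected, is sketched below. It assumes an uncompressed four-line FASTQ and a grep that supports `-A` (GNU and BSD grep do). The helper names `show_read` and `count_short_reads` are hypothetical:

```shell
# Hypothetical helper: print the full four-line FASTQ record for a read name.
# Usage: show_read 'READ NAME' FILE.fastq
show_read() {
    # -F treats the name as a fixed string, so ':' and spaces need no escaping;
    # -A 3 also prints the sequence, '+' and quality lines after the header.
    grep -F -A 3 -- "$1" "$2"
}

# Hypothetical helper: count reads whose sequence is shorter than MIN_LEN (default 2).
# Usage: count_short_reads FILE.fastq [MIN_LEN]
count_short_reads() {
    # In a four-line FASTQ record, the sequence is line 2 of every group of 4.
    awk -v min="${2:-2}" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }' "$1"
}
```

For instance, `show_read 'X01292:256:HJKTSJDD:2:1199:1356 2:N:0:GCCAAT' your_fastq` prints the whole record, so the length of the sequence line can be read off directly. `count_short_reads your_fastq 20` then gives a rough idea of how many reads a length cutoff of 20 would remove.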