Before the alignment, I ran FastQC on the fastq.gz files and saw adapter content.
So I trimmed the adapters with cutadapt:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o tr_sample_R1.fastq.gz -p tr_sample_R2.fastq.gz sample_R1.fastq.gz sample_R2.fastq.gz
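After adapter trimming, you can check whether cutadapt left behind empty or near-empty reads by counting sequence lines below a length threshold in the trimmed files. Reads that are almost entirely adapter (for example adapter dimers) can be trimmed down to zero length, and cutadapt keeps such reads unless a minimum length is set. The helper below is a minimal sketch. count_short_reads is a hypothetical name, not part of cutadapt or HISAT2. It assumes standard 4-line FASTQ records with no line wrapping, and it uses gzip -cdf so it reads both .fastq.gz and plain .fastq input.

```shell
#!/usr/bin/env bash
# count_short_reads FILE [MIN_LEN]   (hypothetical helper)
# Prints how many FASTQ records have a sequence line shorter than MIN_LEN
# (default 2). Assumes 4-line records: header, sequence, '+', quality.
count_short_reads() {
  local fq="$1" min_len="${2:-2}"
  # gzip -cdf decompresses gzip input and passes plain text through unchanged
  gzip -cdf -- "$fq" |
    # line 2 of every 4-line record is the sequence; count the short ones
    awk -v min="$min_len" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}
```

For example, count_short_reads tr_sample_R1.fastq.gz 2 counts reads shorter than 2 bases, the same cut-off that appears in the "< 2 characters long" warnings. If the count is non-zero, cutadapt's -m/--minimum-length option (say -m 20; the exact threshold is a judgment call) discards such reads during trimming. In paired-end mode cutadapt by default drops the whole pair when either mate fails the filter, so R1 and R2 stay in sync.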
I then aligned the trimmed FASTQ files with HISAT2. The alignment finished and reports a 92% mapping rate, but the HISAT2 output contains warnings like the ones below. What do they mean?
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:30005:12226 1:N:0:TTAGGC' because length (1) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:30005:12226 2:N:0:TTAGGC' because length (1) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:30005:12226 1:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:30005:12226 2:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:22343:14951 1:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:22343:14951 2:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:22343:14951 1:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:22343:14951 2:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:28250:19698 1:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:28250:19698 2:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:28250:19698 1:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:28250:19698 2:N:0:TTAGGC' because it was < 2 characters long
Warning: skipping mate #1 of read 'ST-E00211:161:HMHCYCCXX:1:1101:32197:22915 1:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'ST-E00211:161:HMHCYCCXX:1:1101:32197:22915 2:N:0:TTAGGC' because length (0) <= # seed mismatches (0)
Is there anything to worry about in these warnings?
Do you have reads that have no sequence in them? Check
grep -A 3 ST-E00211:161:HMHCYCCXX:1:1101:28250:19698
in both the R1 and R2 files to see whether that is the case.
When I run that grep on the R1/R2 files, I don't get anything.
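One possible reason the grep finds nothing is that grep searches the raw bytes of a .fastq.gz file, and a read ID generally will not appear as plain text inside gzip-compressed data. Searching the decompressed trimmed files (tr_sample_R1.fastq.gz and tr_sample_R2.fastq.gz) should show what cutadapt left of that read. The untrimmed input files would still contain the full-length read. zgrep -A 3 on the trimmed files is one option. The hypothetical show_read helper below is a sketch of an alternative that matches the header's first field exactly, so an ID cannot accidentally match a longer one (for example ...:1969 inside ...:19698). Like the check above, it assumes 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# show_read FILE READ_ID   (hypothetical helper)
# Prints the 4-line FASTQ record whose header's first field is exactly "@READ_ID".
show_read() {
  local fq="$1" id="$2"
  # gzip -cdf decompresses .fastq.gz and passes plain .fastq through unchanged
  gzip -cdf -- "$fq" |
    # on each header line (line 1 of 4), decide whether this record matches;
    # then print every line of the record while 'show' is true
    awk -v h="@$id" 'NR % 4 == 1 { show = ($1 == h) } show'
}
```

For example, show_read tr_sample_R1.fastq.gz ST-E00211:161:HMHCYCCXX:1:1101:28250:19698 prints that record from the trimmed R1 file. If the read was trimmed to zero length, its sequence and quality lines would be empty, which would match the "length (0)" reported in the warnings.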