7.5 years ago
joreamayarom
I'm mapping reads in a BAM file to a genome using STAMPY. Apparently, some of my reads are severely malformed, and STAMPY is issuing the following complaint:
stampy: Mapping failed on input line 1742232 of file /path/to/reads/my_file.R2.fastq.gz: CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD
stampy: Error: (FastQReader:) Sequence and quality lines have different lengths (98 and 115: AGGCAAACGAGCGTTCGGGTCACCTGATGGTGATCACCGCCGCTTACGACCCCGTGCAGCACCAGAGGAGCTACAGGTGTGTTGCCGGCCTTTGAGGT and CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD)
stampy: Traceback:
File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/reader.py", line 273, in generator
I have tracked down some of the offending records, and they look like this:
@ILLUMINA:276:C0D97ACXX:5:1101:2429:90560 2:N:0:ACAGTG
AGGCAAACGAGCGTTCGGGTCACCTGATGGTGATCACCGCCGCTTACGACCCCGTGCAGCACCAGAGGAGCTACAGGTGTGTTGCCGGCCTTTGAGGT
+
CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD
The error message makes sense now: the quality string is 17 characters longer than the sequence (115 vs. 98), and the trailing HEH:EE7@DED<C;;AD is the excess. My question is: what could have generated this error? Could my files have been corrupted while they were being downloaded? Should I simply write a script that clips the extra quality characters?
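Before clipping anything, it might help to know how many records are affected and where they sit in the file. Below is a minimal sketch of such a check, assuming strict 4-line FASTQ records (no wrapped sequence lines). The helper names `open_fastq` and `find_length_mismatches` are made up for illustration and are not part of STAMPY.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_length_mismatches(lines):
    """Yield (header_line_number, header, seq_len, qual_len) for bad records.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    A record cut short at the end of the input is reported with lengths of None.
    """
    it = iter(lines)
    line_no = 1  # 1-based line number of the current record's header
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) < 4:
            # Incomplete final record, e.g. from a truncated file
            yield (line_no, record[0].rstrip("\r\n"), None, None)
            break
        header, seq, _plus, qual = (x.rstrip("\r\n") for x in record)
        if len(seq) != len(qual):
            yield (line_no, header, len(seq), len(qual))
        line_no += 4
```

Looping over `find_length_mismatches(open_fastq(path))` for each FASTQ file and printing the tuples would list every offending record. If only a handful show up, clipping or dropping them may be reasonable; if there are many, or if the file ends in an incomplete record, that would point more towards a damaged file than towards a few bad reads.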