Hi, I have been merging two paired-end FASTQ files using:
cat file1.fq.gz file2.fq.gz > outputfile.fq.gz
I then mapped the output file and all went fine. However, when I try to convert the resulting SAM file to a BAM file, I get this error:
[W::sam_read1] Parse error at line 23309928
[main_samview] truncated file.
So I looked at the actual line, and everything seems OK and appears to be in the correct format. However, I think I know what triggers the bug, but I don't know why: in the 3rd field (the chr/ref field) I have "GL000214.1", as you can see below:
ERR1019070.25381935 16 GL000214.1 -1 16 100M * 0 0 CACTATTATTCTCCAAATGATGCGTGCCTCCCTAGAGTCCAGGCTATCTGCATATCTAATTTTTCCCACAAATTACTGTTTTGAATTGCACTGAATTCAA @C@DB?DFHAHHFGGEDGIG@IGGDFGGGIIIIIIIE?GHGH>G?GDFGHGDFGG<FHGEEHGIIGEH@EAGGCEC>777?CFFCE>>CACCDC3>@5>3 XS:i:1
I checked, and "GL000214.1" is declared in the header of my SAM file, though...
I ran the same process on another dataset and got the same error, probably caused by the same issue: the 3rd field contains a "GLXXXXXXX" accession that it does not seem to recognise.
Do you know how I can bypass this, or whether I merged the files the wrong way?
Cheers and thank you!
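A quick way to see which field the parser might be objecting to is to scan the alignment records for values the SAM specification does not allow. Note that in the record above POS is -1, whereas the specification defines POS as 1-based with 0 reserved for unmapped reads, so a negative value may be what triggers the parse error rather than the GL000214.1 name. A minimal sketch, assuming tab-separated fields; the demo file, its arbitrary contig length, and the `check_sam` helper are hypothetical:

```shell
# Toy SAM file for illustration only; replace "$sam" with your own file.
tmpdir=$(mktemp -d)
sam="$tmpdir/demo.sam"
printf '@SQ\tSN:GL000214.1\tLN:1000\n' > "$sam"   # arbitrary length for the demo
printf 'read1\t16\tGL000214.1\t-1\t16\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'read2\t0\tGL000214.1\t10\t16\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

# Report records whose RNAME is missing from the @SQ header lines
# or whose POS is not a non-negative integer.
check_sam() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) seen[substr($i, 4)] = 1
      next
    }
    /^@/ { next }
    $3 != "*" && !($3 in seen) { print "line " NR ": unknown RNAME " $3 }
    $4 !~ /^[0-9]+$/           { print "line " NR ": invalid POS " $4 }
  ' "$1"
}

check_sam "$sam"
```

On the toy file this flags only the record with the negative position, since its reference name is present in the header. If the same pattern shows up in the real file, the aligner's output for reads near contig starts would be the first thing to look at.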
I am doing this because I am using several mapping tools, and one of them doesn't accept PE reads. I was thinking that by doing
cat file1_R1.gz file1_R2.gz > outfil1.gz
then mapping this file and removing duplicates, it should be all right, no? Do you think I get that error because R1 and R2 cannot be merged using cat?
I am not sure what aligner you are using. Perhaps it is paying attention to FASTQ headers, and having R2 headers show up in R1 files may be an issue for it.
Not sure what you mean by that? If your R1/R2 reads are able to merge/overlap, you could simply use a program like bbmerge.sh to generate a single-read representation.
The issue is that, with my long reads, I cannot use overlap to merge PE into an SE dataset, which is why I chose the cat method and not bbmerge. I'll try to make sure both headers are in my merged dataset and let you know. Thanks a lot!
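If the concern is that R1 and R2 records become indistinguishable once concatenated, one option is to tag the read names before merging. A minimal sketch, assuming four-line FASTQ records; the toy files and the `tag_mate` helper are hypothetical, and whether a given aligner needs or tolerates the /1 and /2 suffixes is an assumption to verify:

```shell
# Toy inputs for illustration only; replace with your own R1/R2 files.
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/demo_R1.fq.gz"
printf '@read1\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/demo_R2.fq.gz"

# Append a mate suffix to the read name (first token of every 4th line,
# starting at line 1); sequence, '+', and quality lines pass through unchanged.
tag_mate() {
  gzip -dc "$1" | awk -v suffix="$2" 'NR % 4 == 1 { $1 = $1 suffix } { print }'
}

{
  tag_mate "$tmpdir/demo_R1.fq.gz" /1
  tag_mate "$tmpdir/demo_R2.fq.gz" /2
} | gzip -c > "$tmpdir/demo_merged.fq.gz"

# Show the resulting read names
gzip -dc "$tmpdir/demo_merged.fq.gz" | awk 'NR % 4 == 1'
```

With the names tagged this way, both mates stay traceable in the SAM output even though they were mapped as single-end reads.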
Indeed, you should prefer merging PE into SE. Now, about the cat thing: I always use zcat when working on gz-compressed files.
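As far as the gzip format itself goes, concatenated members are valid, so cat on two .gz files should still give a readable file. A quick sanity check, sketched here on toy files (the file names and the `count_reads` helper are hypothetical), is to test the stream and compare read counts before and after merging:

```shell
# Toy inputs for illustration only; replace with your own files.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/a.fq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/b.fq.gz"

# Concatenate the compressed files directly, as in the original post
cat "$tmpdir/a.fq.gz" "$tmpdir/b.fq.gz" > "$tmpdir/ab.fq.gz"

# Number of reads = number of lines / 4 (assumes four-line FASTQ records)
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

gzip -t "$tmpdir/ab.fq.gz" && echo "gzip stream OK"
echo "reads: $(count_reads "$tmpdir/ab.fq.gz")"
```

If the merged count equals the sum of the two input counts and `gzip -t` reports no error, the concatenation step itself is unlikely to be the cause of the truncated-file message.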