Issue when merging FastQ
3.9 years ago
gayal25016 • 0

Hi, I have been merging 2 paired-end fastq files using

cat file1.fq.gz file2.fq.gz > outputfile.fq.gz

I then mapped the output file and all went fine. However, when I try to convert the resulting SAM file to a BAM file, I get this error:

[W::sam_read1] Parse error at line 23309928

[main_samview] truncated file.

So I looked at the actual line, and everything seems OK and appears to be in the correct format. However, I think I know what triggers the error, though not why: the 3rd field (the chr/ref field) contains "GL000214.1", as you can see below:

ERR1019070.25381935     16      GL000214.1      -1      16      100M    *       0       0       CACTATTATTCTCCAAATGATGCGTGCCTCCCTAGAGTCCAGGCTATCTGCATATCTAATTTTTCCCACAAATTACTGTTTTGAATTGCACTGAATTCAA    @C@DB?DFHAHHFGGEDGIG@IGGDFGGGIIIIIIIE?GHGH>G?GDFGHGDFGG<FHGEEHGIIGEH@EAGGCEC>777?CFFCE>>CACCDC3>@5>3    XS:i:1

I checked, and "GL000214.1" is declared in the header of my SAM file, though.

I ran the same process on another dataset and got the same error, probably caused by the same issue: the 3rd field contains a "GLXXXXXXX" accession that it does not recognise?

Do you know how I can bypass this, or whether I merged the files the wrong way?

Cheers and thank you!
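One thing that may be worth checking besides the reference name: in the SAM format, POS (the 4th field) is a 1-based coordinate, with 0 reserved for reads without a position, whereas the record shown above carries -1. The sketch below scans a SAM file for records with fewer than the 11 mandatory fields or a POS that is not a non-negative integer. The function name `check_sam` and the toy records are illustrative assumptions, not part of the original workflow.

```shell
#!/bin/sh
# Report SAM alignment records with too few fields or a POS that is not a
# non-negative integer. Header lines (starting with "@") are skipped.
check_sam() {
  awk -F '\t' '
    /^@/ { next }
    NF < 11 { printf "line %d: only %d fields\n", NR, NF; next }
    $4 !~ /^[0-9]+$/ {
      printf "line %d: invalid POS %s (read %s, ref %s)\n", NR, $4, $1, $3
    }
  ' "$1"
}

# Toy input (made-up records, not real data): one valid line, one with POS -1.
demo=$(mktemp)
{
  printf '@SQ\tSN:GL000214.1\tLN:1000\n'
  printf 'read1\t0\tGL000214.1\t5\t16\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'read2\t16\tGL000214.1\t-1\t16\t4M\t*\t0\t0\tACGT\tIIII\n'
} > "$demo"

check_sam "$demo"
rm -f "$demo"
```

On a real file, `check_sam` would be called with the path of the SAM file written by the aligner. If it flags the same line number that samtools complains about, the aligner's output, rather than the reference name, is the more likely culprit.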

Fastq Merging mapping SAM BAM • 1.1k views
3.9 years ago
GenoMax 147k

"have been merging 2 paired-end fastq files using"

Just to be sure: are you merging the R1 and R2 files independently and in the same order, e.g. cat file1_R1.gz file2_R1.gz .. and cat file1_R2.gz file2_R2.gz ..? You can't do cat file1_R1.gz file1_R2.gz.
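A minimal sketch of that per-mate merge, followed by a check that both merged files hold the same number of reads. The `<prefix>_R1.fq.gz` / `<prefix>_R2.fq.gz` naming scheme and the helper names `count_reads` and `merge_pairs` are assumptions made for illustration, and the toy reads are made up.

```shell
#!/bin/sh
# Count FASTQ records in a gzip-compressed file (4 lines per record).
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Concatenate per mate, visiting the inputs in the same order for R1 and R2.
# $1 = output prefix; remaining arguments = input prefixes.
merge_pairs() {
  out=$1
  shift
  : > "${out}_R1.fq.gz"
  : > "${out}_R2.fq.gz"
  for prefix in "$@"; do
    cat "${prefix}_R1.fq.gz" >> "${out}_R1.fq.gz"
    cat "${prefix}_R2.fq.gz" >> "${out}_R2.fq.gz"
  done
}

# Toy demonstration with made-up one-read files in a temporary directory.
tmp=$(mktemp -d)
for s in file1 file2; do
  printf '@%s_read/1\nACGT\n+\nIIII\n' "$s" | gzip -c > "$tmp/${s}_R1.fq.gz"
  printf '@%s_read/2\nTGCA\n+\nIIII\n' "$s" | gzip -c > "$tmp/${s}_R2.fq.gz"
done
merge_pairs "$tmp/merged" "$tmp/file1" "$tmp/file2"

# Mates stay in sync only if both merged files hold the same number of reads.
r1=$(count_reads "$tmp/merged_R1.fq.gz")
r2=$(count_reads "$tmp/merged_R2.fq.gz")
echo "R1 reads: $r1, R2 reads: $r2"
rm -rf "$tmp"
```

Equal counts are a necessary condition only; they do not prove the mates are in the same order, which is why both loops walk the inputs in one fixed sequence.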


I am doing that because I am using several mapping programs, and one of them doesn't accept PE reads. I was thinking that by doing cat file1_R1.gz file1_R2.gz > outfil1.gz, then mapping this file and removing duplicates, it should be all right, no?

Do you think I get that error because R1 and R2 cannot be merged using cat?


I am not sure what aligner you are using. Perhaps it pays attention to FASTQ headers, and having R2 headers show up in R1 files may be an issue for it.

"I was thinking that by doing cat file1_R1.gz file1_R2.gz > outfil1.gz , then map this file and then by removing duplicates it should be alright no ?"

I'm not sure what you mean by that. If your R1/R2 reads are able to merge/overlap, you could simply use a program like bbmerge.sh to generate a single-read representation.


The issue is that, with my long reads, I cannot use overlap to merge the PE data into an SE dataset, which is why I chose the cat method and not bbmerge. I'll make sure both headers are in my merged dataset and let you know. Thanks a lot!
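One way to inspect the headers of such a merged file is to count read names that occur more than once, since mates concatenated into a single file share a name whenever the headers carry no /1 or /2 suffix. This is only a sketch: `count_duplicate_names` is a hypothetical helper, the toy reads are made up, and it assumes strict 4-line FASTQ records. Whether duplicated names matter depends on the aligner and on downstream tools.

```shell
#!/bin/sh
# Print how many read names occur more than once in a gzip-compressed FASTQ.
# Only the first whitespace-delimited token of each header line is compared.
count_duplicate_names() {
  gzip -dc "$1" |
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' |
    sort | uniq -d | wc -l | tr -d ' '
}

tmp=$(mktemp -d)
# Made-up mates that share a name, as happens when headers carry no /1 or /2.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/R1.fq.gz"
printf '@read1\nTGCA\n+\nIIII\n' | gzip -c > "$tmp/R2.fq.gz"
cat "$tmp/R1.fq.gz" "$tmp/R2.fq.gz" > "$tmp/merged.fq.gz"

echo "duplicated read names: $(count_duplicate_names "$tmp/merged.fq.gz")"
rm -rf "$tmp"
```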


Indeed, you should prefer merging PE into SE. As for cat: I always use zcat when working on gz-compressed files.
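For comparison, the sketch below builds a merged file both ways, with plain cat on the compressed files and by decompressing and recompressing, and checks that each decompresses to the same content as the inputs. The gzip format allows several compressed members in one file, so the cat result is expected to be valid, although a tool that reads only the first member could still mishandle it. `gzip -dc` stands in for zcat here, and `same_content` and the file names are illustrative assumptions.

```shell
#!/bin/sh
# Succeeds if the merged archive decompresses to the same bytes as the
# inputs decompressed one after another.
# $1 = merged .gz file; remaining arguments = the input .gz files, in order.
same_content() {
  merged=$1
  shift
  [ "$(gzip -dc "$merged" | cksum)" = "$(gzip -dc "$@" | cksum)" ]
}

tmp=$(mktemp -d)
printf '@a\nACGT\n+\nIIII\n' | gzip -c > "$tmp/a.fq.gz"
printf '@b\nTGCA\n+\nIIII\n' | gzip -c > "$tmp/b.fq.gz"

# Method from the question: concatenate the compressed files directly.
cat "$tmp/a.fq.gz" "$tmp/b.fq.gz" > "$tmp/cat.fq.gz"
# Alternative: decompress, concatenate, and compress again.
gzip -dc "$tmp/a.fq.gz" "$tmp/b.fq.gz" | gzip -c > "$tmp/recompressed.fq.gz"

gzip -t "$tmp/cat.fq.gz" && echo "cat output passes gzip -t"
same_content "$tmp/cat.fq.gz" "$tmp/a.fq.gz" "$tmp/b.fq.gz" &&
  echo "cat output matches inputs"
same_content "$tmp/recompressed.fq.gz" "$tmp/a.fq.gz" "$tmp/b.fq.gz" &&
  echo "recompressed output matches inputs"
rm -rf "$tmp"
```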
