Duplicated read names in Fastq file

2

9.4 years ago

abascalfederico ★ 1.2k

Hi,

I have a BAM file containing paired-end reads. This BAM file is just for storing the results of the sequencing, i.e. it does not contain mapping information.

When I convert this BAM file to a FASTQ file I can see that in some cases there are reads with duplicated names (for about 1% of the reads). For example:

@HS34_15849:1:1101:1065:15188#26/1
@HS34_15849:1:1101:1065:15188#26/2
@HS34_15849:1:1101:1065:15188#26/2
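
To quantify how often names repeat like this, the headers of the converted FASTQ can be tallied. The following is a minimal sketch, assuming strict 4-line FASTQ records (no wrapped sequence lines). The function name `count_duplicate_names` is hypothetical, and the sequences and qualities in the example are made up for illustration; only the read names come from the post.

```python
from collections import Counter
import io

def count_duplicate_names(handle):
    """Return {read_name: count} for names seen more than once.

    Assumes strict 4-line FASTQ records (an assumption, not checked).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        # Every 4th line, starting at 0, is a record header.
        if i % 4 == 0 and line.strip():
            # Drop the leading '@' and any description after whitespace.
            counts[line[1:].split()[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}

# Illustrative input: read names from the post, invented sequences.
example = io.StringIO(
    "@HS34_15849:1:1101:1065:15188#26/1\nACGT\n+\nIIII\n"
    "@HS34_15849:1:1101:1065:15188#26/2\nTTGA\n+\nIIII\n"
    "@HS34_15849:1:1101:1065:15188#26/2\nTT\n+\nII\n"
)
duplicates = count_duplicate_names(example)
print(duplicates)
```

For a real file, the same function can be given an open file handle, e.g. `open("reads.fastq")`, where the file name is a placeholder.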

If I use bamToFastq (from bedtools) to generate two FASTQ files (one for each mate of a pair), all these "duplications" are removed. Apparently, bamToFastq keeps one record for each duplicated read name, seemingly at random, and then raises a warning that the next read has no mate.

Is it normal to have this kind of read name duplication? What would be the best way to handle these duplicates?

Many thanks,
Federico

next-gen-sequencing • 3.9k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by abascalfederico ★ 1.2k

0

How did you make the BAM file? I suspect that the origin of this issue is there.

ADD REPLY • link 9.4 years ago by Devon Ryan 104k

0

Did you check to see if the reads with duplicated names are identical?

ADD REPLY • link 9.4 years ago by h.mon 35k

0

H.mon: No, they are different. In fact, in these cases one of the redundant reads is usually much shorter than the typical read length.

Devon: I have asked how these BAM files were generated from the sequencing. Waiting for an answer.
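
The check discussed here (whether records sharing a name are identical, and how their lengths compare) can be sketched as follows. This is a rough illustration, assuming strict 4-line FASTQ records; `read_fastq` and `summarize_duplicates` are hypothetical helper names, and the example sequences are invented.

```python
from collections import defaultdict
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a strict 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual

def summarize_duplicates(handle):
    """For each repeated name, report sequence identity and lengths."""
    by_name = defaultdict(list)
    for name, seq, _qual in read_fastq(handle):
        by_name[name].append(seq)
    return {
        name: {
            "identical": len(set(seqs)) == 1,
            "lengths": [len(s) for s in seqs],
        }
        for name, seqs in by_name.items()
        if len(seqs) > 1
    }

# Illustrative input: one name appears twice with different sequences.
example = io.StringIO(
    "@r1/1\nACGTACGT\n+\nIIIIIIII\n"
    "@r1/2\nTTGACCAA\n+\nIIIIIIII\n"
    "@r1/2\nTTG\n+\nIII\n"
)
report = summarize_duplicates(example)
print(report)
```

This version holds every sequence in memory, which is fine for a quick check on a subset but may need a two-pass approach (collect duplicated names first, then fetch only those records) for a full run.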

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.4 years ago by abascalfederico ★ 1.2k

0

Could this perhaps be a case where a read was mapped more than once (e.g. by BWA-MEM)? I'm not sure whether converting a BAM with multi-mapped reads to FASTQ format would cause this, but it might.
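
If the BAM did carry alignments, one way to test this idea would be the FLAG field: the SAM specification marks secondary alignments with bit 0x100 and supplementary alignments with bit 0x800, and such records share the read name of the primary record. The sketch below only decodes those bits; `is_primary` is a hypothetical helper name, and the flag values are illustrative. Note that the original post describes the BAM as holding no mapping information, so this may not apply there.

```python
# Bit values defined by the SAM specification.
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment

def is_primary(flag):
    """True if neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0

# Illustrative FLAG values: 83 is a primary record; adding 0x100 or
# 0x800 marks the same read as secondary or supplementary.
flags = [83, 83 + SECONDARY, 83 + SUPPLEMENTARY]
primary_only = [f for f in flags if is_primary(f)]
print(primary_only)
```

Keeping only records for which `is_primary` is true before conversion would leave at most one record per mate, assuming the extra names do come from secondary or supplementary alignments.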

ADD REPLY • link 9.3 years ago by zlskidmore ▴ 290
