Question

Is it possible to retrieve the original reads from a BAM file?

0

5.1 years ago

elcortegano ▴ 200

For testing purposes, I'd like to work with a small number of reads that align well to a given (also small) genomic region, so that computation time and memory requirements are as low as possible.

This could be done (I guess) if the aligned reads in the BAM file could be traced back to specific reads in the original FASTQ file.

I don't know if this is even possible. So far, I've found nothing.

next-gen bam fastq • 5.1k views

ADD COMMENT • link updated 5.1 years ago by ctseto ▴ 310 • written 5.1 years ago by elcortegano ▴ 200

3

Yes, reads can be extracted using e.g. samtools fastq. It is recommended to randomize the BAM before doing that, as many alignment tools expect random FASTQ order (for paired-end data). This could be done with samtools collate or samtools sort -n, followed by samtools fastq. For other threads on this, please use the search function; there are many.
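As a minimal sketch of the collate-then-fastq route described above (the function name, file names, and output prefix are placeholders, not from the thread), something like the following could work. It assumes paired-end data and a reasonably recent samtools, and it passes -F 0x900 explicitly so that secondary and supplementary records are skipped and each read is written once from its primary record.

```shell
# Hypothetical helper: $1 = input BAM, $2 = output prefix (both placeholders).
bam_to_paired_fastq() {
    # collate groups mates next to each other without a full name sort;
    # -u writes uncompressed BAM, -O sends it to stdout.
    # fastq -F 0x900 skips secondary (0x100) and supplementary (0x800) records;
    # -0 and -s discard reads that are unflagged or have lost their mate;
    # -n keeps read names as they are (no /1 or /2 suffix appended).
    samtools collate -u -O "$1" \
      | samtools fastq -F 0x900 \
          -1 "${2}_R1.fastq.gz" -2 "${2}_R2.fastq.gz" \
          -0 /dev/null -s /dev/null -n
}
```

It would be called as bam_to_paired_fastq aln.bam sample, which should produce sample_R1.fastq.gz and sample_R2.fastq.gz.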

ADD REPLY • link 5.1 years ago by ATpoint 85k

1

Two questions:

What will samtools fastq do with hard-clipped reads/secondary alignments? Output shortened reads? I assume some kind of filtering should be applied before.
Do you have any reference for the random fastq order requirement? I have not heard of this before. That would be good to know for certain applications...

Edit: formatting

ADD REPLY • link 5.1 years ago by cschu181 ★ 2.8k

2

Yes. You can filter your BAM file to separate aligned reads and then convert them back to fastq.

You can first subset your BAM with samtools view and a region argument (this requires a coordinate-sorted, indexed BAM) to get the region you need.

Then use reformat.sh from BBMap suite to retrieve reads:

reformat.sh in=your.bam out1=R1.fq.gz out2=R2.fq.gz mappedonly=t pairedonly=t primaryonly=t

If you have single-end data, just use out=read.fq.gz instead.
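The region-filtering step mentioned above might look roughly like this. It is only a sketch: the function name, file names, and region string are placeholders, and it assumes the BAM is already coordinate-sorted. Note that a mate aligned outside the region will not be in the subset, so some reads may end up without a partner.

```shell
# Hypothetical helper:
#   $1 = coordinate-sorted BAM, $2 = region (e.g. chr1:10000-20000), $3 = output BAM
extract_region() {
    # Region queries need an index, so (re)build it first.
    samtools index "$1" &&
    # -b writes BAM rather than SAM; -o names the output file.
    samtools view -b -o "$3" "$1" "$2"
}
```

The resulting BAM can then be given to reformat.sh (or another BAM-to-FASTQ converter) in place of the full file.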

ADD REPLY • link 5.1 years ago by GenoMax 147k

1

So you would like to convert bam to fastq? Have you googled for that?

ADD REPLY • link 5.1 years ago by WouterDeCoster 47k

0

Not with those words, my mistake (I'm not a native speaker). I've now found a way with bedtools, thank you.

ADD REPLY • link 5.1 years ago by elcortegano ▴ 200

0

I'm not a native speaker

Don't worry. Most here aren't. Neither WouterDeCoster nor I am.

ADD REPLY • link 5.1 years ago by ATpoint 85k

0

Were you able to retrieve your original reads from the BAM files? Can you share the process and commands? I really need help with this: I mixed up my data and need to retrieve the raw reads from BAM files.

ADD REPLY • link 4.1 years ago by bazghazia7 • 0

0

There are multiple suggestions on how to do this in this thread. I suggest you pick one and try it.

ADD REPLY • link 4.1 years ago by ATpoint 85k

0

5.1 years ago

ctseto ▴ 310

You can pipe them out using samtools view and the SAM flags 64 and 128 for read 1 and read 2, and refine what gets extracted more carefully with (convoluted) combinations of the SAM flags.
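A rough sketch of the flag-based approach, with a placeholder function name and file names: -f keeps records that have the given flag bits set, and -F drops records that have any of them set. Excluding 0x900 (secondary plus supplementary) is an added assumption here, so that each read appears only once.

```shell
# Hypothetical helper: $1 = input BAM, $2 = output prefix (both placeholders).
split_mates() {
    # Flag 64 (0x40) marks the first read of a pair, 128 (0x80) the second.
    samtools view -b -f 64 -F 0x900 -o "${2}_read1.bam" "$1" &&
    samtools view -b -f 128 -F 0x900 -o "${2}_read2.bam" "$1"
}
```

Further bits can be combined in the same way, for example adding 4 to the -F value to drop unmapped reads as well.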

ADD COMMENT • link 5.1 years ago by ctseto ▴ 310

score 2 · Accepted Answer · 2019-10-03

2

5.1 years ago

elcortegano ▴ 200

OK, I found a way to do it. It is described at the following link: https://seqome.com/convert-bam-file-fastq/.

It describes how bedtools bamtofastq can handle the task. It only requires a sorted BAM file as input (for paired-end output, sorted or grouped by read name so that mates are adjacent), e.g.:

bedtools bamtofastq -i input.bam -fq output.fq -fq2