Question

How to work with a BAM file that has an inaccessible reference genome

0

2.6 years ago

Ak ▴ 60

I have come across a BAM file of a parasite genome and am trying to identify variants in it.

As I could not find the reference genome that was used for the alignment, I was thinking of converting the BAM file to FASTQ format, with the reads separated into R1, R2 and singletons, and then aligning them to the reference genome that I have.

Is this feasible? Could anyone give some advice? Thanks.

Genome fastq alignment Reference bam • 973 views

ADD COMMENT • link 2.5 years ago by Ak ▴ 60

0

https://gatk.broadinstitute.org/hc/en-us/articles/360036485372-SamToFastq-Picard-

ADD REPLY • link 2.6 years ago by massa.kassa.sc3na ▴ 630

0

@Ak Why did you delete this post?

ADD REPLY • link 2.6 years ago by Ram 44k

0

I've noticed that there are actually a lot of similar questions out there (e.g. how to convert BAM to FASTQ); I had just asked mine in a more roundabout way. So I figured it was redundant and that I might as well delete it.

ADD REPLY • link 2.6 years ago by Ak ▴ 60

0

In that case, add an answer with links to one or more posts that you found useful and maybe add some text about how you found these posts. Then, accept that answer. You have learned how to use the forum better, and that knowledge could be useful to others.

ADD REPLY • link 2.6 years ago by Ram 44k

score 1 · Answer 1 · 2022-05-04

1

2.6 years ago

Ak ▴ 60

Thanks all for the help and advice. I eventually used samtools fastq to extract the reads, after first running samtools collate to group the reads by name so that the two mates of each pair sit next to each other.

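# Group reads by name so that mates are adjacent (-u writes uncompressed BAM)
samtools collate -u -o input.collate.bam input.bam
# -0 gets reads flagged as both or neither of READ1/READ2 (not unmapped reads as such); -n keeps read names without /1 or /2
samtools fastq -1 paired1.fq -2 paired2.fq -0 unmapped.fq -s singletons.fq -n input.collate.bam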
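
For anyone who wants the follow-up step from the question (realigning to a reference you do have), here is a minimal sketch of how the two commands above could be chained with an aligner. It assumes bwa is the aligner, which nobody in this thread actually specified, and the names realign, run, my_reference.fa and sample are hypothetical, made up for illustration. Only the properly paired FASTQ files are realigned here; singletons would need a separate single-end run.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: with DRY_RUN=1 it only prints the command, otherwise it runs it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: realign <input.bam> <reference.fa> <output prefix>
realign() {
    local bam=$1 ref=$2 prefix=$3

    # 1. Group reads by name so mates are adjacent, then split into FASTQ files.
    run samtools collate -u -o "$prefix.collate.bam" "$bam"
    run samtools fastq -1 "$prefix.R1.fq" -2 "$prefix.R2.fq" \
        -0 "$prefix.other.fq" -s "$prefix.singletons.fq" -n "$prefix.collate.bam"

    # 2. Index the reference you have and align the paired reads to it.
    #    Assumption: a bwa version whose "mem" supports -o; otherwise redirect stdout.
    run bwa index "$ref"
    run bwa mem -o "$prefix.sam" "$ref" "$prefix.R1.fq" "$prefix.R2.fq"

    # 3. Coordinate-sort and index, which is what most variant callers expect.
    run samtools sort -o "$prefix.sorted.bam" "$prefix.sam"
    run samtools index "$prefix.sorted.bam"
}
```

Running `DRY_RUN=1 realign input.bam my_reference.fa sample` just prints the six commands, so you can check them before letting anything touch your data.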

ADD COMMENT • link 2.5 years ago by Ak ▴ 60