How to get unaligned reads and aligned reads into separate files from SAM/BAM?
Asked 12 months ago by O.rka

I have long reads aligned with minimap2, stored as a SAM file. I want to write the unmapped reads to a file called unmapped.fastq.gz and the aligned reads to a file called mapped.fastq.gz.

How can I do this?

I thought I could use BBTools reformat.sh, but I realized I would have to run it twice. Is there a tool that can do it all in one go instead of reading through the SAM file twice?

Tags: bam, sam, reads, fastq
Answered 12 months ago

If you really want to do it in one pass, pysam is an option. The script below started as ChatGPT output and was modified by hand.

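import gzip
import pysam

bam_file = 'path_to_your.bam'  # Replace with your SAM/BAM file path

# Convert one alignment record to a FASTQ entry in the read's original orientation
def bam_read_to_fastq(read):
    # Reverse-strand alignments store SEQ reverse-complemented and QUAL reversed;
    # get_forward_sequence() undoes the former, the [::-1] below undoes the latter
    seq = read.get_forward_sequence()
    quals = read.query_qualities
    if quals is None:
        qual = '!' * len(seq)  # no base qualities stored (e.g. FASTA input): placeholder
    else:
        if read.is_reverse:
            quals = quals[::-1]
        qual = pysam.qualities_to_qualitystring(quals)
    return f"@{read.query_name}\n{seq}\n+\n{qual}\n"

# Open the alignment file ("r" lets htslib detect SAM vs BAM) and both outputs;
# the with-block closes all three files when done
with pysam.AlignmentFile(bam_file, "r") as bam, \
        gzip.open('mapped.fastq.gz', 'wt') as mapped_fastq, \
        gzip.open('unmapped.fastq.gz', 'wt') as unmapped_fastq:
    # Iterate over reads and write each to the appropriate FASTQ file
    for read in bam:
        # minimap2 also emits secondary/supplementary records for the same read;
        # skip them so every read is written exactly once
        if read.is_secondary or read.is_supplementary:
            continue
        if read.is_unmapped:
            unmapped_fastq.write(bam_read_to_fastq(read))
        else:
            mapped_fastq.write(bam_read_to_fastq(read))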

This will be slower than using samtools to split the BAM into a mapped and an unmapped BAM and then converting those two BAMs to FASTQ. samtools fastq skips secondary and supplementary alignments by default, so each read is written only once.

samtools view -@6 -O BAM -F4 -o mapped.bam -U unmapped.bam input.bam

samtools fastq -@6 mapped.bam | gzip > mapped.fastq.gz
samtools fastq -@6 unmapped.bam | gzip > unmapped.fastq.gz
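The question's input is a plain-text SAM file, so another single-pass option needs no third-party packages: parse the SAM columns directly. The sketch below is an illustration and not part of the original answer. The function names `split_sam` and `split_sam_file` are hypothetical, and writing `!` (Phred 0) when QUAL is `*` is an assumption. It relies only on the FLAG bits defined in the SAM specification: 0x4 (unmapped), 0x10 (reverse strand), 0x100 (secondary) and 0x800 (supplementary). Like the pysam script, it will be slower than samtools on large files.

```python
import gzip

# Bits of the SAM FLAG field (from the SAM format specification)
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(fields):
    """Turn the tab-split columns of one SAM record into a FASTQ entry."""
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if qual == "*":
        # Assumption: placeholder qualities when the record has no QUAL
        qual = "!" * len(seq)
    if flag & FLAG_REVERSE:
        # Reverse-strand records store SEQ reverse-complemented and QUAL reversed;
        # undo both to recover the read as it was sequenced
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_sam(sam_lines, mapped_out, unmapped_out):
    """Hypothetical helper: route each primary record to one of two text
    streams in a single pass. Returns (n_mapped, n_unmapped)."""
    n_mapped = n_unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line (QNAME may not start with '@')
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # extra alignment of a read that is written once already
        if flag & FLAG_UNMAPPED:
            unmapped_out.write(sam_record_to_fastq(fields))
            n_unmapped += 1
        else:
            mapped_out.write(sam_record_to_fastq(fields))
            n_mapped += 1
    return n_mapped, n_unmapped


def split_sam_file(sam_path, mapped_path="mapped.fastq.gz",
                   unmapped_path="unmapped.fastq.gz"):
    """Hypothetical wrapper: read an uncompressed SAM file and write gzipped FASTQ."""
    with open(sam_path) as sam, \
            gzip.open(mapped_path, "wt") as mapped, \
            gzip.open(unmapped_path, "wt") as unmapped:
        return split_sam(sam, mapped, unmapped)
```

For example, `split_sam_file("aln.sam")` writes mapped.fastq.gz and unmapped.fastq.gz to the working directory and returns the (mapped, unmapped) read counts. It only handles uncompressed SAM. For BAM input, use pysam or samtools as above.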