Extract entire reads, rather than aligned segments, from a SAM/BAM file
1
0
Entering edit mode
6.8 years ago
gbdias ▴ 160

Hello,

I am trying to extract long reads from the Blasr .sam output. However, I want to collect the entire raw reads that produced alignments, and not just the aligned segments.

Is there an easy way to do this?

Here is the blasr command I am using:

blasr <reads.fasta> <reference.fasta> -noSplitSubreads -clipping none -sam -out output.sam
sam bam blasr • 2.0k views
ADD COMMENT
1
Entering edit mode

Grab the read ID's from your SAM file and then pull them out of the original fastq by using filterbyname.sh from BBMap.

ADD REPLY
3
Entering edit mode
6.8 years ago
cschu181 ★ 2.8k

I'd assume that the .bam does not have that information. You'd have to get the read ids and get them from the fastq. Something like the following could work. That's what I used to do to extract paired-end reads (in that case the bam needs to be sorted by name).

grep -A3 -F -f <(samtools view <bamfile> | cut -f 1) <source-fastq> | grep -v "\-\-" > <extracted-reads-fastq>
ADD COMMENT
0
Entering edit mode

Thank you. This strategy worked.

ADD REPLY
0
Entering edit mode

An alternative I found is to use seqtk. It takes a list of sequence names in a file and extract the sequences from FASTA/Q.

seqtk subseq in.fq name.lst > out.fq
ADD REPLY

Login before adding your answer.

Traffic: 2361 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6