Samtools segmentation fault when extracting exons from bam file.
1
0
Entering edit mode
5.7 years ago

I have a bam file of RNAseq alignments generated by STAR (alignment.bam) and a list of possible exon positions in a txt file (exons.txt). I'm trying to extract these exons from the alignment file and to do so I'm using a samtools view loop:

#!/usr/bin/env
for line in `cat exons.txt`; do
            samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
done;

The process has an output status of 0 (zero) but when looking at the stderr file it flags multiple segmentation faults, e.g.:

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 23232 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 30357 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13:  1273 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

Exons.txt contains one line per exon in the format of contig1:1-100. The samtools command works fine when run manually on any one of the exon coordinates in exons.txt. When the loop runs, output is normal for the first few lines and then the process fails with segmentation faults. This failure reliably happens as soon as the loop hits a coordinate indicating a new contig in the list. I've tested this by editing exons.txt and the behaviour is the same:

exons.1.txt:

contig1:1-100
contig1:200-300
contig2:1-100        <- segfault
contig2:200-300    <- segfault
contig3:1-100        <- segfault
contig3:200-300    <- segfault

exons.2.txt:

contig2:1-100        
contig2:200-300   
contig3:1-100        <- segfault
contig3:200-300    <- segfault

I've already indexed the bam file using samtools/1.9 in Linux. I cannot go through it one by one as there are over 2300 exons I'm scanning for.

Any help would be greatly appreciated.

rna-seq alignment seg fault samtools • 2.0k views
ADD COMMENT
2
Entering edit mode

You have indexed the bam file but was it sorted before indexing?

Note: I am not sure if you can append to an existing BAM file like what you are trying to do. That may be your problem.

You should convert your intervals to bed format file and then feed that to samtools with -L option.

-L FILE  only include reads overlapping this BED FILE
ADD REPLY
0
Entering edit mode

Excellent, thank you! Converting to a .bed file and using the -L command worked fabulously. Final script looks like:

samtools view -@ 8 -b -h -L exons.bed alignment.bam > exon_alignments.bam
ADD REPLY
5
Entering edit mode
5.7 years ago

There is a -L flag that you can use with samtools to output alignments overlapping intervals defined by a .bed file. That's probably your best option. You'll have to convert your exons.txt to a bed file.

ADD COMMENT
1
Entering edit mode

Agreed. Using the loop in the toplevel post you basically append several BAM files to each other including the header sections etc. That is probably a bad idea.

ADD REPLY
0
Entering edit mode

For future reference, if you want to append to a BAM file, you should use samtools merge.

ADD REPLY
1
Entering edit mode

Strictly speaking, appending would be samtools cat while merge would be combining while maintaining the coordinate-sorted order of the final file ;-)

ADD REPLY
0
Entering edit mode

Thank you! It worked wonderfully.

ADD REPLY

Login before adding your answer.

Traffic: 2659 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6