I have a bam file of RNAseq alignments generated by STAR (alignment.bam) and a list of possible exon positions in a txt file (exons.txt). I'm trying to extract these exons from the alignment file and to do so I'm using a samtools view loop:
#!/usr/bin/env
for line in `cat exons.txt`; do
samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
done;
The process has an output status of 0 (zero) but when looking at the stderr file it flags multiple segmentation faults, e.g.:
/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 23232 Segmentation fault samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 30357 Segmentation fault samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 1273 Segmentation fault samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
Exons.txt contains one line per exon in the format of contig1:1-100. The samtools command works fine when run manually on any one of the exon coordinates in exons.txt. When the loop runs, output is normal for the first few lines and then the process fails with segmentation faults. This failure reliably happens as soon as the loop hits a coordinate indicating a new contig in the list. I've tested this by editing exons.txt and the behaviour is the same:
exons.1.txt:
contig1:1-100
contig1:200-300
contig2:1-100 <- segfault
contig2:200-300 <- segfault
contig3:1-100 <- segfault
contig3:200-300 <- segfault
exons.2.txt:
contig2:1-100
contig2:200-300
contig3:1-100 <- segfault
contig3:200-300 <- segfault
I've already indexed the bam file using samtools/1.9 in Linux. I cannot go through it one by one as there are over 2300 exons I'm scanning for.
Any help would be greatly appreciated.
You have indexed the bam file but was it sorted before indexing?
Note: I am not sure if you can
append
to an existing BAM file like what you are trying to do. That may be your problem.You should convert your intervals to bed format file and then feed that to
samtools
with-L
option.Excellent, thank you! Converting to a .bed file and using the -L command worked fabulously. Final script looks like: