samtools: splitting a bam file putting all scaffolds together
6.0 years ago
Fedster ▴ 30

I have some BAM files that I want to split. The reference genome has 22 chromosomes and a bunch of unplaced scaffolds. I can easily split the BAM files chromosome by chromosome:

samtools view -b -o my_chromosomeI.bam my.bam chromosomeI

I can do the same scaffold by scaffold, but I would like to extract all scaffolds with some sort of wildcard (there are a lot of them) and have them all in the same BAM output. The wildcard part is where I'm having issues.

samtools bam splitting scaffold • 3.9k views

How about using the '-L' option of samtools view?

 -L FILE  only include reads overlapping this BED FILE [null]

I'd try to extract the chromosome and scaffold names with samtools view -H and create a simple BED file out of them. The scaffolds you'd like to group go into separate BED files, which are then used as e.g. samtools view -b -L scaffold_group1.bed -o my_scaffold_group1.bam my.bam.

Additionally, I'd add the -h parameter in the samtools view calls to include the header.
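
A minimal sketch of the BED-file idea, assuming the scaffold names share a recognizable pattern. It builds the BED lines from samtools idxstats output (name and length columns) rather than from the header. The function name idxstats_to_bed, the '^scaffold' pattern, and the file names are hypothetical:

```shell
# Convert `samtools idxstats` output (name, length, mapped, unmapped)
# into BED lines covering each reference whose name matches a pattern.
idxstats_to_bed() {
    # $1: extended regex for the names to keep (hypothetical, e.g. '^scaffold')
    awk -v pat="$1" 'BEGIN { OFS = "\t" }
        $1 != "*" && $1 ~ pat { print $1, 0, $2 }'
}

# Intended use (assumes an indexed my.bam):
#   samtools idxstats my.bam | idxstats_to_bed '^scaffold' > scaffolds.bed
#   samtools view -b -L scaffolds.bed -o my_scaffolds.bam my.bam
```

Each BED line spans a whole scaffold (0 to its length), so every read mapped to that scaffold overlaps it.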

6.0 years ago
ATpoint 85k
samtools idxstats in.bam | cut -f1 | grep <grepForScaffoldNames> | xargs samtools view -b -o scaffolds.bam in.bam
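
If the scaffold names have no common pattern to grep for, one possible variant is to invert the selection and keep everything that is not a chromosome. This is only a sketch: the function name non_chromosome_names and the file chroms.txt (one chromosome name per line) are assumptions, not part of the answer above.

```shell
# Read `samtools idxstats` output on stdin and print every reference name
# that is neither listed in the chromosome file nor the '*' (unmapped) row.
non_chromosome_names() {
    # $1: file with one chromosome name per line (hypothetical chroms.txt)
    # -F fixed strings, -x whole-line match, -v invert the match
    cut -f1 | grep -v -x -F -e '*' -f "$1"
}

# Intended use (assumes an indexed in.bam):
#   samtools idxstats in.bam | non_chromosome_names chroms.txt \
#       | xargs samtools view -b -o scaffolds.bam in.bam
```

Note that xargs may split a very long list of names across several samtools invocations, each of which would overwrite scaffolds.bam, so with a very large number of scaffolds it is worth checking that only one command was run.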

Works. If you add it as an answer I will accept it as the correct one. Cheers

6.0 years ago

Make sure you have your BAM file indexed with samtools index. Then you can do something like this:

$ samtools idxstats my.bam|cut -f1|grep -v "*"|parallel 'samtools view -b -o {}.bam my.bam {}'

samtools idxstats gives a tab-delimited table with information about the mapped reads on each contig. The first column contains the name, so we extract it with cut -f1. We need to exclude the row with the name * using grep -v "*", as this is the reserved name for unmapped reads.

Now we have a list of all contigs and pass this to parallel to start samtools view with contig name as region parameter.
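
For anyone without GNU parallel, the same per-contig split can be sketched as a plain shell loop. This swaps parallel for a sequential while-read loop, and the function name split_by_contig is made up for illustration:

```shell
# Write one BAM per contig by looping over the names from idxstats.
split_by_contig() {
    # $1: coordinate-sorted, indexed BAM file
    samtools idxstats "$1" | cut -f1 | grep -v -x -F '*' |
    while IFS= read -r contig; do
        # Each contig name is passed as the region argument
        samtools view -b -o "${contig}.bam" "$1" "$contig"
    done
}

# Intended use:
#   split_by_contig my.bam
```

This runs one samtools view per contig in sequence, so it is slower than the parallel version but has no extra dependency.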

fin swimmer


I do not want to run all chromosomes and contigs in parallel. I want to get all the fragments mapping to any contig (not in a chromosome) out in one single step and dump them into one single file. My problem is not listing all possible fragments/contigs/chromosomes; it is using a wildcard in samtools view to pull out everything on an unplaced contig, no matter which contig.


Basically, samtools view my.bam contig* -b -o output.bam does not work.
