How To Split A Chr11 BAM File Into 2 Halves?
10.9 years ago
biorepine ★ 1.5k

Dear Biostars

I know that samtools can split a BAM file by chromosome. But is there any way to divide a single-chromosome BAM file into halves or quarters?

Thanks in advance!

bam split • 5.8k views
10.9 years ago
Pavel Senin ★ 1.9k

print the first line of your file

samtools view test.bam | sed -n -e 1p
D3VDZHS1:98:C1URWACXX:1:1316:17052:8149    73    ctg9    1832    46 ...

print line 100000 of the file

samtools view test.bam | sed -n -e 100000p
D3VDZHS1:98:C1URWACXX:1:2104:15290:4476    185    ctg4    116422    45 ...

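print all between (this assumes each of the two read names matches only one line; mates share a read name, so addressing by line number, e.g. sed -n '1,100000p', is more robust)

samtools view test.bam | sed -n '/D3VDZHS1:98:C1URWACXX:1:1316:17052:8149/,/D3VDZHS1:98:C1URWACXX:1:2104:15290:4476/p' >part1.sam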

voila

wc -l part1.sam
100000 part1.sam
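
The 100000 above is an arbitrary cut point. To get actual halves, the cut point can be derived from the record count. A minimal sketch, assuming the records have already been dumped to a headerless text file; `halve_records` and the file names are hypothetical, not part of samtools:

```shell
#!/usr/bin/env bash
# Sketch: split a headerless SAM text file into two halves by record count.
# halve_records is a hypothetical helper name, not a samtools command.
halve_records() {
    local body="$1" prefix="$2"
    local total half
    # Number of alignment records (one per line).
    total=$(wc -l < "$body")
    # Round up so the first half gets the extra record when the count is odd.
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$body" > "${prefix}1.sam"
    tail -n "+$(( half + 1 ))" "$body" > "${prefix}2.sam"
}

# Example usage (assumes samtools is installed and test.bam exists):
#   samtools view test.bam > body.sam
#   halve_records body.sam half_
```

The two output files still lack headers, so they are not valid SAM files on their own.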

Edit: this can be done much more easily with split:

samtools view test.bam | split -d -l 100000 - part1_

But as Pierre suggests (thank you!), you'll need to add the headers back later to make the resulting SAM files valid.
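
One possible way to put the headers back, as a rough sketch: dump the file once with the header, separate the '@' header lines from the records, split only the records, and prepend the header to every chunk. `split_sam_with_header` and the file names are made up for illustration; it uses only grep, split and cat, and relies on SAM header lines being the only lines that start with '@'.

```shell
#!/usr/bin/env bash
# Sketch: split a SAM text file (header + records) into chunks of N records
# and prepend the original header to every chunk.
# split_sam_with_header is a hypothetical helper name, not a samtools command.
split_sam_with_header() {
    local sam="$1" lines="$2" prefix="$3"
    local header="${prefix}header.tmp"
    local chunk out
    # Header lines start with '@'; alignment records never do.
    grep '^@' "$sam" > "$header"
    # Split only the records, as in the split command above.
    grep -v '^@' "$sam" | split -d -l "$lines" - "${prefix}body_"
    for chunk in "${prefix}body_"*; do
        [ -e "$chunk" ] || continue          # no records, nothing to do
        out="${prefix}${chunk##*body_}.sam"  # e.g. part1_00.sam
        cat "$header" "$chunk" > "$out"
        rm "$chunk"
    done
    rm "$header"
}

# Example usage (assumes samtools is installed and test.bam exists):
#   samtools view -h test.bam > test.sam
#   split_sam_with_header test.sam 100000 part1_
```

Each resulting part1_NN.sam then starts with the full header of the original file.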

Note: 'samtools view -h test.bam' should be used to produce a valid SAM file.

10.9 years ago

Just for fun, I wrote a Java program for this task. See https://github.com/lindenb/jvarkit/wiki/Biostar90204

Example:

$ java -jar dist/biostar90204.jar -m bam.manifest -n 3 -a 5 samtools-0.1.18/examples/toy.sam

$ cat bam.manifest
_splitbam.00001.bam    1    3
_splitbam.00002.bam    4    6
_splitbam.00003.bam    7    9
_splitbam.00004.bam    10    12

$ samtools-0.1.18/samtools view -h _splitbam.00003.bam 
@HD    VN:1.4    SO:unsorted
@SQ    SN:ref    LN:45
@SQ    SN:ref2    LN:40
@PG    ID:0    PN:com.github.lindenb.jvarkit.tools.biostar.Biostar90204    VN:7e17f8bd273cf081d4415bc4f579cd34e2c681d1    CL:-m bam.manifest -n 3 -a 5 samtools-0.1.18/examples/toy.sam
@CO    SPLIT:3
@CO    SPLIT:Starting from Read7
x1    0    ref2    1    30    20M    *    0    0    AGGTTTTATAAAACAAATAA    ????????????????????
x2    0    ref2    2    30    21M    *    0    0    GGTTTTATAAAACAAATAATT    ?????????????????????
x3    0    ref2    6    30    9M4I13M    *    0    0    TTATAAAACAAATAATTAAGTCTACA    ??????????????????????????

that is a bunch of good code you have up there!

Looks good! I think the indents are a bit too wide in your code style.
