Usage of bedTools
1
0
Entering edit mode
3.0 years ago
rubic ▴ 270

Hi,

bedtools usage question.

I know that when running intersectBed between a bam file and user-defined a bed file, the coordinates in the bam are 1-based, and the coordinates in the user-defined bed file, well I'm assuming they should also be 1-based.

The starts in a bed output of bedtools will be 0-based though.

What if I first converted the bam file to a bed file using bedtools's bamToBed and then ran intersectBed with my 1-based bed? Do I need to first add 1 to the starts in the bed output of bam2Bed?

I'll give an example: I have an RNA-seq bam that I want to extract sub-sequences of the mapped read, which intersect a certain interval.

I cannot find a single bedtools command for what I need so I'm doing it in two steps:

  1. Converting the bam file to a bed file using bam2Bed with the -split argument, so that each aligned portion is in a different row in the output
  2. Intersecting the bed file from step 1 with the bed interval.

So, here's a read from the bam file:

molecule/100    0   chr6    41058396    60  1S1096M100N295M865S *   0   0   GTTCCTTCCTTCCTCTCTTACTCTACCTTGGTTTCTCCACCAAACTTTCTCTCTCCCCTTTCCCACCCTCTTTCTCTTACTTTCCCCCTCCCTGTCTTCCTTCCTTTCTCTCTTTTCTTCTCCTTCTTTCTCCTTTTGTATCTCTTGCTTTCTCCTTTGAAACACTTCATATGTGAGCTTTGTGAAGTAAGGAAACACGATGTGCTAAGGGCACCAATGAATGCTCACAGAATTTTTAAATTATTGACTTTAGCTCACTGTTTTACAGATGAGAAAATGGAGTGCTTAAAAATGAATTGCTGATTTCCACATAACATGACCTGTATGATCCACAGGTAATCTATATATTTTCTACATTTGTTTTGGCTCTTGAACTCTGGCTGACTTAGTCCTCACTTTATGGCCCAGGTTGGCCTTGAACTCAGCAATCTTCCTGTCTTAGCCTTCTGAGTGCTGGGATTATAGGAGTTAAGACCGCATACCTGTATTCAATTCTTTTTGCTCCTTTCAGAATATTAATATCCTAAGTGTGTGATTATTTTCTTCTTGCTTTGCACTGTGTTGGCATTTAATGTCTCAAACAAGTAAGCCACATTTGTCCTTCCGGGAAGGCTTGTGTTCTTATTTCTTCTACTGATGTAGTCTTATGTTCCTTCTGTGTCCAGTAAAACATACATGTACCACCTGGGTTTACTGTTTGGCGGTATTACCAAATTTTTTCTGTTTGTCTCTCTCTGTCTGTGTCTCTTGTGTCACTCTGTCTCTCTTTCTCTTCATGTCTTTGTCTTACTCAATCTGTTTCTGTCTTTGTCTCTCTCTTTTTCTTGCCCTTTCTCTCTCCTCTCCTCTCCCTTCCCCCCCCCACGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGTGTGTGTGTGTGTATTTTTGAATCAAGCACCTTTCTGGCTGTGGTGATAGACAGACAAATCACATGTCATCTACAGTTGAATCTACTACCTGGTCTGATAGCTGCAGTAGGTAGGGCTTATTTGCCCTGCCTTGACCCAACTATGGGCTGTAGGCTCCTAAGCTGTGTGGCCTTCTGCCTCTTGGGAATAGGCCCTTTGGAGACGGCTGTTTTCCAGACTCCAAACTATCATGTCACACAGGTGGGAAATGAAGTGTCTTTCAATTGTAAGCAAACTCTGGGCCACGATACTATGTATTGGTACAAGCAAGACTCTAAGAAATTGCTGAAGATTATGTTTAGCTACAATAATAAGCAACTCATTGTAAACGAAACAGTTCCAAGGCGCTTCTCACCTCAGTCTTCAGATAAAGCTCATTTGAATCTTCGAATCAAGTCTGTAGAGCCGGAGGACTCTGCTGTGTATCTCTGTGCCAGCAGCTAAGAGTATAAATTCTGGAAATACGCTCTATTTTGGAGAAGGAAGCCGGCTCATTGTTGTAGGATTGCGGCTTTCCTATGCAAGCCACCACAGCTCTCTTACATCTCAGTGTAGGAGTGAATGTGGAACATCAGAGGATCTGAGAAATGTGACTCCACCCAAGGTCTCCTTGTTTGAGCCATCAAAAGCAGAGATTGCAAACAAACAAAAGGCTACCCTCGTGTGCTTGGCCAGGGGCTTCTTCCCTGACCACGTGGAGCTGAGCTGGTGGGTGAATGGCAAGGAGGTCCACAGTGGGGTCAGCACGGACCCTCAGGCCTACAAGGAGAGCAATTATAGCTACTGCCTGAGCAGCCGCCTGAGGGTCTCTGCTACCTTCTGGCACAATCCTCGCAACCACTTCCGCTGCCAAGTGCAGTTCCATGGGCTTTCAGAGGAGGACAAGTGGCCAGAGGGCTCACCCAAACCTGTCACACAGAACATCAGTGCAGAGGCCTGGGGCCGAGCAGACTGTGGGATTACCTCAGCATCCTATCAACAAGGGGTCTTGTCTGCCACCATCCTCTATGAGATCCTGCTAGGGAAAGCCACCCTGTATGCTGTGCTTGTCAGTACACTGGTGGTGATGGCTATGGTCAAAAGAAAGAATTCATGAAGCCAGATGTGAAGATGAATACAAGAGCTGACAACACATTGTGTTAATACAGATTTTCTTCTCCCAGAACTTCTGAAGAGCTATTCTCATTTGTCTGTGCATCCCAAATTCTGCCTACTAGTCACGCATAGGTGCATTTGTATGTCTGAAATTCTTGTGACCTGGAAAATGCCTACACTTACAATCAAACCAATAAACATGTTCTAGGACGGCCG   *   NM:i:0  ms:i:1391   AS:i:1359   nn:i:0  ts:A:+  tp:A:P  cm:i:439    s1:i:1343   s2:i:0  de:f:0  SA:Z:chr6,41534323,+,1399S857M4700D1S,60,7; rl:i:204

The corresponding bed rows are:

chr6    41534322    41534372    molecule/100    60  +
chr6    41537511    41537583    molecule/100    60  +
chr6    41538217    41538605    molecule/100    60  +
chr6    41539225    41539335    molecule/100    60  +
chr6    41539644    41539879    molecule/100    60  +

So since bamTobed is outputting the bed in 0-based coordinates, the actual starts are the above + 1:

chr6    41534323    41534372    molecule/100    60  +
chr6    41537512    41537583    molecule/100    60  +
chr6    41538218    41538605    molecule/100    60  +
chr6    41539226    41539335    molecule/100    60  +
chr6    41539645    41539879    molecule/100    60  +

When I intersect these bed rows (the 0-based output) with my bed interval:

chr6    41538218    41539881    g1

I'm getting:

chr6    41538218    41538605    molecule/100    60  +   chr6    41538218    41539881    g1  100 +
chr6    41539225    41539335    molecule/100    60  +   chr6    41538218    41539881    g1  100 +
chr6    41539644    41539879    molecule/100    60  +   chr6    41538218    41539881    g1  100 +

So in the intersection output are the starts (second column) 0-based or 1-based?

bedTools bamToBed • 668 views
ADD COMMENT
2
Entering edit mode
3.0 years ago
ATpoint 85k

No, bedtools is smart and will do all necessary things under the hood for you to respect the different coordinate systems. Be sure that you BED file is 0-based and you're good to go. This goes for all bedtools subcommands.

ADD COMMENT

Login before adding your answer.

Traffic: 1104 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6