Subsetting BAM by positions
1
0
Entering edit mode
3.5 years ago
בת אל • 0

Hi All,

I have a .cram file and I want to keep only reads that are related to positions I am interested in.

Say the list of positions looks like this (it has about 16k positions):

   rs3094315     1        0.020130          752566 G A
  rs12124819     1        0.020242          776546 A G
  rs28765502     1        0.022137          832918 T C
   rs7419119     1        0.022518          842013 T G
    rs950122     1        0.022720          846864 G C
 rs113171913     1        0.023436          869303 C T
  rs13302957     1        0.024116          891021 G A
  rs59986066     1        0.024183          893462 C T
 rs112905931     1        0.024260          896271 C T
   rs6696609     1        0.024457          903426 C T
  rs13303368     1        0.024771          914852 G C

I tried using samtools view and the region attributes like this:

samtools view -T reference_genome/hs38DH.chr22.fa.gz NA12878.final.cram chr22:59030570-59030570 chr22:28799103-28799103 chr22:28799108-28799108 

and it works great, but when I put the 16k positions it says its too long.

-bash: /usr/bin/samtools: Argument list too long

How do I keep only reads that have a position that appears in my position list?

samtools • 1.2k views
ADD COMMENT
2
Entering edit mode

Assuming that this is not a limitation on samtools side you may be running into limitation on stack space for command line arguments. See if the solution here helps. You could also loop through the positions perhaps?

ADD REPLY
1
Entering edit mode

What GenoMax says here is correct, you're running in the limit of your linux(?) operating system. Looping might be your only option here (or Xargs ?)

You don't have to loop over each of those regions, you can 'reduce' the loop by taking a number of ranges together in the loop

ADD REPLY
0
Entering edit mode

Thank you GenoMax for your super quick reply.

I thought about looping but wanted to avoid it. It seems like a pretty resonable request so I figured there must be a better solution... I'll look into the link you added.

Thanks! Batel.

ADD REPLY
2
Entering edit mode
3.5 years ago

I'm pretty sure that samtools will take a bed file of desired positions, instead of manually listing all the positions. (the -L option) There may not be a limit to how many regions you can input like that.

ADD COMMENT
0
Entering edit mode

Thank you, I'll look into that. So you basically suggest creating a BED file with 1 base sequences ( start and end of the sequence is the position i'm interested in). Did I get you right?

ADD REPLY

Login before adding your answer.

Traffic: 1957 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6