Sampling random reads from BAM files with allowing same reads
1
0
Entering edit mode
5.1 years ago
SSK ▴ 10

Hi.

I'm trying to randomly sample reads with allowing same reads.

I tried to random sampling by using "samtools view -s", but it seems like samtools samples without duplicate.

How can I do this ?

Thanks!

reads random_sampling samtools bam • 3.6k views
ADD COMMENT
2
Entering edit mode

I don't think that will give you a valid bam then if you reuse read names.

ADD REPLY
0
Entering edit mode

The aim of samtools view -s is to simulate sequencing at a reduced depth.

It sub-samples based on a hash of the read name. Thus all reads in a pair are kept or none of the reads in that pair. This is better than a naive sampling which ends up breaking most read pairs.

I don't understand what you're asking for beyond how it already works.

ADD REPLY
3
Entering edit mode
5.1 years ago

Please use the search function:

Question: Sampling with replacement from a bam file

However, here is a solution in R:

library(GenomicAlignments)
library(rtracklayer)

## your bam file
bam.file <- 'your_bam_file.bam'

## is it paired end or not
paired <- TRUE

## read in the bam file
if(paired){
  bam.input <- readGAlignmentPairs(bam.file)
}else{
  bam.input <- readGAlignments(bam.file)
}

## select your number of subsets
num.samples <- 100

## subsample WITH replacement
bam.subsample <- sample(bam.input,num.samples,replace=TRUE)

## export new bam file
export(bam.subsample,BamFile('bam_subsampled.bam'))
ADD COMMENT
0
Entering edit mode

You lose a lot of metadata this way so if you want anything specific you should look at using the ScanBamParam() parameter during the 'read in the bam file' step.

ADD REPLY
0
Entering edit mode

Thanks benformatics! I'ts OK if I lose a lot of meta data. I try using your script. Thanks!

ADD REPLY
0
Entering edit mode

Fixed a typo in case you are using unpaired reads.

ADD REPLY

Login before adding your answer.

Traffic: 2194 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6