How to use DownsampleSam quickly and efficiently with multiple PROBABILITY values?
3.6 years ago
-IvanWoo- • 0

I want to build a standardized quality control process for particular RNA-seq samples, so I intend to downsample the alignment results (BAM files) with multiple proportions.

I tried to use DownsampleSam in Picard tools, and my command was as follows.

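for SAMPLE in sample1 sample2 sample3 sample4 sample5 sample6; do
  # The single-quoted command is expanded by the shell that parallel starts,
  # so SAMPLE must be exported for ${SAMPLE} to be visible there.
  export SAMPLE
  parallel --env SAMPLE --keep-order -j 8 '
    picard DownsampleSam \
    I=${SAMPLE}/raw.sam.gz \
    O=${SAMPLE}/{}downsampled.bam \
    A=1.0E-5 \
    P={}
  ' ::: 0.001 0.002 0.005 0.01 0.02 0.05 0.1 0.5
done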

This command does what I expect, but it is time-consuming and computationally intensive, because every proportion triggers a separate run over the full input file.

Since hundreds of samples may be processed later, I wonder whether there is a more efficient approach. For example, while running P=0.5, is it possible to also output the results for P=0.001, P=0.002, P=0.005, P=0.01, P=0.02, P=0.05, P=0.1, and P=0.2 in the same pass? That would save a lot of computing power and time!

Thanks in advance!

picard downsampling RNA-seq BAM • 1.1k views

May I ask what downsampling has to do with QC? As for the parallelization: if you want to parallelize further, wrap this snippet itself in parallel rather than a loop, or submit an array of jobs if you are on a cluster that supports this, e.g. via SLURM.
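
The job-array suggestion could look roughly like the sketch below, where each array task handles one (sample, proportion) pair. The helper name `pair_for_index` is hypothetical, the sample names and proportions are copied from the question, and the picard call is only printed, not executed. It relies on the scheduler setting `LSB_JOBINDEX` (LSF) or `SLURM_ARRAY_TASK_ID` (SLURM) inside array jobs.

```bash
#!/usr/bin/env bash
# Sketch: map one scheduler array index to one (sample, proportion) pair.
samples=(sample1 sample2 sample3 sample4 sample5 sample6)
props=(0.001 0.002 0.005 0.01 0.02 0.05 0.1 0.5)

# Hypothetical helper: 1-based task index -> "sample proportion".
pair_for_index() {
  local i=$(( $1 - 1 ))
  local n=${#props[@]}
  echo "${samples[i / n]} ${props[i % n]}"
}

# Use the array index provided by LSF or SLURM; default to 1 so the
# script can be tried outside a scheduler.
task=${LSB_JOBINDEX:-${SLURM_ARRAY_TASK_ID:-1}}
read -r sample p <<< "$(pair_for_index "$task")"

# Dry run: print the command. Remove the leading "echo" to execute it.
echo picard DownsampleSam "I=${sample}/raw.sam.gz" \
  "O=${sample}/${p}downsampled.bam" A=1.0E-5 "P=${p}"
```

With 6 samples and 8 proportions there are 48 tasks, so the script would be submitted as an array of 1-48, for example with `bsub -J "ds[1-48]"` on LSF or `sbatch --array=1-48` on SLURM. This spreads the work across nodes but does not reduce the total amount of computation.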


I want to analyze the coverage of specially treated RNA by downsampling. For the for loop, I used bsub to submit the jobs to multiple compute nodes of the HPCC, but that is still somewhat wasteful. I think downsampling should be able to produce results at multiple proportions in a single run.
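
One way to approximate this without a true single-pass mode is a cascade: downsample to the largest proportion first, then derive each smaller subset from the previous output using a relative probability (for example, 0.1 from the 0.5 file needs P=0.1/0.5=0.2). The sketch below only prints the commands. The helper `relative_p` is hypothetical, and the approach assumes that each stage selects reads independently of the previous one; how DownsampleSam's RANDOM_SEED behaves when its input was already downsampled has not been verified here, so the resulting read counts should be checked.

```bash
#!/usr/bin/env bash
# Sketch: derive each smaller subset from the next larger one.
props=(0.5 0.1 0.05 0.02 0.01 0.005 0.002 0.001)  # largest first

# Hypothetical helper: probability needed to go from fraction "prev"
# of the original reads down to fraction "target".
relative_p() {
  awk -v target="$1" -v prev="$2" 'BEGIN { printf "%g", target / prev }'
}

sample=sample1
input="${sample}/raw.sam.gz"
prev=1
for p in "${props[@]}"; do
  output="${sample}/${p}downsampled.bam"
  # Dry run: print the command. Remove the leading "echo" to execute it.
  echo picard DownsampleSam "I=${input}" "O=${output}" \
    A=1.0E-5 "P=$(relative_p "$p" "$prev")"
  input=$output   # the next stage reads the subset just written
  prev=$p
done
```

If the proportions are hit exactly, the stages read roughly 1.7 times the original input in total instead of 8 times. The trade-off is that the stages for one sample must run in order, and the final fractions are products of the per-stage fractions, so small per-stage deviations compound.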
