Entering edit mode
6.8 years ago
pierre.justeau
▴
10
Hi All,
The purpose of my question is to generate a new header for a bam files. For one sample, i defined two command lines and its works:
#filter header
samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' > file1_filtered.txt
#re-header
cat file1_filtered.txt <(samtools view file1.bam) | samtools view -hb - > file1_filtered_re-header.bam
I would like to do the same thing for a lot of samples
#filter header
ls *.bam | parallel -j 8 'samtools view -H -@ 7 {} > {.}_filtered.txt'
After this step i didn't find the solution, if someone has an idea? Many thanks in advance.
Keep in mind that working with files in parallel may or may not be faster than doing each independently. If the processing is IO-bound, then running in parallel may not be faster (though not likely slower).
See also: https://oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster/
Curious if you have done any recent testing with new high performance file systems such as isilon/NetApp (versions with spinning disks and pure SSD's)? At some point one of the connections (pci-e bus) may become the bottleneck.
I have not, but I am pretty sure the conclusion will be the same: It depends, and you should therefore test with different parallelization and measure.
Hi,
if it works with the command lines, why not putting these into a bash script and running this bash script with parallel?
Otherwise, you may use a tab-separated file where you have input and output variables stored like here.
Cheers, Michael
thanks for your help. I solve the issues for my first step "#filtered". I didn't find a solution to run the step "re-header" for a multiple files...
Many thanks for your help, Pierre
Hi all, Thanks a lot for your help and script, it's run perfectly. I better understand where was my mistake. I just discovered GNU parallel, I'm a bit confused with some syntaxes, i'm gonna to learn more about it.
Thanks again, Pierre
If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted (green check mark).