Hi,
I run variant analysis on several samples every day using the align-to-reference method. For this I use several bioinformatics programs, such as Bowtie2/BWA, Samtools, and Freebayes. Is there a way to know which version of each program was used to process a particular sample? This should work like an audit trail, stating, for example, that Sample1 was aligned using Bowtie2 vX.X.X, Sample2 was aligned using Bowtie2 vX.X.Y, and so on.
For example, the command
bowtie2 --version
reports the version of Bowtie2 installed on the system:
/usr/local/bin/bowtie2-align-s version 2.2.2
64-bit
Another approach is to look at the BAM header:
samtools view -H sample.sorted.bam
@HD VN:1.0 SO:coordinate
@SQ SN: reference LN:
@PG ID:bowtie2 PN:bowtie2 VN:2.2.2 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x -I 0 -X 1000 --fr -p 16 --local --passthrough -1 /tmp/40466.inpipe1 -2 /tmp/40466.inpipe2"
Neither of these commands tells me that Sample1 was processed with one Bowtie2 version, Sample2 with another, and so on.
I would like an audit trail that records, for each program, which version was used to process which sample.
Thanks!!
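A minimal sketch of the header-based approach: the @PG line in each BAM header records the name (PN) and version (VN) of the program that wrote it, so a per-sample table can be pulled from existing BAMs. It assumes samtools is on PATH and that files are named <sample>.sorted.bam (a hypothetical naming convention); the helper name pg_versions is illustrative only. PN and VN are optional in the SAM format, so the sketch falls back to the ID and to NA when they are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM header on stdin and print
# "sample<TAB>program<TAB>version" for every @PG line.
# $1 is the sample label for the first column.
pg_versions() {
  awk -v s="$1" -F '\t' '
    $1 == "@PG" {
      id = ""; pn = ""; vn = "NA"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        if ($i ~ /^PN:/) pn = substr($i, 4)
        if ($i ~ /^VN:/) vn = substr($i, 4)
      }
      if (pn == "") pn = id   # PN is optional; fall back to the required ID
      print s "\t" pn "\t" vn
    }'
}

# Assumes samtools is on PATH and BAMs are named <sample>.sorted.bam.
shopt -s nullglob
printf 'sample\tprogram\tversion\n'
for bam in *.sorted.bam; do
  samtools view -H "$bam" | pg_versions "${bam%.sorted.bam}"
done
```

This only covers programs that added a @PG line to the BAM; variant calls made afterwards (for example with Freebayes) are not recorded in the BAM header at all.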
You could capture this information (the output of bowtie2 --version) in the master analysis log for each project. The Unix command script can record all interactive dialog from a terminal session. The standard error and standard output logs captured from the analysis should also include this information and can be saved. You could also use a workflow system such as snakemake to automate your steps and log those actions.

Indeed, for audit trails in corporate and clinical settings, I produce a log for each sample that looks something like:
The versions of the programs used are recorded elsewhere, and there is also a standard operating procedure, which is itself versioned and has a date for its next review.
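A minimal sketch of a per-sample version log in the spirit of the answers above, written as a bash function that a pipeline could call before processing each sample. The function name log_versions, the tab-separated column layout, and the example file names are hypothetical. It assumes each tool accepts --version, which holds for bowtie2 as shown in the question; check the flag for every tool you log (BWA, for instance, prints its version in the usage text shown when it is run without arguments).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one tab-separated line per tool to a per-sample
# log with UTC timestamp, sample, tool, resolved path and version string.
log_versions() {
  local sample=$1 log=$2 tool path version
  shift 2
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      # Keep only the first line; assumes the tool understands --version.
      version=$("$tool" --version 2>&1 | head -n 1)
    else
      path=NA
      version="NOT FOUND"
    fi
    printf '%s\t%s\t%s\t%s\t%s\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sample" "$tool" "$path" "$version" \
      >> "$log"
  done
}

# Example call (hypothetical sample name and log file):
# log_versions Sample1 Sample1.versions.tsv bowtie2 samtools freebayes
```

Calling it at the start of each sample's job ties the recorded versions (and the binary path, such as /usr/local/bin/bowtie2 in the question) to that sample, even if the installed software changes between runs.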