Step 1
First, save the paths of your BCFs to a listing file, like this:
find /home/drowl1/Project1/ -name "*.bcf" > BCF.list
cat BCF.list
/home/drowl1/Project1/Sam1.bcf
/home/drowl1/Project1/Sam2.bcf
/home/drowl1/Project1/Sam3.bcf
/home/drowl1/Project1/Sam4.bcf
/home/drowl1/Project1/Sam5.bcf
...
You can also use ls, but here I use find.
Step 2
Then, create a loop that will go through each file in the listing and perform the analysis with BCFtools:
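while IFS= read -r BCF
do
  echo "--Processing ${BCF}..."
  # Create the results filename by swapping the .bcf extension for .txt
  # (the dot is escaped so that it only matches a literal ".")
  resultsfile=$(echo "${BCF}" | sed 's/\.bcf$/.txt/')
  # printf is used instead of echo -e, which is not portable to plain sh
  printf '%s\n\n' "--Results file will be ${resultsfile}"
  bcftools isec [options] Baseline.bcf "${BCF}" --output "${resultsfile}"
done < BCF.list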
You can probably follow what's going on here. The results for each BCF will be written to a file with the same name, but with the .txt extension. You can change this (with the inner sed command) to anything that you want.
Also be careful: when you list the BCFs, the listing may include your baseline reference (in my example above, I've named this Baseline.bcf).
This should work in bash and sh. I have not tested other shells.
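One way to guard against the baseline ending up in the listing, sketched under the assumption that it is named Baseline.bcf as in the example above, is to exclude it when the listing is built. The demo directory and sample names below are hypothetical placeholders created only for illustration.

```shell
# Throwaway demo directory with empty placeholder files (hypothetical names)
demo_dir=$(mktemp -d)
touch "${demo_dir}/Sam1.bcf" "${demo_dir}/Sam2.bcf" "${demo_dir}/Baseline.bcf"

# Same find as in Step 1, plus "! -name" to leave the baseline out of the listing
find "${demo_dir}" -name "*.bcf" ! -name "Baseline.bcf" | sort > "${demo_dir}/BCF.list"

cat "${demo_dir}/BCF.list"
```

The resulting BCF.list can then be fed to the loop in Step 2 without comparing the baseline against itself.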
Kevin
It depends on how your data is organized. Are all your files in the same folder? Are they scattered throughout several subfolders under a common folder?
If all your files are in a single folder, the simplest approach would be a bash loop (using a mock command, as I don't know the command you are executing).
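A minimal sketch of such a loop, assuming the files sit in one folder. The demo folder, the sample names, and the echo stand-in for the real command are all hypothetical placeholders.

```shell
# Throwaway folder with empty placeholder files (hypothetical, for illustration)
demo_dir=$(mktemp -d)
touch "${demo_dir}/Sam1.bcf" "${demo_dir}/Sam2.bcf"

for bcf in "${demo_dir}"/*.bcf
do
  # Skip the unexpanded pattern if no .bcf files are present
  [ -e "${bcf}" ] || continue
  # Derive the output name by swapping the .bcf suffix for .txt
  out="${bcf%.bcf}.txt"
  # Mock command: replace this echo with the real tool invocation
  echo "would process ${bcf}" > "${out}"
done

ls "${demo_dir}"
```

The glob is expanded by the shell itself, so no separate listing file is needed when everything lives in one folder.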
In case your files are scattered, you may have to use find with -exec. When you feel more confident (and if you have the computing resources), you can move to GNU parallel.
Thank you. The for loop didn't work for me with this particular bcftools isec option, but I learned about find with the -exec option!
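For the scattered-files case, the find with -exec approach mentioned above might look like the following sketch. The nested demo folders and the sh -c wrapper around a mock echo command are assumptions for illustration; the real command would go in place of the echo.

```shell
# Throwaway folder tree with files scattered across subfolders (hypothetical names)
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/runA" "${demo_dir}/runB/sub"
touch "${demo_dir}/runA/Sam1.bcf" "${demo_dir}/runB/sub/Sam2.bcf"

# find walks every subfolder; -exec runs the command once per matching file.
# The sh -c wrapper receives each path as $1 so the output name can be derived.
find "${demo_dir}" -name "*.bcf" -exec sh -c '
  out="${1%.bcf}.txt"
  echo "would process $1" > "${out}"
' sh {} \;

find "${demo_dir}" -name "*.txt" | sort
```

Each results file lands next to its input, which mirrors the naming used in the loop-based answer.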