Subsetting VCF file - error in script?
1
0
Entering edit mode
2.4 years ago
X • 0

Hello, I have recently written the below script for extracting VCF samples based on a list of sample IDs in a separate text file (participant_ID.txt). However, I am running into a couple of errors.

  1. Nothing is coming out! The job keeps returning as failed in 1 second; is there a problem with the script? Am I not directing the output properly?

  2. I would like to be creating an error file (i.e. sample.err) that shows me where it went wrong, such as "Moduleloaderror". How could I do this, if I'm using Slurm workload manager? Right now, I have the first three lines of my script, but nothing is coming out of those either.

  3. Finally, I have noticed that other scripts tend to have "tee" after the {} part; but what is this function? Will it overwrite my original file? Is it necessary?

SBATCH --output=participant.out
SBATCH --error=participant.err
SBATCH --requeue

SCRIPT=${SCRATCH}/HLA

source ${SCRIPT}/myenv/bin/activate

module load NiaEnv/2019b && \
module load gcc/8.3.0 && \
module load bcftools\

TOTAL=/project/j/jle/Shared/joint-call/February2022/total.vcf.gz
sampleID=/scratch/j/jle/jasminl/HLA/participant_ID.txt
OUTPUTS=/scratch/j/jle/jasminl/HLA/Output  

for ${TOTAL}; do
    bcftools view -Oz -S ${sampleID} > ${OUTPUTS} samples.vcf.gz
done
}
VCF bash Slurm • 945 views
ADD COMMENT
0
Entering edit mode

for ${TOTAL}; do

what is that loop ? how are you reusing this loop ? you need to assign a variable.

why do you need to declare all those variables ?

furthermore not

bcftools view -Oz -S ${sampleID} > ${OUTPUTS} samples.vcf.gz

but

bcftools view -Oz -S/scratch/j/jle/jasminl/HLA/participant_ID.txt   > ${OUTPUTS}/samples.vcf.gz

I think you just want

bcftools view -Oz -S /project/j/jle/Shared/joint-call/February2022/total.vcf.gz    -o /scratch/j/jle/jasminl/HLA/Output/samples.vcf.gz  /project/j/jle/Shared/joint-call/February2022/total.vcf.gz
ADD REPLY
0
Entering edit mode

Hi!

{bcftools view -Oz -S /project/j/jle/Shared/joint-call/February2022/total.vcf.gz -o /scratch/j/jle/jasminl/HLA/Output/samples.vcf.gz}

This seems like the direction to go, but this will not extract just the samples that match the ID included in {sampleID}, right? Is there a way to fix this?

ADD REPLY
0
Entering edit mode
2.4 years ago
bcftools view -Oz -S ${sampleID} > ${OUTPUTS} samples.vcf.gz

Probably a couple of problems. The input file samples.vcf.gz is after the file redirection >. OUTPUTS seems a directory, rather than a file. .

ADD COMMENT
0
Entering edit mode

Hi, thank you for your response.

I was hoping samples.vcf.gz would be the name of the output file! samples.vcf.gz doesn't exist yet. I'm trying to use the text file sample.txt (so {sampleID}) to extract the samples in total.vcf.gz ({TOTAL}).

In this case, how would I proceed?

ADD REPLY
0
Entering edit mode

Then why is the input VCF not part of your command?

ADD REPLY
0
Entering edit mode

bcftools view -Oz -S ${TOTAL} ${sampleID} -o ${OUTPUTS}

Would this be a solution? TOTAL is the input VCF, sampleID the list of sampleIDs I want to use to extract from TOTAL, and OUTPUTS the directory of outputs.

ADD REPLY

Login before adding your answer.

Traffic: 1550 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6