Question

Extract VCF samples by ID - Help

0

2.4 years ago

X • 0

Hi,

I have a vcf.gz file containing about 1000 samples, and I want to extract about 400 of them by sample ID. I have a separate CSV file containing the sample IDs. My idea is to write a for loop that goes through the sample IDs in the VCF and matches them against the IDs from the CSV file.

How can I do this so that I create a new VCF file containing only the sample ID matches? Thank you for your help - I am a beginner in bioinformatics so this may be a very basic task.

VCF python bcftools bash • 1.5k views

ADD COMMENT • link 2.4 years ago by X • 0

score 1 · Answer 1 · 2022-06-22

1

2.4 years ago

Pierre Lindenbaum 164k

See these related posts: "Extract subset of samples from multigenome vcf file"; "How to include/keep only the samples in a list in VCF.gz file?"

ADD COMMENT • link 2.4 years ago by Pierre Lindenbaum 164k

0

Thank you very much. I am trying the bcftools method, but I am a bit confused about how it works. What does "one sample per line" mean when running:

bcftools view -S sample.txt

ADD REPLY • link 2.4 years ago by X • 0

1

What does it mean by "one sample per line" when doing

Each line in sample.txt contains exactly one sample name, written as it appears in the VCF header.
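As an illustration of "one sample per line", here is a minimal sketch that builds such a file from a CSV and then passes it to `bcftools view -S`. The CSV layout (IDs in the first column, one header row) and the file names `participants.csv`, `samples.txt`, `total.vcf.gz` and `subset.vcf.gz` are assumptions made for the example, not details from the original post.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical input: a CSV with a header row and the sample IDs in column 1.
workdir=$(mktemp -d)
cat > "${workdir}/participants.csv" <<'EOF'
sample_id,group
S001,case
S002,control
S003,case
EOF

# Drop the header, keep the first column, strip Windows carriage returns.
# The result is a plain text file with one sample name per line.
tail -n +2 "${workdir}/participants.csv" | cut -d, -f1 | tr -d '\r' > "${workdir}/samples.txt"
cat "${workdir}/samples.txt"

# Subset the VCF (runs only if bcftools and the hypothetical input file exist).
vcf=total.vcf.gz
if command -v bcftools >/dev/null 2>&1 && [ -f "${vcf}" ]; then
  bcftools view -S "${workdir}/samples.txt" -Oz -o subset.vcf.gz "${vcf}"
fi
```

The names in the list must match the sample names in the VCF header, which `bcftools query -l total.vcf.gz` prints. If some IDs in the list may be absent from the VCF, adding `--force-samples` makes `bcftools view` warn about them instead of stopping.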

ADD REPLY • link 2.4 years ago by Pierre Lindenbaum 164k

0

Thank you! Just one additional question: this is what I came up with, but it seems to run into errors. Do you know if I'm missing something in my script?

set -x
{
SCRIPT=${SCRATCH}/HLA
source ${SCRIPT}/myenv/bin/activate
module load NiaEnv/2019b && \
module load gcc/8.3.0 && \
module load bcftools\
TOTAL=/project/j/jle/Shared/joint-call/February2022/total.vcf.gz
sampleID=/scratch/j/jle/jasminl/HLA/participant_ID.txt
OUTPUTS=/scratch/j/jle/jasminl/HLA/Output
for ${TOTAL}; do
bcftools view -Oz -S ${sampleID} > ${OUTPUTS} sample.vcf.gz
done
}

ADD REPLY • link 2.4 years ago by X • 0

1

This is unrelated to your original question, and "it seems to run into errors"
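For reference, a few things in the posted script look like probable causes of the errors: `for ${TOTAL}; do` is not valid loop syntax (and no loop is needed for a single file), the input VCF is never passed to `bcftools view`, `> ${OUTPUTS} sample.vcf.gz` redirects to the directory path and passes `sample.vcf.gz` as a separate argument, and the trailing backslash after `module load bcftools` joins that line to the next. A minimal corrected sketch, reusing the paths from the post and assuming `participant_ID.txt` already holds one sample name per line, might look like this:

```bash
#!/usr/bin/env bash
set -eu

# On the cluster, activate the environment and load the modules first
# (taken from the post; left as comments because they are site-specific):
#   source "${SCRATCH}/HLA/myenv/bin/activate"
#   module load NiaEnv/2019b gcc/8.3.0 bcftools

# Paths copied from the post; adjust them to your own system.
TOTAL=/project/j/jle/Shared/joint-call/February2022/total.vcf.gz
SAMPLE_IDS=/scratch/j/jle/jasminl/HLA/participant_ID.txt
OUTPUTS=/scratch/j/jle/jasminl/HLA/Output
OUT_FILE="${OUTPUTS}/sample.vcf.gz"

echo "Will write: ${OUT_FILE}"

# No loop is needed: one bcftools call subsets the whole file.
# The guard lets the script exit cleanly where bcftools or the input is missing.
if command -v bcftools >/dev/null 2>&1 && [ -f "${TOTAL}" ]; then
  mkdir -p "${OUTPUTS}"
  bcftools view -S "${SAMPLE_IDS}" -Oz -o "${OUT_FILE}" "${TOTAL}"
  bcftools index -t "${OUT_FILE}"   # tabix index for the new file
fi
```

Writing the output with `-o` rather than shell redirection keeps the output path in one quoted argument, which avoids the stray space between the directory and the file name.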