Extract VCF samples by ID - Help
2.4 years ago
X • 0

Hi,

I have a vcf.gz file which contains about 1000 samples, but I want to extract about 400 of them based on sample ID. I have a separate CSV file containing the sample IDs. My idea is to write a for loop that goes through the sample IDs in the VCF and matches them against the IDs from the CSV file.

How can I do this so that I create a new VCF file containing only the sample ID matches? Thank you for your help - I am a beginner in bioinformatics so this may be a very basic task.

VCF python bcftools bash • 1.5k views

Thank you very much. I am trying the bcftools method, but am a bit confused about how it works. What does "one sample per line" mean when doing:

bcftools view -S sample.txt


"What does it mean by 'one sample per line'"

Each line in sample.txt contains exactly one sample name, written as it appears in the VCF header.
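As a rough sketch of what that looks like in practice, the snippet below builds such a file from a CSV and checks it against the sample names in the VCF. The file names, the CSV layout (header row, IDs in the first column) and the IDs themselves are all made up for illustration, and `vcf_ids.txt` is only a stand-in for what `bcftools query -l input.vcf.gz` would print.

```shell
# Hypothetical CSV: header row, sample IDs in the first column, Windows line endings.
printf 'sample_id,group\r\nS001,case\r\nS002,control\r\nS003,case\r\n' > samples.csv

# Drop the header, keep column 1, and strip carriage returns,
# leaving one sample name per line.
tail -n +2 samples.csv | cut -d, -f1 | tr -d '\r' > sample.txt
cat sample.txt

# Stand-in for the output of `bcftools query -l input.vcf.gz` (hypothetical IDs).
printf 'S001\nS002\nS004\n' > vcf_ids.txt

# Requested IDs that are absent from the VCF header.
# `|| true` keeps the exit status clean when nothing is missing.
grep -vxFf vcf_ids.txt sample.txt > missing.txt || true
cat missing.txt
```

With a clean `sample.txt`, the subset itself would be something like `bcftools view -S sample.txt -Oz -o subset.vcf.gz input.vcf.gz`. If some IDs are not in the VCF, bcftools stops with an error unless you add `--force-samples`, so it is worth looking at `missing.txt` first.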


Thank you! Just an additional question - this is what I came up with, but it seems to run into errors. Do you know if I'm lacking something in my script?

set -x

{SCRIPT=${SCRATCH}/HLA

source ${SCRIPT}/myenv/bin/activate

module load NiaEnv/2019b && \
module load gcc/8.3.0 && \
module load bcftools\

TOTAL=/project/j/jle/Shared/joint-call/February2022/total.vcf.gz
sampleID=/scratch/j/jle/jasminl/HLA/participant_ID.txt
OUTPUTS=/scratch/j/jle/jasminl/HLA/Output

for ${TOTAL}; do
bcftools view -Oz -S ${sampleID} > ${OUTPUTS} sample.vcf.gz
done }


This is unrelated to your original question, and "it seems to run into errors" does not tell us what the errors are.
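That said, a few things in the script look likely to cause trouble: `{SCRIPT=...` has no space after the brace, the `for` loop has no variable or list (and a loop is not needed for a single file), the input VCF is never passed to bcftools, the output path has a space where a `/` was probably intended, and the backslash after `module load bcftools` continues onto the next line. A minimal sketch of just the bcftools step is below. The relative paths are placeholders rather than your real ones, and it only prints the command when bcftools or the input files are not available.

```shell
set -eu

# Placeholder paths (hypothetical); substitute your own files.
TOTAL="total.vcf.gz"
SAMPLE_IDS="participant_ID.txt"
OUTDIR="Output"

# Make sure the output directory exists and build the output file path.
mkdir -p "$OUTDIR"
OUT="$OUTDIR/sample.vcf.gz"

if command -v bcftools >/dev/null 2>&1 && [ -f "$TOTAL" ] && [ -f "$SAMPLE_IDS" ]; then
    # -S: file with one sample name per line; -Oz: bgzip-compressed VCF; -o: output file.
    bcftools view -S "$SAMPLE_IDS" -Oz -o "$OUT" "$TOTAL"
else
    # Dry run: show the command that would be executed.
    echo "dry run: bcftools view -S $SAMPLE_IDS -Oz -o $OUT $TOTAL"
fi
```

Quoting the variables and passing the input file as the last argument are the main differences from the script above. The environment setup (`source`, `module load`) is left out because it depends on your cluster.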


Oh, I see! I'll make a separate post for it. Thank you for your help!
