Generating vcf files per individual instead of one big file from bam files
1
0
Entering edit mode
2.3 years ago
salman_96 ▴ 70

Hi,

I have multiple bam files that I used to generate vcf file for variant calling. I used the command below

bcftools mpileup -O v -f $REF  bam/*.bam > genotypes.vcf

Instead of getting one file each sample, I am getting one big file that contains information on all samples.

Is there any way to either split them into samples or re-run the command to get one file per sample?

Regards

vcf bam variant • 1.9k views
ADD COMMENT
2
Entering edit mode
bcftools mpileup -O v -f $REF  bam/sample1.bam > sample1.vcf
bcftools mpileup -O v -f $REF  bam/sample2.bam > sample2.vcf

use a loop or better, a workflow manager.

'bcftools call' should also be invoked after mpileup.

ADD REPLY
0
Entering edit mode

Yes, I did write only the first command that generates genotype vcf files. Unfortunately, there are multiple files and I cannot find any way to loop through

ADD REPLY
0
Entering edit mode

and I cannot find any way to loop through

what have you tried ?

ADD REPLY
0
Entering edit mode

I tried a one liner to avoid large genotype file. I get no results'

for bam in $(ls *.bam) ; do bcftools mpileup -O v -f GRCh38.primary_assembly.genome.fa $bam | bcftools call --ploidy 1 -vm -O v > VARIANTS.vcf

ADD REPLY
0
Entering edit mode

and there is no error message ? the vcf are all empty ? no header ?

ADD REPLY
0
Entering edit mode

Hi,

Yes no error message. Just the command returns ">"

ADD REPLY
0
Entering edit mode

No vcf files generated

ADD REPLY
0
Entering edit mode

because a for loop should end with done

https://www.cyberciti.biz/faq/bash-for-loop/

ADD REPLY
0
Entering edit mode

Yes, The following command did work when used done but it is generating same named file for every sample. So at the end the process resulted in only one file which was the last one in the samples because of loop.

ADD REPLY
0
Entering edit mode

The bam files are named somehow like this

CL03_T1STARpairedAligned.sortedByCoord.out.bam CL05_T1STARpairedAligned.sortedByCoord.out.bam CL05_T1_2DSTARpairedAligned.sortedByCoord.out.bam CL09T1_2DSTARpairedAligned.sortedByCoord.out.bam

ADD REPLY
0
Entering edit mode

This is my script

/bin/bash

cd /The_Variant_call_project/bam

for filename in *.bam

    do
    echo "working with file $filename"

base=$(basename $filename .bam) echo "base name is $base"

for bam in $(ls *.bam) ; do bcftools mpileup -O v -f GRCh38.primary_assembly.genome.fa $base | bcftools call --ploidy 1 -vm -O v > $final_variants

done

The error I run into is Failed to open file "CL48T3STARpairedAligned.sortedByCoord.out" : No such file or directory No I know that the file name should end with .bam but here I get CL48T3STARpairedAligned.sortedByCoord.out instead of CL48T3STARpairedAligned.sortedByCoord.out.bam.

I would appreciate if you would add any suggestion. I do believe there is some problem with the base name.

Regards

ADD REPLY
0
Entering edit mode
2.3 years ago
salman_96 ▴ 70

Hi I was able to solve the problem. Had to make some changes in the last command

bcftools mpileup -O v -f GRCh38.primary_assembly.genome.fa ${infile} | bcftools call --ploidy GRCh38 -vm -O v > ${final_variants}

Thank you for the help. I appreciate it

ADD COMMENT

Login before adding your answer.

Traffic: 2646 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6