5.3 years ago
marcus.hooker
I called SNPs with bcftools mpileup on 120 input sample files. My command looked like this:
bcftools mpileup -d 8000 -f Reference.fasta -Ob Input.bam Input2.bam ...... Input120.bam
But my BCF file doesn't have a sample ID for each sample; it shows just one sample called "sample_ID".
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_ID
1A 20677 . AGGG AGG 5.04449 . INDEL;IDV=1;IMF=1;DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:33,3,0
1A 20719 . G C 8.13869 . DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:37,3,0
1A 20732 . G T 8.13869 . DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:37,3,0
What went wrong? How should I fix this problem? I was expecting a column for each sample so that after FORMAT it said Input Input2 Input3 .... Input120.
The sample name is taken from the BAM header (the SM tag on the @RG lines), not from the file name. I guess you have the same name in all files.
Could you please show the output of
samtools view -H input.bam
for two files?
I see. The sample names in the BAM header just say "sample_ID". Is there a way to get them to have the name of the actual sample? Can I just provide bcftools mpileup a list of sample names as well?
Not that I'm aware of. The much cleaner way is to fix the sample names in every BAM file, e.g. with
samtools reheader
If you have a list with filename + sample_name and show us what the complete header currently looks like, I might be able to help you with this.
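For anyone landing here with the same problem, a minimal sketch of the reheader approach could look like the following. It rests on assumptions, since the actual headers were never posted in this thread: each BAM header has @RG lines that carry an SM: field, and each file is named after its sample (Input.bam holds sample Input). The helper name set_sm and the .renamed.bam output names are made up for illustration.

```shell
#!/usr/bin/env bash

# set_sm (hypothetical helper): read a SAM header on stdin and write it to
# stdout with the SM: field of every @RG line replaced by the given name.
# @RG lines that have no SM: field are passed through unchanged.
set_sm() {
    local sample="$1"
    awk -v sm="$sample" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) $i = "SM:" sm }
        { print }'
}

# Sketch of the per-file loop. Left commented out because it needs samtools
# and assumes that the file name (minus .bam) is the wanted sample name.
#
# for bam in Input*.bam; do
#     sample=$(basename "$bam" .bam)
#     samtools view -H "$bam" | set_sm "$sample" > "${sample}.header.sam"
#     samtools reheader "${sample}.header.sam" "$bam" > "${sample}.renamed.bam"
# done
```

Running bcftools mpileup on the reheadered files should then produce one column per distinct SM value. It is worth checking one file with samtools view -H before processing all 120.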