3.3 years ago
Michal Nevo
Hey, I am looking for a way to add sample ID names after the FORMAT column in the header line of my VCF file.
I have 10 sorted BAM files. I used FreeBayes to create VCF files, and my next step is merging all 10 files for VcfSampleCompare. For that I need to define groups that match the sample IDs in the VCF file, but here is one of my VCF files:
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT **unknown**
NC_048323.1 461 . G T 28.0886 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=3.0103;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=29;MQMR=0;NS=1;NUMALT=1;ODDS=6.46546;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=51;QR=0;RO=0;RPL=0;RPP=7.35324;RPPR=0;RPR=2;RUN=1;SAF=1;SAP=3.0103;SAR=1;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:AD:RO:QR:AO:QA:GL 1/1:2:0,2:0:0:2:51:-4.01203,-0.60206,0
And I want the header line to look like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M1.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M3.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M5.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M7.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M9.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F2.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F4.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F6.sorted.bam /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F8.sorted.bam
bcftools merge http://samtools.github.io/bcftools/bcftools.html#merge (?)
Interesting. Did you mean
--use-header FILE
use the VCF header in the provided text FILE?
I mean that you can look up this tool and use it to combine VCF files.
Best way to merge multiple VCF files
Merging is not my problem; I did use bcftools merge. My problem is that the sample ID is unknown:
Probably the sample ID is already missing in the original BAM or VCF file; you could check that. There are some tools to add the sample ID to those files (I don't know them off the top of my head).
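One way to check which sample name a VCF actually carries is to print everything after the FORMAT column on the `#CHROM` line. The snippet below is only a sketch: `demo.vcf` and the helper `list_samples` are hypothetical names made up for illustration, and the file is a cut-down version of the record shown in the question. If bcftools is installed, `bcftools query -l` should report the same list.

```shell
# Build a tiny, hypothetical tab-separated VCF for illustration only.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n' >> demo.vcf
printf 'NC_048323.1\t461\t.\tG\tT\t28.0886\t.\tDP=2\tGT:DP\t1/1:2\n' >> demo.vcf

# Print every sample column (field 10 onward) of the #CHROM line, one per line.
list_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

list_samples demo.vcf
```

If this prints `unknown`, the placeholder was written when the VCF was created, so it has to be fixed either in the input BAM or in the VCF header itself.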
I think at this stage this can be a quick solution: bcftools merge; retaining sample names. Found a one-liner to replace **unknown**.
My command:
I get:
sed: -e expression #1, char 21: unknown option to `s'
Using '\' before every '/' in the path fixed it.
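The error comes from the `/` characters in the BAM path, which sed reads as delimiters of the `s` command. Besides escaping each slash, sed also accepts a different delimiter character, such as `|`. The sketch below is not the command from this thread (which was not posted); `header_demo.vcf` is a hypothetical test file, and the path is the M1 path from the desired header above.

```shell
# Sample name to insert: the M1 path from the desired header line.
sample='/storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M1.sorted.bam'

# Hypothetical two-line, tab-separated VCF body used only for this demonstration.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n' > header_demo.vcf
printf 'NC_048323.1\t461\t.\tG\tT\t28.0886\t.\tDP=2\tGT:DP\t1/1:2\n' >> header_demo.vcf

# Use | as the s-command delimiter so the slashes in the path need no escaping.
# The /^#CHROM/ address limits the edit to the column-header line, and the $
# anchor only matches "unknown" when it is the last thing on that line.
fixed=$(sed "/^#CHROM/ s|unknown\$|${sample}|" header_demo.vcf)

# Show the rewritten header line.
printf '%s\n' "$fixed" | head -n 1
```

This only works as long as the replacement text contains neither the chosen delimiter nor `&`, which sed treats specially in the replacement.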