FreeBayes VCF output with FORMAT unknown
3.3 years ago
Michal Nevo ▴ 140

Hey, I am looking for a way to add sample ID names after the FORMAT column in my VCF file.

I have 10 sorted BAM files. I used FreeBayes to create VCF files, and my next step is to merge all 10 files for VcfSampleCompare. For that I need to define groups that match the sample IDs in the VCF file, but here is one of my VCF files:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block.">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  **unknown** 
NC_048323.1     461     .       G       T       28.0886 .       AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=3.0103;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=29;MQMR=0;NS=1;NUMALT=1;ODDS=6.46546;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=51;QR=0;RO=0;RPL=0;RPP=7.35324;RPPR=0;RPR=2;RUN=1;SAF=1;SAP=3.0103;SAR=1;SRF=0;SRP=0;SRR=0;TYPE=snp        GT:DP:AD:RO:QR:AO:QA:GL       1/1:2:0,2:0:0:2:51:-4.01203,-0.60206,0

I want the header line to look like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M1.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M3.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M5.sorted.bam  /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M7.sorted.bam      /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M9.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F2.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F4.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F6.sorted.bam     /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F8.sorted.bam

Interesting. Did you mean

--use-header FILE

use the VCF header in the provided text FILE?


I mean that you can look up this tool and use it to combine VCF files.

Best way to merge multiple VCF files


Merging is not my problem; I already used bcftools merge. My problem is that the sample ID in every file is "unknown".
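One way to give each single-sample VCF its own name before merging is to overwrite the tenth column of the `#CHROM` line. The sketch below is only an illustration under assumptions: the function name `rename_sample`, the demo file names, and the choice of `M1` as the new name are all made up, and it assumes a tab-separated, single-sample VCF like the one shown in the question.

```shell
#!/bin/sh
# Sketch: set the sample column (field 10) of a single-sample VCF header
# to a chosen name. Function and file names are hypothetical.
rename_sample() {
    # $1 = new sample name, $2 = input VCF; the result goes to stdout
    awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
        /^#CHROM/ { $10 = name }   # only the column-header line is changed
        { print }' "$2"
}

# Tiny stand-in for a FreeBayes VCF, just for the demonstration.
demo_dir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\nNC_048323.1\t461\t.\tG\tT\t28.0886\t.\tDP=2\tGT:DP\t1/1:2\n' > "$demo_dir/M1.vcf"

rename_sample M1 "$demo_dir/M1.vcf" > "$demo_dir/M1.renamed.vcf"
grep '^#CHROM' "$demo_dir/M1.renamed.vcf"
```

Because the edit is made by column position rather than by searching for the text "unknown", it does not depend on what placeholder the caller wrote. Running it once per file with a different name each time should leave the merged VCF with distinct sample columns.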


The sample ID is probably already missing in the original BAM or VCF file; you could check that. There are some tools to add the sample ID to those files (I don't know them off the top of my head).

I think at this stage this could be a quick solution: bcftools merge; retaining sample names. I found a one-liner there to replace **unknown**.
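To check whether the sample ID is missing from the BAM, one can look at the `@RG` lines of its header. As far as I know, FreeBayes takes sample names from the `SM` tag of those lines and writes `unknown` when none is present. The sketch below reads a SAM header from stdin and prints every `SM` value; the function name `list_rg_samples` and the hand-written header are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: print the SM (sample) value of every @RG line in a SAM header
# read from stdin. The function name is hypothetical.
list_rg_samples() {
    awk -F '\t' '/^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)   # drop the "SM:" prefix
    }'
}

# Demonstration on a hand-written header. A real header would come from
# something like: samtools view -H M1.sorted.bam | list_rg_samples
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:NC_048323.1\tLN:1000\n@RG\tID:rg1\tSM:M1\tPL:ILLUMINA\n' | list_rg_samples
```

If nothing is printed for a real BAM, the file carries no sample name for the caller to use. Tools such as `samtools addreplacerg` (for the BAM) or `bcftools reheader --samples` (for the VCF) are meant for this kind of fix; check their manuals for the exact usage.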


My command:

sed '/^#CHROM/s/unknown//storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam/' five_contigs_cp.vcf > out.vcf

I get:

sed: -e expression #1, char 21: unknown option to `s'


Putting a '\' before each '/' in the path fixed it.
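The error arises because `/` is also the delimiter of the `s` command: sed reads `s/unknown//` as "replace with nothing" and then tries to interpret `storage...` as flags. Two ways around it are sketched below, on a made-up two-line VCF (the temporary file names are assumptions; the path is the one from the command above). The sketch assumes the path contains no `|` or `&` characters.

```shell
#!/bin/sh
# Sketch: replace the placeholder sample name on the #CHROM line.
# The temporary file names below are made up for the demonstration.
demo_dir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n' > "$demo_dir/in.vcf"

new_name='/storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam'

# Option 1: use '|' as the s-command delimiter so '/' needs no escaping.
sed "/^#CHROM/s|unknown|$new_name|" "$demo_dir/in.vcf" > "$demo_dir/out_pipe.vcf"

# Option 2: keep '/' as the delimiter and escape every '/' in the path.
escaped=$(printf '%s' "$new_name" | sed 's/\//\\\//g')
sed "/^#CHROM/s/unknown/$escaped/" "$demo_dir/in.vcf" > "$demo_dir/out_escaped.vcf"

# Both options should give the same file; show the new header line if so.
cmp "$demo_dir/out_pipe.vcf" "$demo_dir/out_escaped.vcf" && grep '^#CHROM' "$demo_dir/out_pipe.vcf"
```

Option 1 is usually easier to read when the replacement is a file path.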
