5.4 years ago
nuketbilgen
Hi everyone,
I have VCF files for 4 feline genomes, but the VCF headers show different contig names. I checked the reference genome line in each header; you can see them below.
##reference=file:///ifswh1/BC_COM_P1/F18FTSEUHT0898/CATsxlR/analysis/index/GCF_000181335.3_Felis_catus_9.0_genomic.fa
##reference=file:///ifshk5/BC_AS/BC_COM_P0/F19FTSEUHT0354/CATbelR/2016/result/index/felCat9.fa
Two of my genomes were aligned to the first reference, and the other two were aligned to the second. I want to merge these VCFs and run an LD analysis, but I cannot.
How can I solve this? Thanks...
Are they the same genome builds?
A quick Google search yielded: felCat9.fa (UCSC Genome Browser) and GCF_000181335.3_Felis_catus_9.0_genomic.fa (NCBI).
Exactly, yes. When I split the VCF files by chromosome with the SnpSift split command, I got 40 files for the felCat9.fa-aligned files and 426 files for the NCBI one. I am worried about losing important variants...
I think the Biostars community needs more information about your post to help you, such as how the VCF files were produced. If the only difference is in naming, then a quick regular expression or search-and-replace command can change the column 1 value from an old, undesired name to a new, desired name.
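As a rough illustration of that kind of renaming, here is a minimal Python sketch. It is not the command from this thread. It assumes standard `##contig=<ID=...>` header lines and tab-separated records, and the `RENAME` mapping and the function name `rename_contigs` are hypothetical placeholders. It only touches the contig header lines and column 1, so the same string elsewhere in a record is left alone.

```python
import re

# Hypothetical mapping from old contig names to new ones (illustrative only).
RENAME = {"oldname": "newname"}

# Captures the prefix and the contig ID of a "##contig=<ID=...,length=...>" line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_contigs(lines, mapping):
    """Yield VCF lines with contig names replaced in ##contig headers and column 1 only."""
    for line in lines:
        if line.startswith("##contig="):
            m = CONTIG_ID.match(line)
            if m and m.group(2) in mapping:
                line = m.group(1) + mapping[m.group(2)] + line[m.end():]
        elif not line.startswith("#"):
            # Data record: column 1 (CHROM) is everything before the first tab.
            chrom, sep, rest = line.partition("\t")
            line = mapping.get(chrom, chrom) + sep + rest
        yield line


example = [
    "##contig=<ID=oldname,length=1000>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "oldname\t10\toldname\tA\tG\t50\tPASS\t.\n",
]
renamed = list(rename_contigs(example, RENAME))
```

In the toy record, the ID column deliberately repeats `oldname` to show that only column 1 is rewritten. For real files, the same loop could read from one open file and write to another.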
Note that such a search-and-replace command assumes that oldname occurs only in column 1 of the VCF file.
Hi again, the VCF files were generated by the GATK HaplotypeCaller walker.
Haplotype calling:
java -jar GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R all.chrs.con.fa -L TEST_Chr01 -I aligned_reads.sorted.dedup.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o TEST_Chr01.gvcf
You can find examples of the contig lines below. These contigs also carry variants, and if one file has a variant on "##contig=<ID=chrA1_NW_019365239v1_random,length=46965>", the same variant is located on "##contig=<ID=chrA1_random,length=415283>" in the other two files. So the chromosome naming for SNPs at the same position differs as well...
Contig examples from the first two files:
##contig=<ID=chrA1,length=242100913>
##contig=<ID=chrA1_random,length=415283>
##contig=<ID=chrA2,length=171471747>
##contig=<ID=chrA2_random,length=1187422>
Contig examples from the other two files:
##contig=<ID=chrA1,length=242100913>
##contig=<ID=chrA1_NW_019365239v1_random,length=46965>
##contig=<ID=chrA1_NW_019365240v1_random,length=58068>
##contig=<ID=chrA1_NW_019365241v1_random,length=50743>
##contig=<ID=chrA1_NW_019365242v1_random,length=22574>
##contig=<ID=chrA1_NW_019365243v1_random,length=50951>
##contig=<ID=chrA1_NW_019365244v1_random,length=50765>
##contig=<ID=chrA1_NW_019365245v1_random,length=14920>
##contig=<ID=chrA1_NW_019365246v1_random,length=45003>
##contig=<ID=chrA1_NW_019365247v1_random,length=40320>
##contig=<ID=chrA1_NW_019365248v1_random,length=25974>
##contig=<ID=chrA2,length=171471747>
. . .
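To see exactly which contig names the two sets of headers share, the contig lines can be compared programmatically. The following is only a sketch, assuming standard `##contig=<ID=...,length=...>` header lines. The function name `contig_ids` and the shortened header lists are illustrative and use a few of the contig lines quoted above.

```python
import re

# Matches the contig ID in a "##contig=<ID=...,length=...>" header line.
CONTIG_LINE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_ids(header_lines):
    """Return contig IDs in header order, skipping non-contig lines."""
    ids = []
    for line in header_lines:
        m = CONTIG_LINE.match(line)
        if m:
            ids.append(m.group(1))
    return ids


# Shortened header excerpts taken from the examples in this post.
first_header = [
    "##contig=<ID=chrA1,length=242100913>",
    "##contig=<ID=chrA1_random,length=415283>",
    "##contig=<ID=chrA2,length=171471747>",
    "##contig=<ID=chrA2_random,length=1187422>",
]
second_header = [
    "##contig=<ID=chrA1,length=242100913>",
    "##contig=<ID=chrA1_NW_019365239v1_random,length=46965>",
    "##contig=<ID=chrA1_NW_019365240v1_random,length=58068>",
    "##contig=<ID=chrA2,length=171471747>",
]

first_ids = contig_ids(first_header)
second_ids = contig_ids(second_header)

shared = [c for c in first_ids if c in set(second_ids)]
only_first = [c for c in first_ids if c not in set(second_ids)]
only_second = [c for c in second_ids if c not in set(first_ids)]

print("shared:", shared)
print("only in first set:", only_first)
print("only in second set:", only_second)
```

On these excerpts, the assembled chromosomes (chrA1, chrA2) appear in both lists, while the "_random" entries differ, which matches the naming mismatch described above.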
I know it's a long shot, but would you suggest that I merge the files by chromosome, like this?