Question

Merge BAM With VCF Files: Tips About The Correct Workflow

0

12.4 years ago

francesco.montinaro • 0

Dear All, I am very new to the analysis of NGS data.

I would like to merge the data for HGDP sample 1029 (http://cdna.eva.mpg.de/denisova/VCF/human/HGDP01029.hg19_1000g.12.mod.vcf.gz) with the San sample from Schuster et al. 2010 (ftp://ftp.bx.psu.edu/data/bushman/hg18/bam/KB1illumChr12.bam).

If I understood correctly, I should call the variants from the BAM file and then merge them with the VCF. Is that correct?

Could you kindly suggest the best way to do this, in your opinion? At what point should I convert my files to the same reference sequence?

I am really sorry if I am saying something completely wrong, but I have only just started working with this kind of data.

merge bam vcf • 3.9k views

updated 12.4 years ago by Marvin ▴ 900 • written 12.4 years ago by francesco.montinaro • 0

0

I'm not sure what you are planning to do, but the Denisova and Bushman SNPs for hg18/hg19 are in Kaviar: http://db.systemsbiology.net/kaviar/cgi-pub/Kaviar2.pl?show=sources

12.4 years ago by JC 13k

0

Thank you for your help, but I don't need the Denisova genome; I need HGDP01029 (the Mandenka genome) and the San genome. Unfortunately, the data are in two different formats and I don't know how to compare them. Do you have any suggestions?

12.4 years ago by francesco.montinaro • 0

0

Well, you have two options: 1) map the San genome reads to hg19 and then call variants with samtools or GATK, or 2) call variants using the hg18 BAM and then convert the coordinates to hg19 with UCSC LiftOver. After that, you can merge with VCFtools.

12.4 years ago by JC 13k

0

Thank you, I will try it today and hope that it works. I will call the variants using samtools mpileup, but I have no idea how to map the San genome. Do you have any suggestions?

Thank you again.

12.4 years ago by francesco.montinaro • 0

score 1 · Answer 1 · 2012-11-17

1

12.4 years ago

Marvin ▴ 900

You should probably align the Schuster data to hg19 (the MPG guy used the same genome as the 1000 Genomes Project), probably using BWA. Then call genotypes, using either "samtools mpileup" or GATK. Then merge the files, maybe using "vcftools merge". In my experience, vcftools is so buggy as to be useless, so you'll probably end up writing your own merging code for the two VCF files.
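To give a rough idea of what hand-written merging code for two single-sample VCF files could look like, here is a minimal Python sketch. It is only an illustration under several assumptions: both files are already on the same reference (hg19) with matching chromosome names, each has exactly one sample column, and only the GT field is kept. The function names `read_vcf` and `merge_calls` are made up for this example.

```python
def read_vcf(lines):
    """Parse single-sample VCF lines into {(chrom, pos, ref, alt): genotype}."""
    calls = {}
    for line in lines:
        # Skip meta-information lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        # GT is looked up by name because its position depends on FORMAT.
        if "GT" in format_keys:
            genotype = sample_values[format_keys.index("GT")]
        else:
            genotype = "./."
        calls[(chrom, pos, ref, alt)] = genotype
    return calls


def merge_calls(calls_a, calls_b, missing="./."):
    """Union of sites from both samples; absent sites get a missing genotype.

    Assumption: chromosome names agree between the files ("12" vs "chr12"
    would be treated as different chromosomes). Sites are ordered by plain
    tuple comparison, so chromosome names sort as strings.
    """
    merged = []
    for site in sorted(set(calls_a) | set(calls_b)):
        merged.append(site + (calls_a.get(site, missing),
                              calls_b.get(site, missing)))
    return merged


if __name__ == "__main__":
    # Tiny made-up records, not real data from either sample.
    sample_a = ["12\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"]
    sample_b = ["12\t100\t.\tA\tG\t60\tPASS\t.\tGT\t1/1\n"]
    for row in merge_calls(read_vcf(sample_a), read_vcf(sample_b)):
        print("\t".join(str(value) for value in row))
```

Note that a site present in only one file is reported as missing in the other, which is not the same as homozygous reference; telling the two apart requires calls (or coverage information) at non-variant sites as well.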

12.4 years ago by Marvin ▴ 900

0

But if I understood correctly, BWA accepts only FASTQ input, and my Schuster data are in .bam format. Is that correct? I am really sorry about this, I am feeling "alone in the dark"... I absolutely need to attend an NGS course.

Thank you again

12.4 years ago by francesco.montinaro • 0

0

The man page for BWA is here: http://bio-bwa.sourceforge.net/bwa.shtml
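On the BAM-versus-FASTQ point, one possible route is to extract the reads from the BAM, realign them to hg19, and call variants on the new alignment. The Python sketch below only assembles the command lines so they can be inspected before anything is run. It is not taken from this thread: the file names (`hg19.fa`, the `KB1` prefix) are hypothetical, single-end reads are assumed, and the subcommands are those of current samtools, BWA and bcftools releases, so check them against the versions you have installed.

```python
import shlex


def realign_and_call_commands(bam, reference, prefix):
    """Return shell command lines for BAM -> FASTQ -> BWA -> sorted BAM -> VCF.

    Nothing is executed here; the strings are meant to be reviewed first.
    Output file names are derived from `prefix` (a hypothetical convention).
    """
    q = shlex.quote  # protect paths that contain spaces or shell characters
    fastq = f"{prefix}.fq"
    sorted_bam = f"{prefix}.hg19.sorted.bam"
    vcf = f"{prefix}.hg19.vcf.gz"
    return [
        # 1. Dump the reads from the existing hg18 alignment (single-end assumed).
        f"samtools fastq {q(bam)} > {q(fastq)}",
        # 2. Index the hg19 reference once.
        f"bwa index {q(reference)}",
        # 3. Align to hg19 and sort the alignments by coordinate.
        f"bwa mem {q(reference)} {q(fastq)} | samtools sort -o {q(sorted_bam)} -",
        # 4. Index the sorted BAM.
        f"samtools index {q(sorted_bam)}",
        # 5. Call variants (variant sites only) into a compressed VCF.
        f"bcftools mpileup -f {q(reference)} {q(sorted_bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for command in realign_and_call_commands("KB1illumChr12.bam", "hg19.fa", "KB1"):
        print(command)
```

For paired-end data the extraction and alignment steps would need adjusting, since both mates have to be passed to the aligner together.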

12.4 years ago by Marvin ▴ 900