Dear All, I am very new to the analysis of NGS data.
I would like to merge the data for HGDP sample 1029 (http://cdna.eva.mpg.de/denisova/VCF/human/HGDP01029.hg19_1000g.12.mod.vcf.gz) with the San sample from Schuster et al. 2010 (ftp://ftp.bx.psu.edu/data/bushman/hg18/bam/KB1illumChr12.bam).
If I understood correctly, I should call the variants from the BAM file and then merge them with the VCF. Is that correct?
Could you kindly suggest what you think is the best way to do this? At what point should I convert my files to the same reference sequence?
I am really sorry if I am saying something completely wrong, but I have only just started working with this kind of data.
I'm not sure what you are planning to do, but Denisova and Bushman SNPs for hg18/hg19 are in Kaviar: http://db.systemsbiology.net/kaviar/cgi-pub/Kaviar2.pl?show=sources
Thank you for your help, but I don't need the Denisova genome. I need HGDP01029 (the Mandenka genome) and the San genome. Unfortunately, the data are in two different formats (a VCF and a BAM) and I don't know how to compare them. Do you have any suggestions?
Well, you have 2 options: 1) map the San genome reads to hg19 and then call variants with samtools or GATK, or 2) call variants using the hg18 BAM and then convert the coordinates to hg19 with UCSC LiftOver. After that you can merge the two VCFs with VCFtools.
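For option 2, one detail that is easy to get wrong is the coordinate system: VCF positions are 1-based, while the BED intervals that the UCSC liftOver tool reads by default are 0-based and half-open. Below is a rough Python sketch of that conversion step, not a complete pipeline. The function names and all file names are made up for illustration, the example record is invented, and hg18ToHg19.over.chain.gz is the chain file name I would expect to download from UCSC for this conversion. It is also worth checking that both files use the same chromosome naming (for example "chr12" versus "12") before merging.

```python
def vcf_line_to_bed(line):
    """Convert one VCF data line into a BED line (0-based, half-open)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    start = pos - 1          # VCF POS is 1-based, BED start is 0-based
    end = start + len(ref)   # BED end is exclusive and covers the REF allele
    # Keep the original hg18 site in the name column so that lifted
    # intervals can be matched back to their VCF records afterwards.
    name = f"{chrom}:{pos}:{ref}:{alt}"
    return f"{chrom}\t{start}\t{end}\t{name}"


def vcf_to_bed_lines(vcf_lines):
    """Skip header and blank lines, convert every remaining record."""
    return [
        vcf_line_to_bed(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    ]


def liftover_command(bed_in, chain, bed_out, unmapped):
    """Argument list in the order liftOver expects:
    liftOver oldFile map.chain newFile unMapped"""
    return ["liftOver", bed_in, chain, bed_out, unmapped]


if __name__ == "__main__":
    # Invented record, only to show the shape of the conversion.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr12\t1000\t.\tA\tG\t50\tPASS\t.",
    ]
    for bed_line in vcf_to_bed_lines(example):
        print(bed_line)

    # Hypothetical file names; the command is only printed, not executed.
    cmd = liftover_command(
        "san.hg18.bed",
        "hg18ToHg19.over.chain.gz",
        "san.hg19.bed",
        "san.unmapped.bed",
    )
    print(" ".join(cmd))
```

Sites that liftOver cannot place end up in the unmapped file and have to be dropped from the San call set before merging. A tool that lifts VCF files directly may be more convenient than going through BED, but the coordinate logic is the same.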
Thank you, I will try it today and hope that it works. I will call the variants using samtools mpileup, but I have no idea how to map the San genome. Do you have any suggestions?
Thank you again.