How to get a fake diploid sequence(each one from a population)
0
0
Entering edit mode
6.7 years ago

I want to use PSMC to analysis the division time of two closely human population. For that, I need to construct a pseudo-diploid sequence. Now, I have these two human populations' bam file. How should I do?

Could I directly use below command?

samtools mpileup -C50 -uf <ref.fa> <file1.bam> <file2.bam> | bcftools view -c - | vcftools vcf2fq | gzip > diploid.fq.gz

I think this command may be wrong. Another way is getting haploid sequence from each bam file, and then use seqtk mergefa to merge the two haploid sequences. But I do not know how to get haploid sequence from the bam file.

Could someone give me some suggestions, please?

SNP genome alignment sequence • 1.6k views
ADD COMMENT
0
Entering edit mode

From samtools mpileup manual:

Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records are grouped by sample (SM) identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.

If your BAMs do not keep sample (individual) information as @RG tags, your samtools command probably won't produce the correct result.

ADD REPLY
0
Entering edit mode

Thank you.

My BAMs have the @RG tags, that's mean, if I input two BAMs one time, it can get the haploid sequence for each one?

ADD REPLY

Login before adding your answer.

Traffic: 1606 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6