Question

BAM dataset to Genotype data conversion using PLINK

3.0 years ago

BENYEBUGA • 0

What commands do I enter using PLINK on Ubuntu to call genotypes from this hg19 BAM dataset (66 samples) and convert them into a genotype data format (txt, csv, ...)?

Data: https://www.ebi.ac.uk/ena/browser/view/PRJEB42975?show=reads

Pardon me if I've worded the question awkwardly; I'm rather new at this. I can provide more detailed context if needed. Any help would be much appreciated!

The BAM and fastq.gz files from the link are also downloaded to my hard drive. I'm running Ubuntu on a Windows 10 machine.

Paper: https://www.biorxiv.org/content/10.1101/2021.02.17.431423v1

score 1 · Answer 1 · 2021-11-27

3.0 years ago

4galaxy77 2.9k

You must use a genotype caller in order to obtain genotypes from a .bam file. It's not possible to 'convert' .bam to genotypes.

There are many options, but bcftools is probably the simplest. Take a look at this pipeline:

bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
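The question asks for a txt/csv genotype table, which the pipeline above stops short of. One way to get there (a sketch under stated assumptions, not the answerer's own method) is to keep only biallelic sites, print the GT field with bcftools query, and recode each genotype as a 0/1/2 count of the ALT allele with a small awk filter. The function name gt_to_dosage and all file names are hypothetical. For 66 samples, bcftools mpileup -b reads the BAM paths from a list file instead of the command line.

```shell
# gt_to_dosage: convert tab-separated rows of "CHROM POS REF ALT GT1 GT2 ..."
# (as printed by bcftools query, see usage below) into CSV, replacing each
# diploid biallelic genotype with its ALT-allele count (0, 1 or 2).
# Missing ("./."), haploid and multi-allelic calls (e.g. "1/2") become NA.
gt_to_dosage() {
  awk -F'\t' -v OFS=',' '
    /^#/ { gsub(/\t/, ","); print; next }   # header line from bcftools query -H
    {
      for (i = 5; i <= NF; i++) {
        g = $i
        gsub(/[|]/, "/", g)                   # treat phased "0|1" like "0/1"
        if (g == "0/0") $i = 0
        else if (g == "0/1" || g == "1/0") $i = 1
        else if (g == "1/1") $i = 2
        else $i = "NA"
      }
      $1 = $1                                 # rebuild the record with commas even if there are no samples
      print
    }'
}

# Example usage (file names are hypothetical). Multi-allelic sites are dropped
# first with -m2 -M2, because an ALT such as "T,A" would add a stray comma.
#   ls *.bam > bams.txt
#   bcftools mpileup -Ou -f ref.fa -b bams.txt | bcftools call -vm -Oz -o study.vcf.gz
#   bcftools view -m2 -M2 -Ou study.vcf.gz \
#     | bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' - \
#     | gt_to_dosage > genotypes.csv
```

PLINK can also read the VCF directly: with PLINK 1.9, `plink --vcf study.vcf.gz --double-id --recode A --out study` writes an additive-coded study.raw. Note that PLINK counts its A1 allele, which by default is the minor allele and not necessarily ALT.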

I'm getting the following error when executing:

[E::faidx_adjust_position] The sequence "1" was not found

Is this an issue with the header descriptions being misread (spaces, comments)? If so, what command can I use to correct the error downstream with bcftools?
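One way to test that guess is to compare the contig names in the BAM header with the names in the reference index. As far as I know, samtools faidx keys each sequence on the first word of its > line, so descriptions after a space should not matter; the error says the index simply has no sequence named 1. The sketch below (the function name missing_contigs is hypothetical) lists the BAM's @SQ names that are absent from a .fai file.

```shell
# missing_contigs HEADER.sam REF.fa.fai
# Prints every @SQ SN: name in a SAM header that has no entry in the FASTA index.
missing_contigs() {
  awk -F'\t' '
    NR == FNR { have[$1] = 1; next }      # first file read: the .fai (column 1 = sequence name)
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)            # strip the "SN:" prefix
          if (!(name in have)) print name
        }
    }' "$2" "$1"
}

# Example usage (file names taken from the command in this reply):
#   samtools faidx hg19.fa                          # writes hg19.fa.fai
#   samtools view -H I19139.hg19.bam > I19139.header.sam
#   missing_contigs I19139.header.sam hg19.fa.fai | head
```

If this prints 1, 2, ..., X, Y, the BAM most likely uses GRCh37/b37-style names while hg19.fa uses UCSC names (chr1, chr2, ...), and the simplest fix is to pass -f the same FASTA the BAM was aligned to.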

I've read online that the option trimreaddescriptions=t might correct the issue, but I don't know where to place it within the pipeline command.

Thanks again for the pointers,

For reference, this is the command I'm using with the hg19.fa reference:

bcftools mpileup -Ou -f hg19.fa I19139.hg19.bam | bcftools call -vmO z -o Nubians.vcf.gz

(reply by BENYEBUGA, 3.0 years ago)
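If the BAM's contigs are named 1, 2, ..., X, Y, hg19.fa uses chr1, chr2, ..., and the reference the BAM was aligned to cannot be obtained, one workaround is to rename the main chromosomes in the BAM header. This is a sketch, assuming GNU sed: chr1 to chr22, chrX and chrY carry the same sequences in GRCh37 and hg19, but GRCh37 MT differs from hg19 chrM and unplaced or decoy contigs are named differently, so those are left untouched and calling is restricted to the renamed chromosomes. The function name add_chr_prefix and the output file names are hypothetical.

```shell
# add_chr_prefix: read a SAM header on stdin and rename @SQ contigs 1-22, X and Y
# to UCSC style (chr1-chr22, chrX, chrY). Everything else, including MT and
# unplaced or decoy contigs, is passed through unchanged. Assumes GNU sed (\t).
add_chr_prefix() {
  sed -E 's/^(@SQ.*\tSN:)([0-9]+|X|Y)(\t|$)/\1chr\2\3/'
}

# Example usage (output file names are hypothetical):
#   samtools view -H I19139.hg19.bam | add_chr_prefix | samtools reheader - I19139.hg19.bam > I19139.chr.bam
#   samtools index I19139.chr.bam
#   regions=$(printf 'chr%s,' $(seq 1 22) X Y); regions=${regions%,}
#   bcftools mpileup -Ou -f hg19.fa -r "$regions" I19139.chr.bam | bcftools call -vm -Oz -o Nubians.vcf.gz
```

The -r list keeps mpileup away from contigs that still have no match in hg19.fa; -r needs the BAM index created by samtools index.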