human_variation_vcf
6 weeks ago
runfreely2 • 0

Hi.

I'm preprocessing a BAM file according to this script: https://github.com/Yonsei-TGIL/Mosaic-Reference-Standards/blob/master/1.A.pipe_Align_Preprocess.sh

The authors of this paper (https://www.nature.com/articles/s41597-022-01133-8) shared preprocessed BAM files for some samples, for example SetB_M3_2.preprocessed.bam. For other samples they provided only the FASTQ files, so I am generating and preprocessing the BAM files for those myself.

I'm at the base recalibration step, but I'm running into issues. I need to specify --known-sites $dbSNP, for which I downloaded 00-All.vcf.gz from https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/. This VCF contains only the main contigs, so I thought of filtering the input BAM file and the reference genome down to the main chromosomes as well.

But the authors' BAM files also contain the additional contigs. I want to run paired-sample variant calling using SetB_M3_2.preprocessed.bam as the tumor sample and the BAM I'm generating and preprocessing as the control. For this, the two BAM files should have the same contigs.
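
One way to check whether two BAM files really share the same contigs is to compare their @SQ header lines. The following is a minimal Python sketch, assuming each header has already been dumped to text (for example with samtools view -H). The function names are illustrative and are not part of the linked pipeline.

```python
def contigs_from_header(header_text):
    """Return {contig_name: length} parsed from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each @SQ field looks like TAG:VALUE, separated by tabs.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(a, b):
    """Return (only_in_a, only_in_b, length_mismatch) as sorted lists of names."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatch = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, mismatch


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python compare_contigs.py a.header.txt b.header.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            only_a, only_b, mismatch = compare_contigs(
                contigs_from_header(fa.read()), contigs_from_header(fb.read())
            )
        print("only in first:", only_a)
        print("only in second:", only_b)
        print("length mismatch:", mismatch)
```

If all three lists come back empty, the two headers name the same contigs with the same lengths. The sketch does not check contig order.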

I'd really appreciate any help in this.

Thank you in advance.

BaseRecalibrator • 359 views

Are you sure that the chromosome notation is the same? Which reference genome did the authors use?


This is from their bam file: @PG ID:bwa-12AC583 PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 4 -M /data/resource/reference/human/NCBI/GRCh38_GATK/BWAIndex/genome.fa /data/project/RefStand/1.raw_Mosaic/MergeAll/FP-M1-1_R1.fq.gz /data/project/RefStand/1.raw_Mosaic/MergeAll/FP-M1-1_R2.fq.gz

I'm using GRCh38_full_analysis_set_plus_decoy_hla.fa, which names chromosomes as chr1, chr2, etc.

I had noticed that 00-All.vcf.gz names chromosomes as 1, 2, etc., so I renamed them.
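
For readers attempting the same renaming, here is a minimal Python sketch that builds an "old new" name map in the format accepted by bcftools annotate --rename-chrs. It assumes the VCF uses 1..22, X, Y and MT while the reference uses chr1..chr22, chrX, chrY and chrM. Check both files before relying on it; the function names are illustrative.

```python
def build_rename_map():
    """Map bare chromosome names (1, 2, ..., X, Y, MT) to chr-prefixed names."""
    mapping = {str(i): f"chr{i}" for i in range(1, 23)}
    mapping["X"] = "chrX"
    mapping["Y"] = "chrY"
    # Assumption: the VCF calls the mitochondrial contig MT and the reference calls it chrM.
    mapping["MT"] = "chrM"
    return mapping


def format_rename_lines(mapping):
    """Format the map as one 'old_name new_name' pair per line."""
    return "".join(f"{old} {new}\n" for old, new in mapping.items())


if __name__ == "__main__":
    # Write the map to a file (hypothetical name) for use with, e.g.:
    #   bcftools annotate --rename-chrs chr_map.txt -Oz -o renamed.vcf.gz 00-All.vcf.gz
    with open("chr_map.txt", "w") as handle:
        handle.write(format_rename_lines(build_rename_map()))
```

The renamed VCF needs to be indexed again before GATK can use it.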


Wouldn't it be better to use a dbSNP file that is already built for the same genome? You could get one from the Broad resource bundle: https://console.cloud.google.com/storage/browser/_details/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz;tab=live_object
