I can't merge my BCF
files together using bcftools
. Below are the details of my pipeline. After running the pipeline, I created a subdirectory that has 2 *.bcf
files to try and merge them as a test set but it's not working.
My commands to merge 2 *.bcf files
# Directory contents
-bash-4.1$ cd bcf_files/testing/
-bash-4.1$ ls
S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
# Attempting to merge 2 bcf files
-bash-4.1$ bcftools view S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf > testing.merged.bcf
#Error below
Index required, expected .vcf.gz or .bcf file: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
Failed to open or the file not indexed: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
I tried indexing them
$ bcftools index S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
[E::main_vcfindex] bcf_index_build failed for S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
My pipeline: I have 88 samples whose reads together total to about 746 G in size.
I used HISAT2
for the mapping using human assembly hg38. HISAT2
supplies preindexed files that we used located at ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz
The assembly for the genome used for the indexing was retrieved from ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
Create the sam file
hisat2 -q -p 2 --fast -x ./grch38/genome -1 {r1_path} -2 {r2_path} -S ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam
Sam => Sorted-bam
samtools view -bS ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam | samtools sort -@ 16 -o ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted
Sorted-bam => BCF
samtools mpileup -uf ./grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -C 50 --BCF -o ./bcf_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted
$ du -sh *
5.8T bcf_files
8.5G grch38
4.7G grch38.tar.gz
435K reads
34K run_tmp.sh
176G sam_files
38G sorted_bam_files