I am writing a code to analyse yeast sequencing data using R. I was able to do the alignment using BWA and thus obtain a .bam file and its .bai index using samtools. My next step is to perform variant calling using bcftools. Here is my command line and the associated error:
vcf_file <- "pathway/to/stock/variants.vcf"
system(paste("samtools mpileup -f", paste(fasta_dir, "chr*.fa", sep = ""), merged_bam_file,
"| bcftools call --ploidy 1 -mv >", vcf_file))
[mpileup] 17 samples in 17 input files samtools mpileup: error reading from input file Failed to read from standard input: unknown file type
paste(fasta_dir, "chr*.fa", sep = "")
gives the pathway to the directory containing the 17 fasta files of the genome of reference (one for each chromosome), named chrI.fa, chrII.fa, ... merged_bam_file is pathway to the alignment file I obtained and with which I want to perform the variant calling I understand that the problem comes from the type of my files, but since the alignment file is BAM and my genome of reference is FASTA, I have absolutely no clue what to change to make it work. I already checked the format of my FASTA file and it looks normal to me. Any advices?
Thank you for your help :)
Probably something wrong with the paths. You do not show the content of all variables. Anyway, I personally think that I is not a good choice to wrap such analysis. Write a bash script, Makefile, or use a workflow manager to wrap command line tools. This fidding inside R with system calls in bad practice imo. It essentially adds an unnecessary layer of complexity to your analysis.
I was able to find the cause of the problem! The results of the samtools mpileup cannot be used by bcftools call, instead, I used bcftools mpileup and it worked well. I leave this here in case other people are interested.
I know it is not the best to call system command in R, but it is the only programming language I use.. If I further have more problems with this, I'll try to learn how to use bash!