Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows:
bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai
For some of the files that I am assessing, I don't get any errors and the output is obtained without issues. But sometimes the error I receive is as follows:
Error: The genome file Homo_sapiens_assembly38.fasta.fai has no valid entries. Exiting.
I have been looking for the cause of the problem and have seen that this is a fairly common failure related to the structure of the genome file, which in my case is the following:
chrI 15072421 101 112
According to the bedtools documentation, however, the structure should be:
chrI 15072421
chrII 15279323
...
chrX 17718854
chrM 13794
My question is, how is it possible that I get output for some of the files but the error for others?
Thanks in advance!
You are using a version of bedtools prior to 2.29. More recent versions have changes in the way the -g file is read and more detailed error messages, so I'd suggest you try the current version to shed some light on this.
Hi,
Please try:
Kevin
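For anyone wanting to rule out the genome file itself, here is a minimal sketch (not the command posted by any of the answerers) that derives a two-column genome file from a .fai index. The helper name fai_to_genome and the output name genome.txt are hypothetical, and the idea that a non-tab-delimited index could produce the "no valid entries" message is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: build a two-column bedtools genome file from a .fai index.
# fai_to_genome is a hypothetical helper name, not part of bedtools.
fai_to_genome() {
    local fai="$1" out="$2"
    # Keep only chromosome name and length. Lines that are not
    # tab-delimited, or whose second field is not all digits, are skipped.
    awk -F'\t' 'NF >= 2 && $2 ~ /^[0-9]+$/ { print $1 "\t" $2 }' "$fai" > "$out"
    # Fail loudly if nothing survived (e.g. a space-delimited file).
    [ -s "$out" ] || { echo "no valid entries in $fai" >&2; return 1; }
}
```

It could be called as fai_to_genome Homo_sapiens_assembly38.fasta.fai genome.txt, after which genome.txt would be passed to -g in place of the .fai file. Running bedtools --version beforehand shows which release is installed.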
A bedtools genome file, as used with -g, is a tab-delimited table giving chromosome names and lengths, and the desired order of the chromosomes. Only the first two columns are used, so a .fai file is suitable. The FASTA file itself is not suitable.
Indeed, Sir, it is not expected a FASTA