Indexing reference genome for GATK
9.8 years ago
lcc1844 ▴ 40

I created .bam files with BWA and Samtools after downloading the hg19 reference genome from UCSC and indexing it with:

bwa index -a bwtsw hg19.fa

This worked fine until now: I am trying to call variants in GATK and see that I need a reference in karyotypic order. I have downloaded the hg19.fa.fai and .dict files from the GATK FTP site, along with their full reference genome, which is in chromosome order. I read that this was easier than trying to reorder the indexed reference I already have.

However, when I try to index the new reference in the same way, I get this message:

[bwa_index] fail to open file '‐a bwtsw' : No such file or directory

Any ideas what I should be doing?!

Thanks

alignment next-gen-sequencing • 6.6k views

What version of bwa and what is the exact command that causes the error message?


Hi, it is bwa-0.7.12, and the command was:

bwa index -a bwtsw hg19.fa

Are you sure you didn't type:

bwa index "-a bwtsw" hg19.fa

or something like that? This isn't an error that I can reproduce without using an incorrect command.


100% sure. I used the same command I have used in the past, so I don't know why it won't work. In the BWA directory I see bwtsw files with various suffixes. At the moment I am running index alone (without -a bwtsw), but I am not familiar with how this works for the human genome.


Oh OK, brilliant, thank you. The index command alone is processing nicely, so I won't question it any more!

9.8 years ago
matted 7.8k

It looks like you may have copy-pasted the command from somewhere with a strange character encoding, or something like that. Your dash character isn't the same as the "regular" one:

>>> ord(u"‐")  # I copied yours here
8208
>>> ord(u"-")   # this is the one on my keyboard
45

If I copy your bwa command, I get:

bwa index ‐a bwtsw test.fa
[bwa_index] fail to open file '‐a' : No such file or directory

As Devon suggested, I need to quote the option and bwtsw together as a single argument to get the exact error you did. Maybe your space character is also unusual and effectively did that? That's a guess, though.

bwa index "‐a bwtsw" test.fa
[bwa_index] fail to open file '‐a bwtsw' : No such file or directory

In any case, you no longer need to specify -a with bwa, as it will auto-detect the best algorithm (the default is auto). Therefore, what you're running now should work fine regardless.

Edit: Overall, it looks like a Unicode problem. Your hyphen is U+2010 (decimal 8208), which Unicode names HYPHEN. It is not the same character as the ASCII one, U+002D (decimal 45, HYPHEN-MINUS), which is what command-line tools expect in front of an option. I imagine the '‐a bwtsw' grouping happened because a Unicode space (a no-break space, for example) was used instead of the plain ASCII one, so the shell did not split the two words. I don't know much about character encoding (and am glad!) or how desktop systems handle it when copying text between applications, but it looks like that caused these issues.
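
If you want to check a pasted command before running it, something along these lines should flag the odd characters. This is only a minimal sketch: find_non_ascii is a name I made up, and the no-break space in the sample string is my guess at what was in the original command, not something I could confirm.

```python
import unicodedata

def find_non_ascii(command):
    """Return (index, char, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for i, ch in enumerate(command):
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, ch, "U+%04X" % ord(ch), name))
    return hits

# Sample input: U+2010 in place of the ASCII hyphen, plus an ASSUMED
# U+00A0 no-break space between "a" and "bwtsw".
pasted = "bwa index \u2010a\u00a0bwtsw hg19.fa"

for hit in find_non_ascii(pasted):
    print(hit)
```

A command typed directly on the keyboard should give an empty list; anything reported here is worth retyping by hand.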


Wow, good catch on the dash character.


This is the reason I always use TextWrangler: I check the encoding and try again whenever I'm copy-pasting commands.
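
For anyone without an editor that shows the encoding, the same check can be roughly scripted. The sketch below swaps a handful of look-alike characters for their ASCII equivalents and refuses anything else that is not ASCII. The mapping and the name to_ascii_command are my own illustration, not part of bwa or any library, and the list of look-alikes is certainly incomplete.

```python
# Hypothetical mapping of look-alike characters to ASCII; extend as needed.
REPLACEMENTS = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2212": "-",  # MINUS SIGN
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
}

def to_ascii_command(command):
    """Replace known look-alikes, then fail loudly if non-ASCII text remains."""
    cleaned = command.translate(str.maketrans(REPLACEMENTS))
    # Raises UnicodeEncodeError if any unmapped non-ASCII character is left.
    cleaned.encode("ascii")
    return cleaned

print(to_ascii_command("bwa index \u2010a\u00a0bwtsw hg19.fa"))
```

Failing on unknown characters rather than silently dropping them seems safer here, since a deleted character could change what the command does.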

7.6 years ago
Ray Lee • 0

Change the order

bwa index hg.fa -a bwtsw

This works for me.
