I am getting a curious error when using bedtools to extract sequences from a reference fasta and a gff3 file.
I enter:
/gpfs/gwngs/tools/bedtools2/bin/bedtools getfasta -s -fullHeader -fi ref.fa -bed ref.gff3
And I get a million lines saying, for example:
WARNING. chromosome (CP014594) was not found in the FASTA file. Skipping.
The thing is I grepped the fasta header for this chromosome and got:
>CP014594
There is no whitespace in the line. What's going on? I have tried inserting the prefix 'chr' into both files, and inserting a space between > and the chromosome name in the fasta file, but neither helped.
What is the output of
?
00000000 54 47 47 54 47 47 41 54 47 43 54 47 0a 3e 43 50 |TGGTGGATGCTG.>CP|
00000010 30 31 34 35 39 34 0a 41 43 41 47 43 43 47 41 43 |014594.ACAGCCGAC|
00000020 41 41 43 43 43 41 41 43 41 54 47 43 43 41 41 41 |AACCCAACATGCCAAA|
00000030 43 54 43 43 41 47 41 43 54 43 47 41 41 43 43 54 |CTCCAGACTCGAACCT|
00000040 47 47 47 41 43 54 43 43 41 41 47 41 41 54 43 41 |GGGACTCCAAGAATCA|
00000050 41 41 43 0a |AAC.|
00000054
and
00000000 43 50 30 31 34 35 39 34 09 2e 09 67 65 6e 65 09 |CP014594...gene.|
00000010 31 30 34 36 31 09 31 34 36 35 38 09 2e 09 2b 09 |10461.14658...+.|
00000020 2e 09 4e 61 6d 65 3d 41 54 59 34 30 5f 72 52 4e |..Name=ATY40_rRN|
00000030 41 63 53 43 37 73 31 30 34 36 31 65 31 34 36 35 |AcSC7s10461e1465|
00000040 38 3b 6c 6f 63 75 73 5f 74 61 67 3d 22 41 54 59 |8;locus_tag="ATY|
00000050 34 30 5f 72 52 4e 41 63 53 43 37 73 31 30 34 36 |40_rRNAcSC7s1046|
00000060 31 65 31 34 36 35 38 22 0a |1e14658".|
00000069
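
In the two dumps above, the name bytes are identical in both files (43 50 30 31 34 35 39 34, i.e. CP014594) and each line ends in a plain 0a, so no hidden characters show up in these particular lines. To run the same check over every record rather than one example, here is a minimal Python sketch. The function names (fasta_names, gff_names, missing_from_fasta) are made up for illustration and are not part of bedtools, and the sketch assumes the FASTA name is the whole header line after ">" (as with -fullHeader) and the GFF3 name is the first tab-separated column. It compares raw bytes, so a stray carriage return or trailing space would appear in the result.

```python
import os
import tempfile


def fasta_names(path):
    """Collect every FASTA header (text after '>') as raw bytes.

    Only the trailing newline is removed, so a hidden carriage return
    or trailing space stays visible in the returned names.
    """
    names = set()
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                names.add(line[1:].rstrip(b"\n"))
    return names


def gff_names(path):
    """Collect the first tab-separated column of every GFF3 feature line."""
    names = set()
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"##FASTA"):
                break  # an embedded FASTA section follows; no more features
            if line.startswith(b"#") or not line.strip():
                continue  # skip comments, directives and blank lines
            names.add(line.split(b"\t", 1)[0])
    return names


def missing_from_fasta(fasta_path, gff_path):
    """Return GFF3 sequence names that have no byte-identical FASTA header."""
    return sorted(gff_names(gff_path) - fasta_names(fasta_path))


if __name__ == "__main__":
    # Rebuild the two records shown in the dumps above in a scratch directory.
    tmp = tempfile.mkdtemp()
    fa = os.path.join(tmp, "ref.fa")
    gff = os.path.join(tmp, "ref.gff3")
    with open(fa, "wb") as fh:
        fh.write(b">CP014594\nACAGCCGACAACCCAACATGCCAAACTCCAGACTCGAACCTGGGACTCCAAGAATCAAAC\n")
    with open(gff, "wb") as fh:
        fh.write(b"CP014594\t.\tgene\t10461\t14658\t.\t+\t.\t"
                 b"Name=ATY40_rRNAcSC7s10461e14658;"
                 b"locus_tag=\"ATY40_rRNAcSC7s10461e14658\"\n")
    # Names are printed as bytes literals so odd characters are easy to spot.
    print(missing_from_fasta(fa, gff))
```

An empty list means every name in the GFF3 has a byte-identical header in the FASTA, in which case the names themselves are probably not the cause of the warning. Any name that is listed can be compared by eye with the FASTA headers, since the bytes representation shows characters such as \r explicitly.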