Question

Add a genome to a kraken library

0

23 months ago

Francois Piumi ▴ 70

Hi, if I understand correctly, a Kraken library only contains human, bacterial, and viral sequences with their taxonomy.

I noticed that it was possible to add another genome, as follows:

kraken-build --add-to-library chr1.fa --db $DBNAME

So I downloaded a genome and ran the following command:

kraken2-build --add-to-library Culicoides_sonorensis.Cson1.dna_rm.toplevel.fa --db Kraken2_Standard_Fev2019

Here is the resulting error message:

scan_fasta_file.pl: unable to determine taxonomy ID for sequence scaffold40

Indeed, there is no taxonomy information in the FASTA file. Example header:

>scaffold40 dna:supercontig supercontig:Cson1:scaffold40:1:766034:1 REF

So how does Kraken retrieve taxonomy information from a FASTA file? Is there a specific FASTA format I should download?

kraken • 1.7k views

score 0 · Answer 1 · 2023-05-20

0

23 months ago

shenwei356 8.7k

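Check the manual. If a sequence ID does not contain an NCBI accession number, the header must assign the taxonomy ID explicitly with kraken:taxid, in this format: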

>sequence16|kraken:taxid|32630  Adapter sequence
CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
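For a whole genome file, the headers can be rewritten in one pass rather than by hand. The following is a minimal sketch, not an official Kraken tool: the function name `tag_fasta_headers` and the demo file names are made up for illustration, and the taxonomy ID 32630 is only the placeholder from the manual's example. Replace it with the NCBI taxonomy ID of your own organism.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kraken): append "|kraken:taxid|<TAXID>"
# to the sequence ID of every FASTA header, keeping any description.
# Usage: tag_fasta_headers TAXID INPUT.fa OUTPUT.fa
tag_fasta_headers() {
    awk -v taxid="$1" '
        /^>/ {
            id = $1                             # first word, e.g. ">scaffold40"
            rest = substr($0, length(id) + 1)   # description, if any
            print id "|kraken:taxid|" taxid rest
            next
        }
        { print }                               # sequence lines are unchanged
    ' "$2" > "$3"
}

# Demo on a tiny made-up file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fa" <<'EOF'
>scaffold40 dna:supercontig supercontig:Cson1:scaffold40:1:766034:1 REF
ACGTACGT
EOF

# 32630 is a placeholder taken from the manual example above.
tag_fasta_headers 32630 "$tmpdir/demo.fa" "$tmpdir/demo.tagged.fa"
head -n 1 "$tmpdir/demo.tagged.fa"
```

The sketch assumes there is no space between `>` and the sequence ID, which is the case for the Ensembl-style header quoted in the question.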

0

I reformatted all my sequence IDs according to the manual. kraken2-build accepted them ("Culicoides_sonorensis.fa" was added to the kraken library "Kraken2_Standard_Fev2019").

But there is no trace of "Culicoides_sonorensis" in the report after analyzing a FASTQ file of Culicoides RNA-Seq reads.

It is not entirely clear whether a description must follow the "sequence16|kraken:taxid|32630" ID shown in the manual.

It is also not clear whether the sequences must be added one file at a time, as in the manual (chr1.fa, chr2.fa).
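One thing worth checking here: `--add-to-library` only copies sequences into the library directory, and they become searchable only after the database is built again with `kraken2-build --build`. Whether that is the cause in this case is an assumption, since the thread does not say whether a rebuild was run. A minimal sketch of the two steps follows. The function name `add_genomes_and_build` and the `KRAKEN2_BUILD` variable are made up for illustration; only the `--add-to-library`, `--build`, and `--db` options come from Kraken 2 itself.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Kraken): add each FASTA file to the
# library, then rebuild the database so the new sequences become searchable.
# KRAKEN2_BUILD can be overridden, e.g. with "echo" to preview the commands.
KRAKEN2_BUILD=${KRAKEN2_BUILD:-kraken2-build}

# Usage: add_genomes_and_build DBNAME FILE.fa [FILE.fa ...]
add_genomes_and_build() {
    db=$1
    shift
    for fasta in "$@"; do
        # One call per file; a multi-sequence FASTA file counts as one file.
        "$KRAKEN2_BUILD" --add-to-library "$fasta" --db "$db" || return 1
    done
    # Rebuild the database from the taxonomy and the updated library.
    "$KRAKEN2_BUILD" --build --db "$db"
}
```

For example, `add_genomes_and_build Kraken2_Standard_Fev2019 Culicoides_sonorensis.fa` would add the single multi-scaffold file and then rebuild. The rebuild needs the taxonomy and library files to still be present in the database directory, and it can take a long time for a standard-sized database.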

ADD REPLY • link 23 months ago by Francois Piumi ▴ 70