Question

KRAKEN2 Sequence

2.7 years ago

chimerajit • 0

I ran Kraken2 with the viral database on a transcript assembly produced by rnaSPAdes. The result looks fine: I found some of the viruses I was targeting.
Now I want the sequences that Kraken2 assigned to those taxonomic groups.

My Kraken2 output looks like this:

C       NODE_24_length_27434_cov_34671.005410_g0_i23    1147722 27434   0:14927 196894:5 1147722:1 0:12164 28883:2 2545435:2 0:10 754059:4 0:285

I understand the columns as follows:

1. Classified (C) or unclassified (U)
2. The FASTA header
3. The taxonomic ID
4. The length of that sequence
5. The k-mer matches
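To make the fifth column concrete, here is a minimal Python sketch that parses one output line and turns the `taxid:count` runs into approximate coordinates on the sequence. It assumes the standard tab-separated Kraken2 output for unpaired input (contigs, as here) and a k-mer length of 35, which is the default for nucleotide databases; a database built with a different k needs a different value. The function names `parse_kraken2_line` and `hit_regions` are made up for this illustration.

```python
def parse_kraken2_line(line):
    """Split one tab-separated Kraken2 output line into its five fields."""
    status, seq_id, taxid, length, lca = line.rstrip("\n").split("\t")
    runs = []
    for token in lca.split():
        # Each token is "<taxid>:<number of consecutive k-mers>".
        key, count = token.rsplit(":", 1)
        runs.append((key, int(count)))
    return status, seq_id, taxid, length, runs


def hit_regions(runs, k=35):
    """Yield (taxid, start, end) for every run that hit the database.

    Coordinates are 0-based and half-open (BED style). k=35 is an
    assumption; use the k-mer length your database was built with.
    """
    kmer_index = 0
    for key, count in runs:
        # "0" means no hit; "A" marks k-mers with ambiguous nucleotides.
        if key not in ("0", "A"):
            # The last k-mer of the run starts at kmer_index + count - 1
            # and spans k bases.
            yield key, kmer_index, kmer_index + count - 1 + k
        kmer_index += count
```

For the line shown above, the counts sum to 27,400, which equals 27,434 - 35 + 1 and so is consistent with k = 35. Only 14 of those k-mers carry a non-zero taxid, and the first hit region comes out as `("196894", 14927, 14966)`. That small fraction may be part of why a BLAST of the whole contig looks unconvincing.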

I used the taxon ID to find the matching FASTA headers and then pulled those sequences out of the assembly. However, when I run a nucleotide BLAST on these sequences, it does not show any similar hits.
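For reference, the header lookup and extraction I describe can be sketched in plain Python roughly as follows. This is only an illustration: `ids_for_taxid` and `extract_fasta` are hypothetical helper names, the Kraken2 output is assumed to be tab-separated and produced without `--use-names` (so the third column is a bare taxon ID), and only sequences assigned to exactly that taxon ID are selected, not to its descendants.

```python
def ids_for_taxid(kraken_lines, wanted_taxid):
    """Collect sequence IDs of classified lines assigned to wanted_taxid."""
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C" and fields[2] == str(wanted_taxid):
            ids.add(fields[1])
    return ids


def extract_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in wanted_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line
```

Both functions take any iterable of lines, so open file handles for the Kraken2 output and the assembly FASTA can be passed in directly.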

I also understand that Kraken2 does not use the full sequence to identify the reported organism, only k-mers. So how do I get the sequences it actually identified?

BLAST KRAKEN2 Metagenome • 671 views

updated 2.7 years ago by GenoMax • written 2.7 years ago by chimerajit

score 0 · Answer 1 · 2022-04-01

Well, I found one way to do it:

1. Pull the selected taxon IDs and their FASTA headers out of the Kraken2 output file.
2. Use https://github.com/santiagosnchez/faSomeRecords/blob/master/faSomeRecords.pl to extract those sequences from your Kraken2 input file (in my case, a FASTA assembly).
3. Use that specific taxon ID to locate the genome in NCBI or a similar database.
4. BLAST the taxon-specific sequences against that RefSeq reference with `-outfmt 6`, so you get the start and end coordinates of each hit.
5. Make a BED file from the BLAST output.
6. Run bedtools getfasta with the BED file and the sequence file extracted in step 2. This gives you the exact stretch of sequence that matched.
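The BLAST-to-BED step can be sketched like this. It is a rough illustration rather than a tested pipeline: it assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the Kraken2-selected contigs were the BLAST query; `blast6_to_bed` and its `contigs_are_query` switch are names invented here.

```python
def blast6_to_bed(blast_lines, contigs_are_query=True):
    """Convert default BLAST tabular (outfmt 6) hits to 3-column BED lines.

    BLAST coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and outfmt 7 style comments
        f = line.rstrip("\n").split("\t")
        if contigs_are_query:
            name, a, b = f[0], int(f[6]), int(f[7])   # qseqid, qstart, qend
        else:
            name, a, b = f[1], int(f[8]), int(f[9])   # sseqid, sstart, send
        # Subject coordinates are reversed for minus-strand hits.
        start, end = min(a, b), max(a, b)
        yield f"{name}\t{start - 1}\t{end}"
```

Writing the yielded lines to a file such as `hits.bed` (a placeholder name) gives something that `bedtools getfasta -fi extracted_contigs.fa -bed hits.bed -fo hit_regions.fa` should accept, where both FASTA file names are placeholders as well. The first BED column has to match the FASTA record names exactly.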

Let me know if there is an easier way.