extracting Genus abundance info from Kraken 2 output file
3.3 years ago
arshad1292 ▴ 110

Hello

I have a Kraken 2 output (a k2report.txt file) that looks like this:

enter image description here

What I want to do is extract only the number of reads at the genus level, i.e. the rows marked G, such as taxids 2316020, 2719313, 207244, 572511 and so on. I am not interested in D, O, R, etc.

I have a large file with many hundreds of genera. Does anyone have a shell/python script that I could use to extract only the genus abundance (number of reads) for sample1, sample2 and sample3?

I would really appreciate your help.

Many thanks,

kraken2 • 1.4k views

Do you mean you want to subset your matrix where lvl_type == "G"?

If yes then you can use grep "\tG\t" input-file.


Yes, that's correct: I want a subset of the matrix that contains only the "G" rows.

I tried your script but it produced nothing...


Actually, I assumed tab (\t) as the field separator.

If tab is the field separator and it is still not working, you should probably add -P to the grep command. Without -P, GNU grep does not treat \t as a tab character. Something like this:

grep -P "\tG\t" input-file-name
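
If you want to be sure the match is on the rank column only, and not on a "G" that happens to sit in some other field, an awk filter on one column may be safer. This is only a minimal sketch: genus_rows is a made-up helper name, and the default column 4 assumes a standard single-sample Kraken 2 report, where the rank code is the fourth tab-separated field. A combined multi-sample table will have the rank code further to the right, so check your header and pass the right column number.

```shell
# Print the rows of a tab-separated Kraken 2 report whose rank-code column
# is exactly "G" (sub-ranks such as "G1" are not matched).
# Usage: genus_rows REPORT [RANK_COLUMN]
# RANK_COLUMN defaults to 4, which is an assumption that only holds for a
# standard single-sample report; adjust it for a combined table.
genus_rows() {
    report=$1
    rank_col=${2:-4}
    awk -F'\t' -v col="$rank_col" '$col == "G"' "$report"
}
```

For example, genus_rows k2report.txt > genus_only.txt writes the genus rows to a new file. Like the grep above, it keeps whole rows, so all the read-count columns are preserved.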

Ok this one works. Thanks a lot!
