Kmer counts per contig

4.9 years ago

GLR ▴ 20

Hello,

I need to extract kmers and their counts per contig in an assembly file and I was wondering what would be the most efficient way to do this?

For previous full-genome kmer counts I've used BBTools kmercountexact.sh, and I have considered ways to feed each scaffold into that program, but I have two issues with that potential solution. The first is the sheer number of output files that would result, although I guess I could just cat them all at the end. The second is that I am very unfamiliar with awk/bioawk, so while I know bioawk lets you extract sequences very easily, I don't know how to set up a for loop using awk/bioawk to do this and then pipe the contigs into another program.

Would anyone be kind enough to help me with this or direct me to a more appropriate solution?

Thank you!

Assembly • 1.1k views

You mean split the multifasta file into individual contigs? See here: Splitting A Fasta File

4.9 years ago by Asaf 10k

Hi Asaf,

Not really. I want to pipe each contig into kmer counting software. I could split them into multiple files and feed them in individually, which would certainly achieve what I need, but I imagine that has a high I/O cost and isn't very efficient.

4.9 years ago by GLR ▴ 20
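
One way to avoid both the per-contig files and the awk loop might be to stream the assembly once and count kmers per contig in memory, writing a single combined table. The following is only a minimal sketch using the Python standard library, not a replacement for kmercountexact.sh: the function names are made up for illustration, it counts forward-strand kmers only (no reverse-complement merging, so the numbers can differ from what a dedicated counter reports), and it skips any kmer containing a base other than A, C, G or T.

```python
from collections import Counter


def read_fasta(handle):
    """Yield (contig_name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_kmers(seq, k):
    """Count forward-strand kmers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def kmers_per_contig(handle, k, out):
    """Write one tab-separated line per (contig, kmer, count) to `out`."""
    for name, seq in read_fasta(handle):
        for kmer, n in sorted(count_kmers(seq, k).items()):
            out.write(f"{name}\t{kmer}\t{n}\n")
```

Assuming an assembly in a file called assembly.fasta (a placeholder name), something like `kmers_per_contig(open("assembly.fasta"), 21, sys.stdout)` would print one combined table, so there is nothing to cat together afterwards. Pure Python will be much slower than a dedicated counter on large assemblies, so this is mainly useful for small inputs or as a reference for the output layout.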