How to build a reference genome from an Excel file of sequences


3.5 years ago

bibrgr • 0

Hello,

I have sequencing data from a library prep. After removing the adapter sequences, I am left with about 2e6 reads, each a ~20 nt barcode. I also have a file listing all of the possible barcodes (~5000, also ~20 nt each) and would like to know how many times each barcode appears among the reads.

I know this is probably a more powerful tool than I need, but I was thinking of using Bowtie 2 to bin the reads, using a reference genome where each barcode is its own chromosome. However, I'm not sure how to convert the table of all the barcodes into the correct format to feed into Bowtie 2.
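For what it's worth, the conversion asked about here comes down to writing the table out as FASTA, which is the reference format that bowtie2-build reads, with one record per barcode. A minimal sketch in Python, assuming the Excel sheet has first been exported to a two-column CSV (barcode name, then sequence) with no header row; the function name `barcodes_to_fasta` and the example barcode names are hypothetical:

```python
import csv
import io


def barcodes_to_fasta(csv_text):
    """Convert 'name,sequence' rows into FASTA text, one record per barcode.

    Assumes two columns and no header row (an assumption about the export,
    not something stated in the thread).
    """
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[1].strip():
            continue  # skip blank or incomplete rows
        name = row[0].strip()
        seq = row[1].strip().upper()
        # Each barcode becomes its own FASTA record, i.e. its own "chromosome".
        records.append(f">{name}\n{seq}\n")
    return "".join(records)


# Made-up example rows standing in for the exported spreadsheet.
example = "bc0001,ACGTACGTACGTACGTACGT\nbc0002,TTGCATGCATGCATGCATGA\n"
fasta = barcodes_to_fasta(example)
print(fasta, end="")
```

Saved to a file such as barcodes.fa (a hypothetical name), the output could then be indexed with `bowtie2-build barcodes.fa barcodes`.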

Any help would be much appreciated. Thanks!

reference genome library barcodes bowtie2 • 780 views

updated 3.5 years ago by GenoMax 151k • written 3.5 years ago by bibrgr • 0


If your reads and barcodes are both 20 bp and you are looking for exact matches, then simply using UNIX grep in whole-word match mode (or, ideally, seqkit grep (LINK)) may be the easiest option.
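The same exact-match counting can also be sketched without grep. A minimal Python example, assuming the trimmed reads are already loaded as a list of strings and that only exact, full-length matches should count; `count_barcodes` is a hypothetical helper, not part of seqkit or any other tool, and the sequences are made up:

```python
from collections import Counter


def count_barcodes(reads, barcodes):
    """Count exact, full-length matches of each known barcode among the reads."""
    # Tally every distinct read sequence in a single pass over the reads.
    read_counts = Counter(reads)
    # Look up each known barcode; barcodes never observed get a count of 0.
    return {bc: read_counts.get(bc, 0) for bc in barcodes}


# Made-up data: two known barcodes and four trimmed reads.
barcodes = ["ACGTACGTACGTACGTACGT", "TTGCATGCATGCATGCATGA"]
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
    "GGGGGGGGGGGGGGGGGGGG",  # matches no known barcode, so it is ignored
    "TTGCATGCATGCATGCATGA",
]
counts = count_barcodes(reads, barcodes)
for bc, n in counts.items():
    print(bc, n)
```

Tallying the reads once and then looking up each barcode keeps the work proportional to the number of reads plus the number of barcodes, rather than scanning all reads once per barcode. Reads with a mismatch or a length difference are not counted under this approach.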

3.5 years ago by GenoMax 151k
