How to build a reference genome from an Excel file of sequences
Posted 3.0 years ago by bibrgr

Hello,

I have sequencing data from a library prep from which I've removed the adapter sequences, leaving about 2 × 10^6 reads, each a ~20 nt barcode. I also have a file listing all of the possible barcodes (~5000, also ~20 nt each), and I would like to know how many times each barcode appears among the reads.

I know it is probably a more powerful tool than I need, but I was thinking of using Bowtie 2 to bin the reads against a reference genome in which each barcode is its own chromosome. However, I'm not sure how to convert the table of barcodes into a format that Bowtie 2 can use.
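One way to get the barcode table into Bowtie 2 is to export the Excel sheet as CSV and write each barcode as its own FASTA record, since bowtie2-build builds its index from FASTA input. The sketch below assumes a header-less CSV with the barcode name in the first column and the sequence in the second; that layout and the function name barcodes_to_fasta are hypothetical, so adjust them to the actual sheet.

```python
import csv
import io


def barcodes_to_fasta(csv_text: str) -> str:
    """Convert CSV rows of (name, sequence) into FASTA records.

    Assumes (hypothetically) a header-less CSV: column 1 = barcode name,
    column 2 = barcode sequence. Extra columns are ignored.
    """
    records = []
    reader = csv.reader(io.StringIO(csv_text))
    for row in reader:
        # Skip blank lines and rows without a sequence.
        if len(row) < 2 or not row[1].strip():
            continue
        name = row[0].strip()
        seq = row[1].strip().upper()
        # One FASTA record per barcode, so each becomes its own "chromosome".
        records.append(f">{name}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    example = "bc0001,ACGTACGTACGTACGTACGT\nbc0002,ttgcattgcattgcattgca\n"
    print(barcodes_to_fasta(example), end="")
```

The returned string can be written to a file such as barcodes.fa (hypothetical name) and indexed with bowtie2-build barcodes.fa barcodes; each FASTA record is then a separate reference sequence. If the exported sheet has a header row, drop it before conversion.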

Any help would be much appreciated. Thanks!

Tags: reference genome library barcodes bowtie2

If your reads and barcodes are both 20 bp and you are looking for exact matches, then simply using UNIX grep in full-word match mode (or, ideally, seqkit grep (LINK)) may be the easiest option.
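Along the lines of the exact-match suggestion, the counting can also be done in a single pass with a hash set instead of grep: load the ~5000 known barcodes into a set, read the trimmed reads once, and tally every read that matches a barcode exactly. This is a swapped-in technique, not grep or seqkit. The sketch assumes the trimmed reads are in FASTQ format and that each read is exactly its barcode (no mismatches, no extra bases); the function names fastq_sequences and count_barcodes are hypothetical.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (the 2nd line of every 4-line FASTQ record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_barcodes(reads, barcodes):
    """Count exact, full-length matches of reads against known barcodes.

    Returns a Counter keyed by barcode (zero counts included) and the
    number of reads that matched no barcode.
    """
    known = {b.strip().upper() for b in barcodes if b.strip()}
    counts = Counter({b: 0 for b in known})  # keep barcodes that are never seen
    unmatched = 0
    for read in reads:
        seq = read.strip().upper()
        if seq in known:  # O(1) set lookup per read
            counts[seq] += 1
        else:
            unmatched += 1
    return counts, unmatched


if __name__ == "__main__":
    # Tiny demo with 4 nt "barcodes"; real ones would be ~20 nt.
    demo_fastq = [
        "@r1\n", "ACGT\n", "+\n", "IIII\n",
        "@r2\n", "acgt\n", "+\n", "IIII\n",
        "@r3\n", "GGGG\n", "+\n", "IIII\n",
    ]
    counts, unmatched = count_barcodes(fastq_sequences(demo_fastq), ["ACGT", "TTTT"])
    for barcode, n in sorted(counts.items()):
        print(barcode, n)
    print("unmatched", unmatched)
```

For a real run, pass an open file handle, e.g. count_barcodes(fastq_sequences(open("reads.fastq")), barcode_list), or use gzip.open(path, "rt") for a gzipped file (file and variable names hypothetical). A single pass with set lookups reads each of the ~2 × 10^6 reads once, whereas grepping for each barcode separately would scan the read file ~5000 times. If mismatches need to be tolerated, an aligner such as Bowtie 2 is the better fit.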

