NtSeq is a JavaScript library for node.js and the browser that comes stocked with a large number of pre-built, highly optimized methods for nucleotide sequence manipulation and analysis.
The library's primary goal is to help both new and veteran bioinformaticians build complex software and web applications without having to reimplement basic nucleotide sequence manipulation methods.
As JavaScript is highly accessible (requiring no setup and only a text editor and web browser to execute), the library also provides an easy way to introduce young scientists to software development and bioinformatics.
With this library you can:
- Quickly scan genomic data for a target sequence and ungapped relatives using `.mapSequence()`
- Grab the 5' -> 3' reverse complement of a sequence with `.complement()`
- Manipulate sequences easily using `.replicate()`, `.deletion()`, `.insertion()`, `.repeat()` and `.polymerize()`
- Translate your nucleotide sequences to their amino acid counterparts in a single line of code using `.translate()` or `.translateFrame()`
- Quickly determine AT% content with `.content()` or `.fractionalContent()`
- Grab approximate AT% content for degenerate sequences using `.contentATGC()` or `.fractionalContentATGC()`
- Load FASTA files into memory from your machine (node) with `.loadFASTA()`, or from a string if you use an external AJAX request (web) using `.readFASTA()`
- Save large sequences for easy access in the future with `.save4bnt()` and `.load4bnt()` (node only), using a new file type, `.4bnt`, that cuts your FASTA file sizes in half
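The halving of file size claimed for `.4bnt` is consistent with storing each nucleotide in 4 bits: FASTA spends one 8-bit ASCII byte per base (ignoring headers and line breaks), while 4-bit codes fit two bases per byte, and 4 bits are enough for all 15 IUPAC nucleotide symbols. The sketch below is a standalone illustration of that idea, not NtSeq's code. The bit layout (A=1, C=2, G=4, T=8, with degenerate symbols as unions of their bases) and the helper names `reverseComplement`, `pack` and `unpack` are assumptions made for this example.

```javascript
// Hypothetical 4-bit layout (an assumption, not necessarily NtSeq's):
// one bit per unambiguous base.
const BITS = { A: 1, C: 2, G: 4, T: 8 };

// IUPAC degenerate symbols expressed as the bases they stand for.
const IUPAC = {
  A: 'A', C: 'C', G: 'G', T: 'T',
  R: 'AG', Y: 'CT', S: 'CG', W: 'AT', K: 'GT', M: 'AC',
  B: 'CGT', D: 'AGT', H: 'ACT', V: 'ACG', N: 'ACGT',
};

// Symbol -> 4-bit code (bitwise OR of its bases), plus the reverse lookup.
const CODE = {};
const SYMBOL = {};
for (const [sym, bases] of Object.entries(IUPAC)) {
  const code = [...bases].reduce((acc, b) => acc | BITS[b], 0);
  CODE[sym] = code;
  SYMBOL[code] = sym;
}

// Look up a symbol case-insensitively, rejecting anything outside IUPAC.
function codeOf(ch) {
  const code = CODE[ch.toUpperCase()];
  if (code === undefined) throw new Error(`Unsupported nucleotide: ${ch}`);
  return code;
}

// Complementing swaps A<->T and C<->G, which in this layout
// is just reversing the order of the four bits.
function complementCode(code) {
  return ((code & 1) << 3) | ((code & 2) << 1) | ((code & 4) >> 1) | ((code & 8) >> 3);
}

// 5' -> 3' reverse complement; degenerate symbols map correctly (R <-> Y, N -> N).
function reverseComplement(seq) {
  let out = '';
  for (let i = seq.length - 1; i >= 0; i--) {
    out += SYMBOL[complementCode(codeOf(seq[i]))];
  }
  return out;
}

// Pack two 4-bit codes per byte (high nibble first).
function pack(seq) {
  const bytes = new Uint8Array(Math.ceil(seq.length / 2));
  for (let i = 0; i < seq.length; i++) {
    const code = codeOf(seq[i]);
    bytes[i >> 1] |= (i & 1) ? code : code << 4;
  }
  return bytes;
}

// Unpack; the original length is needed because an odd-length
// sequence leaves one unused nibble at the end.
function unpack(bytes, length) {
  let out = '';
  for (let i = 0; i < length; i++) {
    const byte = bytes[i >> 1];
    out += SYMBOL[(i & 1) ? byte & 0x0f : byte >> 4];
  }
  return out;
}

console.log(reverseComplement('ATGCR')); // YGCAT
console.log(pack('ACGTN').length);       // 3 bytes for 5 bases
```

A real file format would also need to store the sequence length and FASTA header alongside the packed bytes; this sketch leaves that out.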
It comes packaged with a degenerate-nucleotide, ungapped sequence alignment tool that scans a search sequence for a target sequence and returns a map of all possible alignments (from 100% to 0% identity), both unsorted (ordered by alignment offset) and sorted (ordered by identity). Though the search is exhaustive and runs on a CPU (and from JavaScript), the algorithm uses bit operations to perform close to 500,000,000 nucleotide comparisons per second (~2 ns per nucleotide comparison) in a single thread on a 2.4 GHz processor. This means that mapping a 100 bp sequence to the E. coli genome (both strands) takes ~2 seconds.
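To make the ~2 second figure concrete: taking the E. coli genome as roughly 4.6 Mbp (an approximate value assumed here), scanning both strands with a 100 bp query is about 2 × 4.6 × 10^6 offsets × 100 comparisons ≈ 9.2 × 10^8 comparisons, which at ~5 × 10^8 comparisons per second comes to roughly 1.8 s. The sketch below illustrates a degenerate, ungapped scan in plain JavaScript; it is not NtSeq's `.mapSequence()` implementation. It assumes a hypothetical one-bit-per-base code (A=1, C=2, G=4, T=8, degenerate symbols as unions), so two positions are compatible when the bitwise AND of their codes is non-zero. `mapUngapped` is a hypothetical name; it scans only the strand it is given and only offsets where the query fully overlaps the search sequence.

```javascript
// Hypothetical one-bit-per-base codes; IUPAC symbols are unions of their bases.
const NT_CODE = {
  A: 1, C: 2, G: 4, T: 8,
  R: 5, Y: 10, S: 6, W: 9, K: 12, M: 3,
  B: 14, D: 13, H: 11, V: 7, N: 15,
};

// Convert a nucleotide string into an array of 4-bit codes.
function encode(seq) {
  const out = new Uint8Array(seq.length);
  for (let i = 0; i < seq.length; i++) {
    const code = NT_CODE[seq[i].toUpperCase()];
    if (code === undefined) throw new Error(`Unsupported nucleotide: ${seq[i]}`);
    out[i] = code;
  }
  return out;
}

// Exhaustive ungapped scan: slide the query along the search sequence and,
// at every offset, count positions whose codes share at least one base.
function mapUngapped(searchSeq, querySeq) {
  const s = encode(searchSeq);
  const q = encode(querySeq);
  const unsorted = [];
  for (let offset = 0; offset + q.length <= s.length; offset++) {
    let matches = 0;
    for (let j = 0; j < q.length; j++) {
      if (s[offset + j] & q[j]) matches++; // non-zero AND = compatible bases
    }
    unsorted.push({ offset, matches, identity: matches / q.length });
  }
  // Results are already in offset order; the sorted copy ranks by identity,
  // breaking ties by offset.
  const sorted = [...unsorted].sort((a, b) => b.matches - a.matches || a.offset - b.offset);
  return { unsorted, sorted };
}

const result = mapUngapped('GGATCGATC', 'ATC');
console.log(result.sorted.slice(0, 2)); // best hits at offsets 2 and 6
```

This version compares one nucleotide per step. A faster implementation could pack eight 4-bit codes into a 32-bit integer so that a single AND tests eight positions at once, and would also scan the reverse complement to cover both strands.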
You can find the library at: http://keithwhor.github.io/NtSeq/
Cheers,
- Keith Horwood
What would be very helpful to novices is to also embed some functionality into a webpage so that one can perform simple sequence manipulations right away, as the SMS (Sequence Manipulation Suite) does: http://www.bioinformatics.org/sms2/index.html
The SMS's user interface is quite archaic, but now that I've looked into it, it turns out it was published in 2000! http://www.ncbi.nlm.nih.gov/pubmed/10868275
Great idea! I'll add that to the repository with the next release.
This is really nice. Thanks for posting about this!
Also Keith, if you build a nice interface to your library, we can integrate it into Biostar on a separate page. I often want to demonstrate sequence-based operations to students where the command line is too unwieldy, the SMS is buggy, etc. I'd love to have a place that is easy to get to and remember. In addition, people might extend and contribute more if the system is readily usable by many.
Sure, I wouldn't be opposed. There's a sample HTML file at web/index.html in the repository that shows four examples of using NtSeq with a very basic interface. (Make sure you download web/ntseq.js and put it in the same directory as your index.html.) I'm happy to spend some time making a more detailed interface; I'd just prefer to keep it fairly simple so beginners can jump in and figure things out without having to learn all of HTML5 + CSS + JS before playing around with NtSeq. :)
Hi @keithwhor. I'm a bioinformatics neophyte just learning about pipeline analysis, but I'm engaged in building web tools for bioinformatics. Can you give me a quick overview of how I might use this library in place of Picard or Samtools, etc.? For instance, can I do alignments similar to BWA? Can it make SAM or BAM files? Can it mark duplicates? Sorry to be so naive. Thanks!