In 2000, the Sphinx group in Carnegie Mellon University's School of Computer Science released a collection of open-source speech recognition development libraries and tools that, over time, came to be known as CMUSphinx. Late last month, the group celebrated 1.5 million downloads of the toolkit, which is used both for speech recognition research and for building speech products.
Begun as a DARPA-funded project, Sphinx is committed to releasing its software widely to stimulate the creation of speech-enabled tools and applications, and to advance the state of the art in speech recognition and in related research areas, including dialog systems and speech synthesis.
Originally, the heart of the open-source toolkit was Sphinx2, a real-time, large-vocabulary, speaker-independent speech recognition system. Its openly available acoustic models covered American English and French, in both full-bandwidth and telephone-bandwidth versions. The technology was a suitable candidate for handheld and embedded devices, and for interactive telephone and desktop systems that relied on short response times. Its successor, Sphinx3, was used primarily for high-accuracy, non-real-time speech recognition.
The most recent iteration of the technology, Sphinx4, is a complete rewrite of the Sphinx engine in Java, intended to provide a more flexible framework for speech recognition research. It is under active development; goals include creating a new acoustic model trainer, implementing speaker adaptation, improving configuration management and creating a graph-based user interface for graphical system design.
Other resources available for download include Pocketsphinx, a recognizer library for real-time systems written in C; Sphinxtrain, used to create acoustic models; and Sphinxbase, an integrated library that supports both Pocketsphinx and Sphinxtrain.
"CMUSphinx uniquely has provided free, unencumbered access to speech technology for the past 16 years. People in more than 210 countries have downloaded it in just the past year. Can you just imagine all the amazing things it's let them do?" said Alex Rudnicky, research professor in the Language Technologies Institute.