At Josh.ai, we’re often asked for developer resources relating to natural language processing, machine learning, and artificial intelligence. Paul Dixon, a researcher living in Kyoto, Japan, put together a curated list of excellent speech and natural language processing tools. Below is the list, current as of Oct 1, 2015. Check out the GitHub repo for more here.
Finite State Toolkits and Regular Expressions
- AT&T FSM Library The AT&T FSM Library™ is a set of general-purpose software tools available for Unix, for building, combining, optimizing, and searching weighted finite-state acceptors and transducers.
- Carmel Finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests.
- Categorial semiring Categorial semiring as described in Sproat et al. 2014
- dk.brics.automaton Java toolkit for FSAs and regular expressions.
- Fare Fare is a finite state and regular expression library for the .NET framework written in C#.
- Foma Finite-state compiler and C library
- fsa Toolkit used in RWTH ASR engine
- fsm2.0 Thomas Hanneforth's fsm 2.0 library, written in C++, has a few nice operations such as three-way composition.
- fstrain A toolkit for training finite-state models
- jopenfst Java port of the C++ OpenFst library; originally forked from the CMU Sphinx project
- Kleene programming language High-level finite-state programming language built on top of OpenFst.
- MIT FST Toolkit WFST toolkit that is no longer maintained but features a few commands not found in other toolkits.
- MoMs-for-StochasticLanguages Spectral and other training algorithms for WFSAs.
- n Shortest Path for PDT n Shortest Path for PDT
- Noam "Noam is a JavaScript library for working with automata and formal grammars for regular and context-free languages". Also has pretty cool examples using viz.js
- OpenFst OpenFst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs).
- openfst-utils Nice set of utilities for OpenFst; includes an implementation of categorial semirings.
- openlat Toolkit for manipulating word lattices, built on top of OpenFst. Includes support for reading and writing HTK-compatible lattices.
- PyFst Python interface to OpenFst
- SFST - Stuttgart Finite State Transducer Tools "SFST is a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology."
- Treba "Treba is a basic command-line tool for training, decoding, and calculating with weighted (probabilistic) finite state automata (PFSA) and Hidden Markov Models (HMMs)."
Many of the tools in the machine translation section also implement interesting graph and semiring operations.
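For readers new to the vocabulary used throughout this list, here is a minimal sketch of what "weighted finite-state acceptor" and "semiring" mean in practice. It is plain Python and does not use any of the toolkits above. The toy automaton, its weights, and the function names are all made up for illustration, and the relaxation loop assumes there are no negative-weight cycles.

```python
import math

# Tropical semiring: "plus" picks the better (smaller) weight,
# "times" accumulates weight along a path.
ZERO = math.inf  # identity for plus (min)
ONE = 0.0        # identity for times (+)


def plus(a, b):
    return min(a, b)


def times(a, b):
    return a + b


# Hypothetical toy acceptor: state -> list of (label, weight, next_state).
ARCS = {
    0: [("a", 1.0, 1), ("b", 4.0, 2)],
    1: [("c", 1.5, 2)],
    2: [],
}
START = 0
FINAL = {2: 0.5}  # final state -> final weight


def shortest_distance(arcs, start, final):
    """Return the best total weight over all accepting paths.

    Uses Bellman-Ford style relaxation expressed with the semiring
    operations; assumes no negative-weight cycles.
    """
    dist = {state: ZERO for state in arcs}
    dist[start] = ONE
    for _ in range(len(arcs) - 1):
        for src, outgoing in arcs.items():
            for _label, weight, dst in outgoing:
                dist[dst] = plus(dist[dst], times(dist[src], weight))
    total = ZERO
    for state, final_weight in final.items():
        total = plus(total, times(dist[state], final_weight))
    return total


best = shortest_distance(ARCS, START, FINAL)
print(best)  # 3.0: path a, c (1.0 + 1.5) plus final weight 0.5
```

Swapping `plus` and `times` for other operations (for example, log-add and addition) changes what the same loop computes, which is the kind of generality the semiring-based toolkits listed here build on.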