Golang libraries for data science

The Golang programming language is often compared with Python for ease of use, yet its code compiles to a native binary that, for many workloads, runs almost as fast as C. That makes Golang worth considering for any task that crunches large volumes of data. This is a curated list of Golang libraries useful in data science and related fields.

The term ‘data science’ is used here in the broadest sense to cover areas as diverse as:

  • natural language processing.
  • machine learning.
  • advanced mathematics including statistics, probability, algebra and calculus.
  • data extraction, cleaning, normalisation, conversion, reformatting and storage.
  • data visualisation.

Natural Language Processing

  • go-freeling: A partial port of Freeling 3.1. Implements text tokenization, sentence splitting, morphological analysis, suffix treatment, retokenization of clitic pronouns, flexible multiword recognition, contraction splitting, probabilistic prediction of unknown word categories, named entity detection, PoS tagging, chart-based shallow parsing, named entity classification, and rule-based dependency parsing.
  • enca: Minimal Cgo bindings for libenca.
  • go-nlp: A few structures for doing NLP analysis / experiments.
  • go-eco: Similarity, dissimilarity and distance matrices; diversity, equitability and inequality measures; species richness estimators; coenocline models.
  • golibstemmer: Go bindings for libstemmer.
  • go-ngram: In-memory n-gram index with compression.
  • go-porterstemmer: A native Go clean room implementation of the Porter Stemming algorithm.
  • go-stem: Go implementation of the Porter stemming algorithm.
  • gounidecode: Unicode transliterator (also known as unidecode) for Go.
  • guesslanguage: Functions to determine the natural language of a Unicode text.
  • icu: Cgo binding for icu4c C library detection and conversion functions.
  • libtextcat: Cgo binding for libtextcat C library.
  • MMSEGO: Go implementation of MMSEG (a Chinese word splitting algorithm).
  • paicehusk: Implementation of the Paice/Husk Stemmer.
  • porter: Another Go port of the Porter stemmer.
  • porter2: Really fast Porter 2 stemmer.
  • segment: A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
  • snowball: Cgo wrapper for the snowball stemmer.
  • snowball: Native Go snowball stemmers for English, Spanish, French and Russian.
  • snowball: Snowball stemmer for Go (cgo).
  • stemmer: English and German stemmers in native Go.
  • textcat: A Go package for n-gram based text categorization, with support for UTF-8 and raw text.
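
Several of the entries above (go-ngram, textcat, guesslanguage) build on character n-grams. The following is a minimal sketch of that idea using only the standard library; it is not the API of any listed package, and the names `ngrams` and `topNgrams` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ngrams returns every contiguous run of n runes in s.
// It works on runes rather than bytes so that UTF-8 text is split correctly.
func ngrams(s string, n int) []string {
	r := []rune(s)
	if n <= 0 || len(r) < n {
		return nil
	}
	out := make([]string, 0, len(r)-n+1)
	for i := 0; i+n <= len(r); i++ {
		out = append(out, string(r[i:i+n]))
	}
	return out
}

// topNgrams counts the n-grams of the lower-cased text and returns the k most
// frequent ones. Ties are broken alphabetically so the result is deterministic.
func topNgrams(text string, n, k int) []string {
	counts := make(map[string]int)
	for _, g := range ngrams(strings.ToLower(text), n) {
		counts[g]++
	}
	keys := make([]string, 0, len(counts))
	for g := range counts {
		keys = append(keys, g)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if len(keys) > k {
		keys = keys[:k]
	}
	return keys
}

func main() {
	fmt.Println(ngrams("héllo", 2))                // [hé él ll lo]
	fmt.Println(topNgrams("banana bandana", 2, 3)) // [an na ba]
}
```

A ranked list like the one returned by `topNgrams` is the kind of profile that n-gram categorizers typically compare between a sample text and a set of reference texts; the comparison step is left out here.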

Machine Learning

  • bayesian: Bayesian classifier.
  • CloudForest: Ensembles of decision trees in go/golang.
  • gobrain: Neural networks written in Go.
  • godist: Various probability distributions, and associated methods.
  • go-fann: Go bindings for the Fast Artificial Neural Networks (FANN) library.
  • go-galib: Genetic Algorithms library written in Go.
  • golearn: Machine Learning for Go.
  • golinear: liblinear bindings for Go.
  • go-ml: Linear / Logistic regression, Neural Networks, Collaborative Filtering and Gaussian Multivariate Distribution.
  • go-pr: A Gaussian classifier pattern recognition package.
  • goRecommend: Recommendation Algorithms library written in Go.
  • libsvm: libSVM implementation in Go.
  • mlgo: Various “minimalistic” machine learning algorithms.
  • neural-go: Implements a simple multilayer perceptron network.
  • probab: Probability distribution functions – Bayesian inference.
  • regommend: Recommendation and collaborative filtering engine.
  • shield: Bayesian text classifier with flexible tokeniser and backend store support.
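
To give a feel for what the Bayesian classifiers in this list (bayesian, shield) do, here is a rough standard-library sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is an illustration only: the type `NaiveBayes` and its methods are hypothetical names and do not mirror the API of any package listed above.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// NaiveBayes is a multinomial naive Bayes text classifier with add-one
// (Laplace) smoothing. All names here are illustrative.
type NaiveBayes struct {
	wordCounts map[string]map[string]int // class -> word -> count
	totalWords map[string]int            // class -> total number of words
	docCounts  map[string]int            // class -> number of training documents
	vocab      map[string]bool           // every word seen during training
	totalDocs  int
}

// NewNaiveBayes returns an empty classifier ready for training.
func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{
		wordCounts: make(map[string]map[string]int),
		totalWords: make(map[string]int),
		docCounts:  make(map[string]int),
		vocab:      make(map[string]bool),
	}
}

// Train adds one whitespace-tokenized, lower-cased document to a class.
func (nb *NaiveBayes) Train(class, text string) {
	if nb.wordCounts[class] == nil {
		nb.wordCounts[class] = make(map[string]int)
	}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		nb.wordCounts[class][w]++
		nb.totalWords[class]++
		nb.vocab[w] = true
	}
	nb.docCounts[class]++
	nb.totalDocs++
}

// Classify returns the class with the highest log posterior for text,
// or "" if nothing has been trained. Ties go to the alphabetically first class.
func (nb *NaiveBayes) Classify(text string) string {
	words := strings.Fields(strings.ToLower(text))
	best, bestScore := "", math.Inf(-1)
	for class, docs := range nb.docCounts {
		// Log prior: share of training documents belonging to this class.
		score := math.Log(float64(docs) / float64(nb.totalDocs))
		denom := float64(nb.totalWords[class] + len(nb.vocab))
		for _, w := range words {
			// Smoothed log likelihood of each word given the class.
			score += math.Log(float64(nb.wordCounts[class][w]+1) / denom)
		}
		if score > bestScore || (score == bestScore && class < best) {
			best, bestScore = class, score
		}
	}
	return best
}

func main() {
	nb := NewNaiveBayes()
	nb.Train("spam", "buy cheap pills now")
	nb.Train("spam", "cheap pills cheap offer")
	nb.Train("ham", "meeting agenda for monday")
	nb.Train("ham", "lunch on monday")

	fmt.Println(nb.Classify("cheap pills"))    // spam
	fmt.Println(nb.Classify("monday meeting")) // ham
}
```

Working in log space avoids underflow when many small probabilities are multiplied; real libraries usually add a configurable tokenizer and persistent storage on top of this core.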

Data Analysis/Visualisation

  • blas: Implementation of BLAS (Basic Linear Algebra Subprograms).
  • gocomplex: A complex number library for the Go programming language.
  • go-fn: Mathematical functions written in Go that are not covered by the standard math package.
  • go-graph: Graph library for the Go language.
  • go-gt: Graph theory algorithms.
  • go.matrix: Linear algebra for Go (development has stalled).
  • gonum/mat64: General-purpose matrix computation package; provides basic linear algebra operations for float64 matrices.
  • gonum/plot: An API for building and drawing plots in Go.
  • goraph: A pure Go graph theory library (data structures, algorithm visualization).
  • gostat: A statistics library for the Go language.
  • streamtools: General-purpose graphical tool for dealing with streams of data.
  • SVGo: The Go Language library for SVG generation.
  • vectormath: Vectormath for Go, an adaptation of the scalar C functions from Sony’s Vector Math library, as found in the Bullet-2.79 source code (currently inactive).

Text Indexing

  • bleve: A modern text indexing library for Go.
  • fulltext: Pure Go full text indexer and search library.
  • golucene: Go port of Apache Lucene.
  • golucy: Go bindings for the Apache Lucy full text search library.
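
Full-text indexers such as those above are generally built around an inverted index, which maps each term to the documents that contain it. The sketch below shows the bare idea with the standard library only; the `Index` type and its methods are hypothetical and far simpler than what bleve or Lucene provide (no ranking, stemming or persistence).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Index maps each term to the set of document IDs that contain it.
type Index map[string]map[int]bool

// tokenize lower-cases text and splits it on anything that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add records every term of text under the given document ID.
func (idx Index) Add(id int, text string) {
	for _, t := range tokenize(text) {
		if idx[t] == nil {
			idx[t] = make(map[int]bool)
		}
		idx[t][id] = true
	}
}

// Search returns the sorted IDs of documents that contain every query term.
func (idx Index) Search(query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var hits []int
	for id := range idx[terms[0]] {
		match := true
		for _, t := range terms[1:] {
			if !idx[t][id] {
				match = false
				break
			}
		}
		if match {
			hits = append(hits, id)
		}
	}
	sort.Ints(hits)
	return hits
}

func main() {
	idx := make(Index)
	idx.Add(1, "Go is fast")
	idx.Add(2, "Go libraries for data science")
	idx.Add(3, "Data science in Python")

	fmt.Println(idx.Search("data science")) // [2 3]
	fmt.Println(idx.Search("go data"))      // [2]
	fmt.Println(idx.Search("rust"))         // []
}
```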

Data Extraction and Processing

  • curl: Standalone Curl library for Go (libcurl not required).
  • facebook: A Facebook Graph API SDK Library For Go.
  • fetchbot: A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
  • gocrawl: Polite, slim and concurrent web crawler.
  • go-curl: Go libcurl bindings.
  • golang-curl: Go bindings for libcurl.
  • goquery: jQuery-like syntax and features for parsing and querying HTML documents in Go.
  • go-pkg-rss: This package reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs.
  • go-pkg-xmlx: Extension to the standard Go XML package. Maintains a node tree that allows forward/backwards browsing and exposes some simple single/multi-node search functions.
  • oauth: OAuth 1.0 implementation in Go.
  • oauth2: OAuth 2.0 implementation (official Go package).
  • purell: Go library to normalize URLs.
  • twitterstream: Twitter streaming API for Go.
  • twty: Command-line Twitter client.
  • go-webkit2: WebKit API bindings (WebKitGTK+ v2) for Go. Permits headless operation of WebKit.
  • webloop: Scriptable, headless WebKit with a Go API. Like PhantomJS, but for Go. Render static HTML versions of dynamic JavaScript applications, automate browsing, run arbitrary JavaScript in a browser window context, etc, all from Go or the command line.
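
Crawlers usually need to normalize URLs so that the same page is not fetched twice under different spellings, which is the job purell does. As a rough illustration of the idea, here is a sketch built on the standard `net/url` package only. The function `normalize` and the particular set of rules it applies are assumptions made for this example; they are not purell's API or its rule set.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalize applies a few common, conservative normalization rules to an
// absolute URL: lower-case the host, drop the default port, drop the fragment,
// sort the query parameters and use "/" for an empty path.
func normalize(raw string) (string, error) {
	u, err := url.Parse(raw) // Parse already lower-cases the scheme
	if err != nil {
		return "", err
	}
	u.Host = strings.ToLower(u.Host)

	// Remove the port when it is the default for the scheme.
	if (u.Scheme == "http" && u.Port() == "80") ||
		(u.Scheme == "https" && u.Port() == "443") {
		u.Host = strings.TrimSuffix(u.Host, ":"+u.Port())
	}

	u.Fragment = ""                 // fragments are not sent to the server
	u.RawQuery = u.Query().Encode() // Encode sorts parameters by key
	if u.Host != "" && u.Path == "" {
		u.Path = "/"
	}
	return u.String(), nil
}

func main() {
	out, err := normalize("HTTP://Example.COM:80/a/b?b=2&a=1#top")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // http://example.com/a/b?a=1&b=2
}
```

Note that re-encoding the query string can change how individual values are escaped, so a production crawler may want a more careful rule set than this one.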

Cloud Infrastructure APIs

  • aws-sdk-go: AWS SDK for the Go programming language.
  • godropbox: Common Dropbox libraries for writing Go services/applications.
  • elastigo: Go based Elasticsearch client library.
  • etcd: A highly-available key value store for shared configuration and service discovery.
  • gcloud-golang: Google Cloud APIs Go Client Library.
  • goamz: Golang Amazon Library.
  • gocloud: Collection of golang libraries for cloud APIs.
  • godo: DigitalOcean Go API client.
  • gohadoop: Native Go clients for Apache Hadoop YARN.
  • go-nsq: Official Go package for NSQ realtime distributed messaging platform.