What is Gensim?
It is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.
Gensim is a tool in the NLP / Sentiment Analysis category of a tech stack.
Gensim is an open-source tool, and its source repository is hosted on GitHub.
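To make "document indexing and similarity retrieval" concrete, here is a minimal, library-free sketch of the bag-of-words → TF-IDF → cosine-similarity pipeline that this kind of retrieval is built on. It deliberately does not use Gensim's API, so it runs with the standard library alone. In Gensim itself the equivalent steps are roughly `corpora.Dictionary` with `doc2bow`, `models.TfidfModel`, and `similarities.MatrixSimilarity`. The weighting here (raw term count × log2(N / document frequency), then unit-length normalization) is an assumption chosen for simplicity and is not guaranteed to match Gensim's defaults exactly. The helper names (`build_index`, `tfidf_vector`, `query`) are hypothetical.

```python
import math
from collections import Counter

def tfidf_vector(tokens, idf):
    """Return a unit-length sparse TF-IDF vector as a {term: weight} dict."""
    # Terms not seen at indexing time are ignored (assumption for this sketch).
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items() if idf[t] > 0}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(docs):
    """Tokenize documents naively and return (idf table, list of TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents each term occurs in.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log2(n_docs / count) for term, count in df.items()}
    return idf, [tfidf_vector(toks, idf) for toks in tokenized]

def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def query(text, idf, vectors):
    """Rank indexed documents by cosine similarity to `text`, best first."""
    q = tfidf_vector(text.lower().split(), idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Usage: index three tiny documents and rank them against a query.
docs = [
    "human computer interaction",
    "graph of trees",
    "computer system response time",
]
idf, vectors = build_index(docs)
ranking = query("computer interaction", idf, vectors)
print(ranking)
```

Normalizing every vector at indexing time means a query only needs one dot product per document. Gensim's similarity classes rest on the same vector-space idea, but they are designed to handle corpora far larger than this toy example.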
Who uses Gensim?
Companies
12 companies reportedly use Gensim in their tech stacks, including Avito, Kinderboerderij het Gouden Gansje, and "Data Science, Data Analytics, Machine Learning".
Developers
60 developers on StackShare have stated that they use Gensim.
Decisions about Gensim
Here are some stack decisions, common use cases and reviews by companies and developers who chose Gensim in their tech stack.
Biswajit Pathak
Project Manager at Sony
Can you please advise which one to choose, FastText or Gensim, in terms of:
- Operability with MLOps tools such as MLflow, Kubeflow, etc.
- Performance
- Customization of intermediate steps
- Whether FastText and Gensim share the same underlying libraries
- Use cases each one tries to solve
- Unsupervised vs. supervised dimensions
- Ease of use
Please mention any other points that I may have missed here.
Gensim's Features
- platform independent
- converters & I/O formats
Gensim Alternatives & Comparisons
What are some alternatives to Gensim?
NLTK
It is a suite of libraries and programs for symbolic and statistical natural language processing for English written in the Python programming language.
Keras
It is a deep learning library for Python, covering convnets, recurrent neural networks, and more. It runs on TensorFlow or Theano. https://keras.io/
FastText
It is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to fit even on mobile devices.
SpaCy
It is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. It comes with pre-trained statistical models and word vectors, and currently supports tokenization for 49+ languages.
TensorFlow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
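The data-flow-graph idea in the TensorFlow description (nodes are operations, edges carry values between them) can be illustrated with a toy evaluator. This is a minimal sketch, not TensorFlow's API. `Node` and `run` are hypothetical names, plain floats stand in for tensors, and the graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    """An operation node; `inputs` name the nodes (or fed values) it consumes."""
    name: str
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)

def run(graph: Dict[str, Node], feeds: Dict[str, float], target: str) -> float:
    """Evaluate `target` by first evaluating its inputs (assumes no cycles)."""
    values = dict(feeds)  # fed inputs act as the graph's source edges

    def evaluate(name: str) -> float:
        if name not in values:  # memoize so shared subgraphs run only once
            node = graph[name]
            args = [evaluate(i) for i in node.inputs]
            values[name] = node.op(*args)
        return values[name]

    return evaluate(target)

# Usage: the graph computes out = (a + b) * c.
graph = {
    "sum": Node("sum", lambda x, y: x + y, ["a", "b"]),
    "out": Node("out", lambda x, y: x * y, ["sum", "c"]),
}
result = run(graph, {"a": 2.0, "b": 3.0, "c": 4.0}, "out")
print(result)
```

Separating the graph description from its evaluation is what lets a real framework decide where each node runs (CPU, GPU, or another device). This sketch simply evaluates everything in-process.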