
Spark NLP

State of the Art Natural Language Processing

What is Spark NLP?

Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. It comes with 160+ pretrained pipelines and models in more than 20 languages.
Spark NLP is a tool in the NLP / Sentiment Analysis category of a tech stack.
Spark NLP is an open source tool with 3.3K GitHub stars and 659 GitHub forks; its source repository is hosted on GitHub.

Who uses Spark NLP?

5 companies reportedly use Spark NLP in their tech stacks, including Newzera, Ukuli Data, and Rabbitique.

20 developers on StackShare have stated that they use Spark NLP.

Spark NLP's Features

  • Tokenization
  • Stop Words Removal
  • Normalizer
  • Stemmer
  • Lemmatizer
  • NGrams
  • Regex Matching
  • Text Matching
  • Chunking
  • Date Matcher
  • Part-of-speech tagging
  • Sentence Detector
  • Dependency parsing (labeled/unlabeled)
  • Sentiment Detection (ML models)
  • Spell Checker (ML and DL models)
  • Word Embeddings (GloVe and Word2Vec)
  • BERT Embeddings
  • ELMO Embeddings
  • Universal Sentence Encoder
  • Sentence Embeddings
  • Chunk Embeddings

Spark NLP Alternatives & Comparisons

What are some alternatives to Spark NLP?
spaCy
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. It comes with pre-trained statistical models and word vectors, and currently supports tokenization for 49+ languages.
rasa NLU
rasa NLU (Natural Language Understanding) is a tool for intent classification and entity extraction. You can think of rasa NLU as a set of high level APIs for building your own language parser using existing NLP and ML libraries.
Transformers
Transformers provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
Gensim
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.
Amazon Comprehend
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Topic Modeling, and Language Detection APIs so you can easily integrate natural language processing into your applications.

Spark NLP's Followers
37 developers follow Spark NLP to keep up with related blogs and decisions.