
Diffbot

A robot that sees the web the way people do, and helps developers extract the important parts from any web page.

What is Diffbot?

Diffbot's APIs use computer vision, machine learning, and natural language processing to help developers extract and understand objects from any web page. Diffbot has determined that the entire Web can be classified into approximately 18 structural page types. Starting from this basic understanding of common page layouts, it then applies those same techniques to identify and extract the important items from within each page.
Diffbot is a tool in the Article API category of a tech stack.

Who uses Diffbot?

Companies
3 companies reportedly use Diffbot in their tech stacks, including Buzzvil, LTK, and Whale.

Developers
12 developers on StackShare have stated that they use Diffbot.

Diffbot's Features

  • The Article API is used to extract clean article text from news article web pages.
  • The Follow API allows you to subscribe to the changes of any web page.
  • The Frontpage API takes in a multifaceted “homepage” and returns individual page elements.
  • [Limited Alpha] The Page Classifier API takes any web link and automatically determines what type of page it is.
  • Accurate: We utilize state-of-the-art computer vision and NLP algorithms, have the largest collection of tagged pages, and update our model several times per week.
  • Easy: Pass in a URL and we'll do the rest. Stop spending time building custom scrapers and, even worse, maintaining them.
  • Stable: Diffbot is built and run by Web veterans in a multi-tiered environment with redundancy, monitoring, and scalability built in. Our scale lets us operate the service more cheaply than running it yourself.
  • Open: We use open standards (schema.org) and allow for endless configurability via our customization tool.
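
As a rough illustration of the "pass in a URL" workflow described above, here is a minimal Python sketch of a call to the Article API. The endpoint path and the `token` and `url` query parameters are assumptions not stated on this page, and the helper names `build_article_request` and `fetch_article` are hypothetical; check Diffbot's current documentation before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; it is not given on this page and may differ by API version.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(page_url, token):
    """Return the GET URL that asks the Article API to process page_url.

    The parameter names "token" and "url" are assumptions.
    """
    # urlencode percent-encodes the target page URL so it is safe in a query string.
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def fetch_article(page_url, token, timeout=30):
    """Send the request and return the decoded JSON response.

    Requires network access and a valid API token.
    """
    with urlopen(build_article_request(page_url, token), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the request easy to inspect or log without spending an API call, for example `build_article_request("https://example.com/story", "YOUR_TOKEN")`.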

Diffbot Alternatives & Comparisons

What are some alternatives to Diffbot?
import.io
import.io is a free web-based platform that puts the power of the machine-readable web in your hands. Using our tools you can create an API or crawl an entire website in a fraction of the time of traditional methods, with no coding required.
Octoparse
Octoparse is free client-side Windows web scraping software that turns unstructured or semi-structured data from websites into structured data sets, with no coding necessary. Extracted data can be exported via API, as CSV or Excel files, or into a database.
JavaScript
JavaScript is best known as the scripting language for Web pages, but it is also used in many non-browser environments such as node.js and Apache CouchDB. It is a prototype-based, multi-paradigm scripting language that is dynamic and supports object-oriented, imperative, and functional programming styles.
Git
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
GitHub
GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Over three million people use GitHub to build amazing things together.

Diffbot's Followers
30 developers follow Diffbot to keep up with related blogs and decisions.