
OpenRefine

Desktop application for data cleanup and transformation

What is OpenRefine?

OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
OpenRefine sits in the Big Data Tools category of a tech stack.
OpenRefine is an open source tool with 6.7K GitHub stars and 1.2K GitHub forks. Here's a link to OpenRefine's open source repository on GitHub.
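Although OpenRefine is used through a desktop-style interface, it runs as a local web application and can also be scripted over HTTP. The sketch below, in Python, assumes a locally running instance on OpenRefine's default port (3333) and the get-all-project-metadata command; treat the endpoint name and response shape as assumptions to verify against the API documentation for your OpenRefine version.

```python
# Minimal sketch: list projects in a locally running OpenRefine instance.
# Assumes the default port (3333) and the get-all-project-metadata command;
# verify both against the OpenRefine API docs for your version.
import requests

BASE_URL = "http://127.0.0.1:3333"

def list_projects():
    """Return project metadata from the local OpenRefine workspace."""
    resp = requests.get(f"{BASE_URL}/command/core/get-all-project-metadata", timeout=10)
    resp.raise_for_status()
    return resp.json().get("projects", {})

if __name__ == "__main__":
    for project_id, meta in list_projects().items():
        print(project_id, meta.get("name"))
```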

Who uses OpenRefine?

Companies
5 companies reportedly use OpenRefine in their tech stacks, including Courtsdesk, Fastlix, and infoculture.

OpenRefine's Features

  • Faceting
  • Clustering (a minimal illustrative sketch follows this list)
  • Editing cells
  • Reconciling
  • Extending data with web services
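
OpenRefine's clustering groups cell values that are textual variants of the same entry. Below is a minimal illustrative re-implementation, in Python, of the idea behind its fingerprint key-collision method (trim, lowercase, strip accents and punctuation, then sort de-duplicated tokens); it is a simplified approximation for illustration, not OpenRefine's actual code.

```python
# Illustrative approximation of fingerprint key-collision clustering:
# values that normalize to the same key are grouped as likely duplicates.
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Reduce a cell value to a normalized fingerprint key."""
    value = value.strip().lower()
    # Decompose accented characters and drop the combining marks.
    value = "".join(c for c in unicodedata.normalize("NFKD", value)
                    if not unicodedata.combining(c))
    value = re.sub(r"[^\w\s]", "", value)   # drop punctuation
    tokens = sorted(set(value.split()))     # de-duplicate and sort tokens
    return " ".join(tokens)

def cluster(values):
    """Group raw values whose fingerprints collide."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(group) > 1]

print(cluster(["Café du Monde", "cafe du Monde ", "Du Monde, Cafe", "Starbucks"]))
# [['Café du Monde', 'cafe du Monde ', 'Du Monde, Cafe']]
```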

OpenRefine Alternatives & Comparisons

What are some alternatives to OpenRefine?
Trifacta
It is an intelligent platform that interoperates with your data investments. It sits between the data storage and processing environments and the visualization, statistical, or machine learning tools used downstream.
R
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
Python
Python is a general purpose programming language created by Guido van Rossum. Python is most praised for its elegant syntax and readable code; if you are just beginning your programming career, Python suits you best.
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more (a brief cleanup sketch in Pandas follows this list).
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
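
Since Pandas appears in this list, a minimal sketch of comparable cell cleanup on a labeled DataFrame may help the comparison; the column name and values below are hypothetical.

```python
# Minimal Pandas sketch of OpenRefine-style cell cleanup: trim whitespace,
# normalize case, and drop duplicate rows. The data here is hypothetical.
import pandas as pd

df = pd.DataFrame({"company": ["  acme corp", "Acme Corp", "Globex "]})
df["company"] = df["company"].str.strip().str.title()
df = df.drop_duplicates()
print(df)
#      company
# 0  Acme Corp
# 2     Globex
```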