Building a Data Exploration Tool with React, Redux, Victory, and Elasticsearch

Stitch Fix
Stitch Fix is a personal styling platform that delivers curated and personalized apparel and accessory items of perfect fit for women.

By Patrick Sun, Software Engineer at Stitch Fix. This post originally appeared on Stitch Fix's Tech Blog.


Introduction

As a frontend engineer on the Algorithms & Analytics team at Stitch Fix, I work with data scientists to develop applications and visualizations to help our internal business partners make data-driven decisions. I envisioned a platform that would assist data scientists in the data exploration process, allowing them to visually explore and rapidly iterate through their assumptions, then share their insights with others. This would align with our team's philosophy of having engineers "deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy", and solve the pain of data exploration.

The final product, code-named Dora, is built with React, Redux and Victory, backed by Elasticsearch to enable fast and iterative data exploration, and uses Spark to move data from our S3 data warehouse into the Elasticsearch cluster.


Dora Screenshot


Why not just use Kibana?

The first question I got from our data platform engineers was, "Why not just use Kibana?" Kibana, Elasticsearch's companion visualization tool, is robust and the appropriate tool in many cases. However, it is geared towards log exploration and time-series data, and we felt that its steep learning curve would slow adoption among data scientists accustomed to writing SQL. The solution was to create something that replicates some of Kibana's essential functionality while hiding Elasticsearch's complexity behind SQL-esque labels and terminology ("table" instead of "index", "group by" instead of "sub-aggregation") in the UI.

Elasticsearch’s API is well suited to aggregating time-series data, indexing arbitrary data without defining a schema up front, and powering dashboards, which makes it a good fit as a data exploration backend. Users can send a single HTTP request with aggregations and sub-aggregations to an index with millions of documents and get a response within seconds, allowing them to iterate rapidly through their data.

Moving Data From S3 to Elasticsearch with Spark

To load data from our S3 data warehouse into the Elasticsearch cluster, I developed a PySpark application that extracts data from S3, partitions it, and batch-sends each partition to Elasticsearch in parallel. The Spark job enables fielddata: true for low-cardinality text columns so that they can be used in sub-aggregations. It also prevents data duplication by adding a unique _id field to each row in the dataframe, since a document indexed with an existing _id replaces the earlier one instead of creating a copy.



# Use SparkSQL to extract data from a table in S3.
df = spark_context.sql("select * from {}.{}".format(schema, table))

# Partition the dataframe and batch send each partition.
df.repartition(num_of_partitions).foreachPartition(send_batch)


The job can then be run by data scientists in Flotilla, an internal data platform tool for running jobs on Amazon's ECS, with environment variables specifying which schema and table to load.

React & Redux


Dora Architecture


In the UI, Redux is used for client-side state management. The two most important pieces of state are configuration and data. The configuration state stores the configuration for each graph, keyed by a UUID and consisting of graph property key/value pairs. When certain properties update, an Elasticsearch query is built from the configuration and sent to the Elasticsearch index's /_search endpoint. The response is then parsed and stored in the data state.


// Redux Store
{
  configuration: {
    [graphUUID]: {
      graphTitle: 'My Graph',
      esIndex: 'my_schema.my_table',
      aggregations: [
        { type: 'avg', column: 'some_column' }
      ],
      // Other graph properties.
    }
  },
  data: {
    [graphUUID]: [
      {
        aggregationMetric: {...},
        groupBy: {...},
        data: [{x: 0, y: 100}, ...]
      }
    ]
  },
  // Elasticsearch indices metadata.
  indices,
  // Global UI state.
  globals
}
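
As a rough illustration of how a query might be built from one graph's configuration, here is a minimal sketch. The helper name buildSearchBody, the groupBy property, and the aggregation names group_by and metric_0 are assumptions made for this example and are not taken from Dora's source. The request body itself uses Elasticsearch's standard terms and metric aggregations.

```javascript
// Hypothetical helper: turns one graph's configuration into a request body
// for the index's /_search endpoint. `groupBy` is an assumed property name.
function buildSearchBody(config) {
  // One metric aggregation per configured entry,
  // e.g. { avg: { field: 'some_column' } }.
  const metrics = {}
  config.aggregations.forEach((agg, i) => {
    metrics[`metric_${i}`] = { [agg.type]: { field: agg.column } }
  })

  // Without a "group by", the metrics sit at the top level of the request.
  if (!config.groupBy) {
    return { size: 0, aggs: metrics }
  }

  // With a "group by", bucket documents by the column's terms and nest
  // the metrics as sub-aggregations of each bucket.
  return {
    size: 0,
    aggs: {
      group_by: {
        terms: { field: config.groupBy },
        aggs: metrics
      }
    }
  }
}

const body = buildSearchBody({
  esIndex: 'my_schema.my_table',
  groupBy: 'some_text_column',
  aggregations: [{ type: 'avg', column: 'some_column' }]
})
console.log(JSON.stringify(body, null, 2))
```

Setting size to 0 asks Elasticsearch to return only the aggregation results and no matching documents, which keeps the response small when only the aggregated values are plotted.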


Diving deeper into the graph properties that make up the configuration, we can classify each property as shallow (a single value, such as which Elasticsearch index to query) or nested (an array of values, such as the list of aggregations to plot). Additionally, certain properties affect the query being sent to Elasticsearch while others only affect the presentation. These graph properties are rendered as child components of the <QueryContainer> parent component. To provide a consistent interface for the variety of graph property components to interact with the Redux store, they are decorated with the queryConnect higher-order component, which exposes an onChange prop for shallow properties and onAdd, onEdit, and onDelete props for nested properties.


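// queryConnect.js
import React, { Component } from 'react'
import { connect } from 'react-redux'
// Action creator for shallow properties; this import path is assumed.
import { editShallowGraphProperties } from './actions'

// `type` ('shallow' or 'nested') selects which handlers to expose.
// Only the shallow case is shown here.
export default function queryConnect(type, mapStateToProps) {
  return (UnwrappedComponent) => {
    const WrappedComponent = class extends Component {
      constructor(props) {
        super(props)
        // Bind so `this.props` resolves when the child calls onChange.
        this.handleChange = this.handleChange.bind(this)
      }
      handleChange({ uuid, properties }) {
        this.props.dispatch(editShallowGraphProperties({
          uuid,
          properties
        }))
      }
      render() {
        // Pass the connected props through to the wrapped component.
        return (
          <UnwrappedComponent {...this.props} onChange={this.handleChange} />
        )
      }
    }
    // Component is connected to the Redux store here.
    return connect(mapStateToProps)(WrappedComponent)
  }
}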

// SomeQueryComponent.js
class SomeQueryComponent extends Component {
  onChange(value) {
    const { uuid, onChange } = this.props
    onChange({
      uuid,
      properties: {
        xAxisMetric: value
      }
    })
  }
  // Other methods.
}

export default queryConnect('shallow', mapStateToProps)(SomeQueryComponent)
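
To make the shallow/nested distinction concrete, here is a minimal sketch of a reducer for the configuration state. The action type names and the key, index, and value fields on the actions are hypothetical, since the post does not show Dora's actual actions. The point is only that shallow edits merge key/value pairs while nested edits add, update, or remove an entry in an array.

```javascript
// Hypothetical action types; the real names in Dora are not shown in the post.
const EDIT_SHALLOW = 'EDIT_SHALLOW_GRAPH_PROPERTIES'
const ADD_NESTED = 'ADD_NESTED_GRAPH_PROPERTY'
const EDIT_NESTED = 'EDIT_NESTED_GRAPH_PROPERTY'
const DELETE_NESTED = 'DELETE_NESTED_GRAPH_PROPERTY'

function configuration(state = {}, action) {
  const graph = state[action.uuid]
  // Ignore actions that do not target a known graph.
  if (!graph) return state

  // Replace one graph's configuration without mutating the previous state.
  const update = (changes) => ({
    ...state,
    [action.uuid]: { ...graph, ...changes }
  })
  // Current array for a nested property, e.g. graph.aggregations.
  const list = graph[action.key] || []

  switch (action.type) {
    case EDIT_SHALLOW:
      return update(action.properties)
    case ADD_NESTED:
      return update({ [action.key]: [...list, action.value] })
    case EDIT_NESTED:
      return update({
        [action.key]: list.map((item, i) =>
          i === action.index ? { ...item, ...action.value } : item
        )
      })
    case DELETE_NESTED:
      return update({
        [action.key]: list.filter((item, i) => i !== action.index)
      })
    default:
      return state
  }
}

let state = { 'graph-1': { esIndex: 'my_schema.my_table', aggregations: [] } }
state = configuration(state, {
  type: EDIT_SHALLOW,
  uuid: 'graph-1',
  properties: { graphTitle: 'My Graph' }
})
state = configuration(state, {
  type: ADD_NESTED,
  uuid: 'graph-1',
  key: 'aggregations',
  value: { type: 'avg', column: 'some_column' }
})
console.log(JSON.stringify(state))
```

Keeping both kinds of update in one reducer means each graph property component only needs to know which handler to call, not how the store is laid out.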


Once a query-modifying graph property has been updated, the <GraphContainer> component will dispatch the fetchData action, sending an HTTP request to Elasticsearch and rendering the response into Victory's visualization components. While we have historically used React and D3 for data visualization, we opted to use Victory for several reasons. It does a great job of integrating React and D3 (something notoriously tedious to do), provides a developer-friendly API in the form of React components, and is highly extensible. Thus, synchronizing React and D3 is delegated to Victory and we can focus on building custom React SVG components on top of it.


// Custom SVGs to complement Victory base components.
import CustomHover from './CustomHover'
import CustomAnnotation from './CustomAnnotation'

class VisualizationContainer extends Component {
  // Other methods.
  renderVisualizations(datum, index) {
    const sharedProps = {...}
    switch(this.props.graphType) {
      case 'LINE':
        return React.cloneElement(
          this.renderVictoryLine(args),
          sharedProps
        )
      // Handle other graph types.
    }
  }
  render() {
    return (
      <VictoryChart
        // Use the VictoryVoronoiContainer component to handle
        // mousemove events.
        containerComponent={
          <VictoryVoronoiContainer
            dimension="x"
            labelComponent={<CustomHover />}
          />
        }
      >
        {/*
          Render other components, such as axes and annotations.
        */}
        {this.props.data.map((d, i) => this.renderVisualizations(d, i))}
      </VictoryChart>
    )
  }
}
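
The data prop mapped over above holds series of {x, y} points. As a sketch of how an Elasticsearch response might be flattened into that shape before it is stored, the hypothetical helper below reads the buckets of a terms aggregation. The aggregation names group_by and metric_0 are assumptions (they are whatever names the request used), while the buckets, key, and value fields follow Elasticsearch's standard aggregation response format.

```javascript
// Hypothetical helper: flattens a terms-aggregation response into the
// {x, y} points that the chart components consume.
function parseBuckets(response, metricName) {
  const buckets = response.aggregations.group_by.buckets
  return buckets.map((bucket) => ({
    // The bucket key is the "group by" value.
    x: bucket.key,
    // Single-value metric aggregations report their result under `value`.
    y: bucket[metricName].value
  }))
}

// Sample response trimmed to the fields the helper reads.
const sampleResponse = {
  aggregations: {
    group_by: {
      buckets: [
        { key: 'a', doc_count: 3, metric_0: { value: 100 } },
        { key: 'b', doc_count: 5, metric_0: { value: 42.5 } }
      ]
    }
  }
}

const points = parseBuckets(sampleResponse, 'metric_0')
console.log(points)
```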


Summary

Dora helps data scientists at Stitch Fix visually explore their data for tasks such as forecasting client demand, looking at the distribution of clients in a particular segment, and viewing inventory levels. Powered by React and Elasticsearch, it provides an intuitive UI for data scientists to take advantage of Elasticsearch's powerful functionality. Additionally, it aligns with our team's goal of building horizontal platforms to enable data science. Moving forward, I'm excited to continue building out this platform and other data visualization tools at Stitch Fix!

Come Work With Us

We’re a diverse team dedicated to building great products, and we’d love your help. Do you want to build amazing products with amazing peers? Join us! Check Out Our Open Engineer Roles
