The Evolution of The New York Times Tech Stack

This is the fourth episode of Stack Stories, sponsored by STRV. Hosted by Yonas, Founder & CEO, StackShare and featuring special guests Nick Rockwell, CTO of The New York Times and James Cunningham, Ops Lead at Sentry. Follow us on SoundCloud or subscribe via iTunes.

Nick Rockwell, CTO of The New York Times


The New York Times is one of the largest publications in the world with 150 million monthly uniques on their own site and 2-3x that number on third-party platforms like Facebook. While perhaps not often thought of as a “tech company”, the Times deals with challenges in scale and traffic many startups can only dream of.

For this episode, we sat down with Nick Rockwell, Chief Technology Officer (CTO) at The New York Times, and special guest James Cunningham, Operations Lead (and "Google Cloud Expert") at Sentry, to discuss how technology has evolved at NYT. In the few years he’s been there, Nick has brought the paper from managing their own data centers and using a LAMP stack, to the “modern age” - using React and GraphQL and migrating to Google Cloud.

Listen to the full interview or read the lightly edited transcript below:

Highlights

Moving Into the Digital Age

We're at an inflection point, where we can see how the Times works as a fully digital business. We can see a path to print disappearing and us being able to sustain the newsroom and keep doing what we do.

I think the whole organization has accepted that we're digital first, at this point.

I think it's about pulling the rest of the company, and the infrastructure, and our practices along, so that it all feels more modern and we can have a good, sort of solid place to really focus on doing the product well, which is what matters.

The Fake News Problem

There's been crazy stuff going on in the newspaper world forever. That said, everything is amplified by 100,000 times in the connected world that we're in now. So, the impact can be greater, or so it seems. I think that's actually debatable. But, nonetheless, I think it is a problem, it's not something to be ignored. I mean, the way these things are kept in balance is by, occasionally, society becoming outraged about how degenerate things have become and pushing back.

We solve it by not doing fake news, by actually checking sources and being very aggressive.

Product Development at NYT

We've made a decision that our destination and our own products are the best places that we can showcase our journalism. And the place where we can provide the most value for our users and also sustain our business.

Our reach is enormous, we reach 150 million monthly uniques on our own platforms. Probably double or triple that, if you include the platform footprint, the social footprint.

What we want to do is showcase the journalism and surface the right journalism for each person. Also, let me be very careful to say that we believe in our mission as a curator, as well.

Started with a LAMP Stack

Overall the stack was a pretty typical LAMP stack. It was mostly PHP on the back end, MySQL, with some Java services mixed in.

When you asked for a description of a particular system's architecture, you would get a story.

I've been thinking a lot about the idea that architecture should say something. You know, it should be opinionated and it should say something about how you think about development and what you think is important, or not.

Organizational Structure

Every team should staff the skills that they feel they need to. And nobody has a monopoly on front end or back end work.

In practice, the product teams are more front end focused and the platform teams are more back end focused.

We mostly don't share code bases... If something's part of the platform, it's a service that's got an API.

Making Decisions on New Tech

Before we started making a whole lot of them, I wanted to think a little bit about how we were making them. And, we ended up setting up a thing, which every time, I literally can't say the words without feeling slightly nauseous, but it's the Architecture Review Board.

So we thought a lot about what we wanted that to be and to not have it suck. And, I don't know if we've gotten it perfect, but what we said was basically, "Okay, this is a group of developers who are drawn from all of the teams," and they're still working on the teams. This isn't all they do, it's a sort of 30 to 40% time commitment.

What they do, is, first of all, they consult with any team that's trying to make a decision. And they try to look across everything. So, they try to know what's happening. Then they consult. And then we have an RFC process. So, if you're going to do something significant, you're expected to write up an RFC describing what you're going to do.

Moving Away from LAMP

Basically, I came in and there was already broad dissatisfaction with that LAMP Stack and the front end framework, in particular. So, I wasn't passing judgment on it.

We're moving quickly away from it, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences.

React is now talking to GraphQL as a primary API. There's a Node back end, to the front end, which is mainly for server-side rendering, as well.

Behind there, the main repository for the GraphQL server is a Bigtable repository that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline.

Data and Analytics at The New York Times

We really drank the Google Kool-Aid on analytics. So, everything's going into BigQuery and almost everything is going straight into Pub/Sub and then doing some processing in Dataflow before ending up in BigQuery.

React, GraphQL, and the Front End Evolution

I was a fan of React. And we had adopted it at Conde Nast prior. I didn't push it at all, but we ended up in the same place.

Out of the 3.0 front end frameworks, React was the one that people were excited about.

The choice of GraphQL wasn't super controversial. What was more controversial, was how we were gonna implement it and then who was going to own it.

Serverless and the Future

To me, what it means, is any context where as a developer, you don't need to think about scaling and architecting for scaling.

You don't have to think about availability zones or having redundancy. That's all managed for you.

I would say, if there's a good SaaS solution out there, use that. If there is a good managed service out there, which is almost the same thing, use that. If there isn't, but you can host your app on a serverless platform, that meets those criteria, use that. And then if not, do something instance based or container based.

Shifting to Google Cloud

By default, the content of the decision was, for our consumer-facing products, we're going to use GCP first. And if there's some reason why we don't think that's going to work out great, then we'll happily use AWS.

In many cases Google is going to prove to be a better engineering company than Amazon.

We loved BigQuery, so we said, "Let's go all in on the data stuff." And that's worked out fantastically.

Open Source and the Developer Community

I don't feel like we have a developer community around our APIs at this point. I'd like us to, if it's something that the world wants.

I'm theoretically and philosophically willing to open source just about anything. I don't view anything that we do as super proprietary.

That said, I don't want us to just open source junk, or everything, or things we're not going to maintain.




Yonas Beshawred: Welcome everyone. In the studio, we have Nick Rockwell, CTO of The New York Times, Kane Ren, Head of Engineering at StackShare, and James Cunningham, Ops Lead at Sentry… Why don't we just go around and introduce ourselves, real quick.

Nick Rockwell: Sure, I'm Nick Rockwell. I'm CTO of The New York Times. Spent most of my career in sort of digital media in traditional companies and a couple of startups. I was at Conde Nast before the Times, and then TV networks for a long time. And an early internet startup called Sonic Net in the mid 90s. Happy to be here.

James Cunningham: I am James Cunningham. I work at Sentry. I lead the operations team, making sure everything flows smoothly.

Kane Ren: I'm Kane. I'm heading up engineering at StackShare, for us. I met Yonas last summer and joined in October. Previously, I spent some time at Yammer, during the hyper-growth days, prior to and after acquisition. And also spent some time at Yelp, leading our API engineering team. And also a bit of time at Salesforce, working on a very small project called do.com.

YB: Alright, we've got an awesome group today. And first, why don't we start, Nick, with just some background and how you ended up at New York Times.

NR: I like to think that the Times is the last job in media that I would have and will have, I think. Having spent a lot of my career at MTV, and then when Jersey Shore came along, feeling like, "What am I doing with my life?" And then with a couple of other detours, ending up at Conde Nast, which was really fun for a while, but eventually felt like I was selling luxury handbags for a living. I had an opportunity to come to the Times and felt like it was important work, relevant and, you know, brand and a product I've always cared about since I was a kid growing up in New York. So, I thought it was super exciting and I've been happy to be there, in a fairly crazy couple of years.


Moving Into the Digital Age

YB: So, maybe you could just talk a little bit about how you think about technology at New York Times. You know, obviously, there's a lot of talk about print going away, digital is the future. Your CEO just said, actually just the other day, that print has approximately ten more years. And obviously, he's not just talking about the New York Times, but he's talking about the broader industry. How do you guys look at digital and how do you sort of set that vision?

NR: So, for starters, it's important to realize that we're now at least 20 years, if not more, into our digital journey. We have six layers of digital legacy at this point. We number our New York Times frameworks, and we're onto six now. There are still bits of three lying around. I think one and two are pretty much gone. So, it's been a long, long journey already. And, I'm someone who's spent my whole career in digital media, so it's not like a new thing anymore.

But that said, I think the difference is we're at an inflection point, where we can see how the Times works as a fully digital business. We can see a path to print disappearing and us being able to sustain the newsroom and keep doing what we do. That hasn't been true until very recently, I think.

You know, Mark said what he said. We have no idea how long print will last. And the truth is, as long as people want it and as long as there's enough scale that we can operate our printing facility profitably, we'll keep doing it. I personally think it will probably continue indefinitely, in some form or another. It may be very different. And we're actually working all the time to figure out what's special about print, and what we can do in print that feels impactful, and meaningful, and new, and differentiated from digital. It's not just about getting the news to people. Because it's obviously a much less effective way to do that. But, we do things like shipping the Google Cardboard out to everyone, as part of our VR strategy, and being able to put that in blue bags and put it on two million people's doorsteps, is something special. We do things, like we did a kids' supplement in the last Sunday Times, that I think was kind of fun, special for kids. Just a different experience and we got to teach them about the newspaper.

YB: That's awesome.

NR: So, we just try to be flexible and we'll see. At this point, our infrastructure and our core workload for creating the news is completely integrated. So, it's not a source of friction anymore, it's just kind of another output.

YB: Gotcha. So, do you think that the way you all think about technology is different maybe from some of the other publications, or from the rest of the industry? How do you sort of set that vision and how do you get everyone rallied behind this idea that, "Look, the digital side of things is really important?" Or is that just naturally happening throughout the organization?

I think the whole organization has accepted that we're digital first, at this point.

NR: Yeah, we're there, we're there. I think we're still learning, somewhat frustratingly, I think we're still learning particularly how to do product development well. But, I think the whole organization has accepted that we're digital first, at this point. And that's quite helpful.

How do we think about technology? I'd say we think about technology differently than, say, the Washington Post, which talks about it a lot.

But what I mean by that is I'm a real believer in really yoking the technology strategy into the product strategy, to the company strategy. And not doing things for their own sake, or doing more than we need to do. A lot of the problems in media are actually very simple problems. One of the first things I did, when I came into the Times, was implement Fastly as a proper CDN across everything. And I was shocked and appalled that it hadn't been done, cause I did it 20 years ago, for the first time, when I was at MTV. But that takes away so many of the headaches of a media company. So much is just about kind of one-way distribution, and caching is such an effective tool.

So, simple things, simple solutions to our problems, are something that we try to remain focused on and not make things harder than they need to be.
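
For readers who haven't set up a CDN like Fastly: the caching Nick describes is mostly driven by response headers the origin sends. A minimal sketch in TypeScript/Express - hypothetical routes, TTLs, and key names, not the Times' actual configuration - looks something like this:

```typescript
import express from "express";

const app = express();

// Hypothetical article route: short browser cache, long CDN cache.
app.get("/article/:slug", (req, res) => {
  // Cache-Control governs browsers; Surrogate-Control governs Fastly.
  res.set("Cache-Control", "public, max-age=60");
  res.set("Surrogate-Control", "max-age=3600, stale-while-revalidate=60");
  // Surrogate-Key lets the CDN purge every cached page for this article
  // the moment an update is published.
  res.set("Surrogate-Key", `article-${req.params.slug}`);
  res.send(renderArticle(req.params.slug));
});

// Placeholder for whatever actually renders the page.
function renderArticle(slug: string): string {
  return `<html><body>Article: ${slug}</body></html>`;
}

app.listen(3000);
```

The general idea is that most read traffic never reaches the origin at all, which is why caching does so much of the work for a one-way distribution business.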

YB: Yeah, and in some ways, that was an interesting choice for New York Times to bring you in because, I mean, I don't have the stats on this, but I think what usually happens is, you don't have hardcore technologists that are put into some of the roles. And so, some of these decisions, like CDN, day one, are natural to technologists but then when you bring it into an org that hasn't typically thought that way, it's a new idea. Right?

NR: I think that the Times picked me for one set of reasons, and then they got something a little different than they expected.

YB: Okay, not as advertised.

NR: I think to some degree, they saw my time at Conde Nast and thought, "Oh, here's someone who understands the publishing world." And you know, I did have that experience but it's not where I feel like I came from. I mean, I really trace my formative years back to the startup years, in the 90s, and the very beginning of the web, when I had to do things like write my own ad server and my own application server. And like, crazy things that nobody would ever do anymore. And I don't necessarily even think of myself as a hardcore technologist, certainly not by West Coast standards. Nonetheless, I do what I care about, and my passion is actually technology and architecture and solving problems - not so much whether I'm in the publishing industry or something else.

So, I hope they think they're lucky because they picked me for one reason and they got something else. And, it's been fun for me to go in and, frankly, just do a lot of clean up. Like I said, we have layers and layers. Like one of the biggest technical problems is the legacy. And I talked about the digital legacy now, which is quite old and multi-layered. I didn't even talk about the print technology legacy, which is truly frightening. I mean, we shut a mainframe down at the end of last year and stuff like that.

YB: Wow.

NR: So, I happen to like to clean things up, so it's been fun and satisfying. But it probably has been a little more change than folks were prepared for.

YB: Gotcha. Well, so, would you define a lot of your role as kind of bringing New York Times into this era? For, you know, as far as computing, as far as tooling? Is that a large part of what you're doing?

NR: I think so, yeah. And I'd put it slightly differently. I would say, to me, it's about bringing discipline to the way we do technology and product development. I think all this stuff is sort of easy to start doing, and hard to do well and to follow through on, and to do the hard parts of. So, I'd say, like, most of these older companies, and frankly a lot of newer ones, at times ... have one foot in the future and three feet in the past. I guess it's a four-legged creature.

I think it's about pulling the rest of the company, and the infrastructure, and our practices along, so that it all feels more modern and we can have a good, sort of solid place to really focus on doing the product well, which is what matters.

YB: So, would you say that ... I mean, in some ways, the New York Times is now a technology company, right? And will be, you know, in ten years, we'll be saying that it's another technology company. Would you? Is that how you see New York Times now?

NR: I don't. I mean, that's a gesture that some of us like to make, in a way that we like to talk. But I don't really see it that way. And the Times very clearly will always be about journalism. Right, I mean, that's where the value comes from. That's what we do. That said, every company is a technology company or has a technology company tucked within it. And even that's not really giving enough due to the complexity. Like, we're, "What's a technology company?" Well we're a software development company, we're also like a manufacturing and distribution company, cause we still do roll trucks and...

YB: Logistics.

NR: Logistics, and we're a machine learning company, like we need to be applying machine learning. So, it's kind of a never-ending list of new kinds of company that we need to become, in part, at least. But nonetheless, the core will always be journalism, which I think is a fundamentally creative undertaking that can be directly supported in some important ways by technology, but ultimately it comes down to good storytelling, good investigative journalism, and that's human work.

YB: Yeah, absolutely. And I mean, personally, New York Times is one of the only publications I read regularly.

NR: Are you a subscriber?

YB: I am subscriber, yes.

NR: Thank you.

YB: Yes, I am. And that's recent. Not just because of you.

NR: Very good.


The Fake News Problem

YB: I mean, it's because now quality is part of what everyone's seeking. Because there's so much information out there. It's all about, "Hey, how do we get to the quality?"

Which brings me to the next point here, how do you view this whole argument about fake news? I know you've suggested previously something like HTTPV, right? But, what are your thoughts on this concept of fake news and how it impacts your day-to-day as well?

NR: It does and it doesn't. It's a problem that we're concerned about, as obviously a participant in the news world and ecosystem. It's also an interesting story, that we cover as a newspaper - the impact and the effect on politics, and so on. At the same time, I think, it's complicated and it's a little overstated. It's overstated as a novelty. I mean, there's always been fake news. There was a whole piece published in 1905, or so, in the Sun, in New York, about like, "Scientists have figured out how to get to the moon, and they discovered life." And they let it run for two weeks before issuing a retraction.

So there's been crazy stuff going on in the newspaper world forever. That said, everything is amplified by 100,000 times in the connected world that we're in now. So, the impact can be greater, or so it seems. I think that's actually debatable. But, nonetheless, I think it is a problem, it's not something to be ignored. I mean, the way these things are kept in balance is by, occasionally, society becoming outraged about how degenerate things have become and pushing back. I don't think it's the Times' problem to solve. We solve it by not doing fake news, by actually checking sources and being very aggressive.

By the way, I can say, you know, from coming into the Times two years ago, that we really do take our mission to be as objective as possible, seriously. Now that's opinion aside. But for the core news reporting, we really do try to be objective.

But as far as solving the fake news problem across the internet, that's not a problem that we can solve. I think it's an interesting problem for the platforms to engage on. And I think ideas, actually about building some kind of sense of verification into the fabric of the internet, almost at the protocol-ish level is an interesting idea. That would be super hard to execute, but not a bad way to be thinking about it.

YB: Yeah, because then you get into the whole, well, who is approving the new entrants into this.

NR: So the more decentralized and kind of emergent it can be, the better - you almost think of, like, PageRank. And you know that was accomplishing, in some ways, the same goal. But like, what's the sort of consensus of the web as to what sources can be trusted and what can't be. Obviously, that's tricky, but I think that's not a bad approach or a way to think about it, anyway.


Product Development at NYT

YB: Yeah, and this sort of relates back to product development. Because now, I think, from a consumer's perspective, I think what I see, is that publishers are starting to build more of these product aspects into their own platforms, right? Can you talk a little bit about how you're looking at product development… how you're looking at building things into your apps, that sort of remove the need for you to go to a Facebook or go to an external source to see all the news that you need to see?

Our reach is enormous, we reach 150 million monthly uniques on our own platforms. Probably double or triple that, if you include the platform footprint, the social footprint, let's say.

NR: Sure. So, we first of all, greatly value our platform partners. You know, we view Facebook, Google, Apple, all of them, as critical partners and we couldn't have the business that we have without them. They are a source of acquisition, both as a paid channel and obviously a great way to get our content out there, get our brand out there, etc.

So, at the same time, we've made a decision that our destination and our own products are the best places that we can showcase our journalism. And the place where we can provide the most value for our users and also sustain our business. So we're committed to the Times as a destination. And again, not for everybody. Our reach is enormous, we reach 150 million monthly uniques on our own platforms. Probably double or triple that, if you include the platform footprint, the social footprint, let's say.

But, you know, when we talk about our subscriber base, that's a much, much smaller number, and those are the people that we really want to offer the best experience to. So, given that, what we want to do is showcase the journalism and surface the right journalism for each person. Also, let me be very careful to say that we believe in our mission as a curator, as well. So, we care ... the home page of the Times is a thing, and people value us saying, "Here's the news that you need to know."

At the same time, we publish 250 to 300 pieces of content a day. Nobody can read it all. And supplementing the curated view with something that's a little more about like, "Here's the stuff we think you're interested in, or that you might enjoy or get a kick out of." Driving discovery, as well as things you might be interested in, things that might be more serendipitous. We think that's a core mission for the product, as well. That's kind of what we're focused on right now.

YB: And in some ways that's improving upon this idea of a newspaper, right? It's a balance. I forget the quote, but someone was saying that newspapers were great, because when everyone was reading the newspaper, everyone knew all the news.

NR: Yeah, and you come to build sort of a consensus. A shared reality.

YB: A shared reality, exactly.

NR: And society, I think is a really important part of our mission.

YB: Exactly, cause now, everything is personalized. So how do you strike a balance between personalization and then just, core news, that you all feel like is important for everyone to know?

NR: The answer is, we're committed to curating the news. We're committed to having a point of view and telling people what we think they need to know. And then we think we can supplement that, with a more playful approach to also serving other content to people.

And I like to think, in some terms, recreating some of the serendipity that was evident in the physical product. And I think, I have a sort of cheesy image in my mind of picking up the paper and coming in and tossing it on the table. And the sections spill out and you're like, "Oh, what's that article?" You know? And you might never have gone to the style section to read about this reality show or something. But then, you're like, "Oh, my god, this is insane." You know, and you enjoy it.

So, creating some of that spontaneity, almost randomness, you know, I'm a big user of the Discover Weekly on Spotify, and I've talked to some of the people behind that. And their framing was never just to get people more music like what they listen to. It's been to understand people's taste and then give them new music. So, it always was about discovery, and we take that approach as well. Like we want to think about how to open doors for people, expose them to things they might not have seen, that we think they'll get a kick out of.


The New York Times Homepage

YB: So, from a product perspective, do you have a mandate for your homepage, that says 95% has to be curated, and then we'll give 5% to discovery? Do you have any sort of mandate like that?

NR: It's not that rigid. And I think the homepage belongs to the newsroom. They're experimenting with different ways to take pieces of it and make it a little more personal. But more, I think in terms of how do we supplement that experience. And it could be through totally different means, it could be through email, it could be through a second feed, there are lots of...

YB: Push notifications?

NR: Push notifications, as well. We haven't done a lot of targeting of push notifications, but that's something that we will be focusing more on as well. So it's a balance, but I think those things absolutely can live side by side. Different parts of the experience.


Started with a LAMP Stack

YB: Awesome. Alright, onto the tech stack. So, you joined New York Times, when?

NR: A little over two years ago. November 2015.

YB: Okay, so when you entered about two years ago, what did the tech stack look like? We can start with the back end and work our way up.

NR: Yeah, so overall the stack was a pretty typical LAMP stack. It was mostly PHP on the back end, MySQL, with some Java services mixed in. And then a little bit of everything. We've sorta been in a period where, with the best intentions, we sort of tried to let a million flowers bloom and let every team sort of do what they wanted. So, there was a bunch of Go. There was one team using Scala for a key project. And then a whole bunch of stuff related to our analytics stack. We had Hadoop in there, and we were using a little bit of Redshift and you know, a lot of the Amazon infrastructure. There was a whole data pipeline that involved Dynamo and SQS and other things. We had a little bit of everything going on.

Infrastructure wise, we had four data centers of our own, an AWS virtual data center, and a bunch of other stuff just scattered around throughout AWS. So, we were kind of all over the place, as well.

YB: Gotcha, okay. So, when you first came on board, was it easy for you to just understand all of that? Or, how were you getting ramped up?

When you asked for a description of a particular system's architecture, you would get a story.

NR: No, it was super difficult. And one of the telling things was, when you asked for a description of a particular system's architecture, you would get a story. You know?

YB: New York Times?

NR: You'd get a long story. It's like, "Well, you know, when we started ... and then we ..." So that I thought was quite telling. But, even more, I would say, it was difficult to get ... even just to understand everything. But then if you stepped back and looked at the big picture, it was not comprehensible. It didn't say anything and I've been thinking a lot about the idea that architecture should say something. You know, it should be opinionated and it should say something about how you think about development and what you think is important, or not.

As I was trying to onboard and understand things, I was thinking about the experience of a developer coming in and having that same feeling of like, "I don't understand why anything is the way it is." So I have no guardrails, or no guidance as to how I should make decisions and think about things because there's no opinion here, basically. So, one of my goals, was that over time, the architecture would become more philosophical. Like it would say something, more intelligible. And that's one of the things that we've kept in mind, as a guidepost, as we've gone about things.


Organizational Structure

KR: Yeah, gotcha. How many different teams are there involved in the technology side?

NR: So, we talked about the product engineering versus the corporate IT stuff. And we're thinking in terms of development teams, depending ... you can slice it slightly different ways. Mostly we're organized by product and platform, and the product is split into, not that many, but a handful of products. Basically, the core site, the native apps, the cooking, crosswords and so on.

KR: That's right.

NR: But then that core team is pretty big, so that splits up into four or five different teams. And on the platforms side, we're divided up into teams with different areas of responsibility. So all together, it's probably about 20, 22.

YB: 22 orgs?

NR: 20 teams. Teams of like, kind of six to ten to 12 people. Maybe a little more across disciplines, including product and design. Our teams are a little on the small side. Actually we have quite a few that are two or three developers, which is something that we're kind of wrestling with, deciding whether that's a good thing or a bad thing.

YB: So, I'm bad at math. Is that 200 to 300 people across product and platform?

NR: Probably 200-ish. If we start with engineers it's probably a bit under 200, if we exclude product design, product management, QA and so on. It's like 250 to 270 probably.

YB: Okay, and that includes DevOps?

NR: Yeah, yeah. DevOps too.

YB: Gotcha, okay. Awesome. That was an interesting ... it's always an interesting question because it gets to the heart of how you also think about building things, right?

NR: Yeah.

YB: So, product and engineering makes sense. Do you also ... are the teams cross-functional? You don't have like, back end, front end?

Every team should staff the skills that they feel they need to. And nobody has a monopoly on front end or back end work.

NR: Yeah, so this is an interesting debate that we continue to have. So, I have a firm "yes" to that answer, if we define cross-functional as product, design, engineering, data, testing. It's those kinds of functions. Within engineering, we talk about front end, back end, and so on. My answer to the team, is absolutely. Every team should staff the skills that they feel they need to. And nobody has a monopoly on front end or back end work.

In practice, the product teams are more front end focused and the platform teams are more back-end focused. And some of that is natural. But some of that is also inertia and some of that is sort of misreading… continuing a collective misunderstanding of what we're doing. So, we still sort of interpret product and platform as front end and back end, but that's not the way that it should be.

There are things we do, that really are platform, that have a UI. A good example would be customer care, which is going to be surfaced in a whole bunch of different products and we really want it to be consistent. And at the same time, there are things we do in the product that require a back end, but what they need from the back end doesn't really rise to the level of platform. It's not particularly reusable. It would be a main test. So, there's no reason why they shouldn't have back-end and just build their own back end. So, I think we're still evolving our way through that a little bit of trying to get to the right place.

JC: And when you're dipping different teams into the same functional code base, right. Into the same back end, how do those teams maintain their standards in those shared code bases? If you have someone on the product team dipping into what some might believe is platform land, how do they go about that? How do they request a change?

NR: So, we mostly don't do that. Meaning, we mostly don't share code bases. So, I think what we're ... we try to be discriminating about what we call platform. And at the end of the day, if something's part of the platform, it's a service that's got an API. And the contract then is really ... Okay, there's a team that's got a roadmap for this service. And if you're on the front end, they need to do their own, first of all, market discovery and understand what their client teams need. And if you're a client team, yeah, you've got to tell them what you want. And then they manage that through the roadmap.

Our core primary API for content is a GraphQL server implementation now.

Now there are a few places where that's not been fast enough. So, a good example would be our core primary API for content is a GraphQL server implementation now. And that's been a place where the rate of change to the schema makes it awkward and painful at times, so that'll be funneled through one team. There are some other reasons why it's a little sticky for us too.

So that's a place where we've tried to open up so that anyone can work on parts of that code base. Specifically, the resolvers in GraphQL. People write their own and effectively extend the schema. We're still trying to figure out the right way to make that manageable. And I think we're getting there. It feels okay now. But it opens up a whole bunch of cans of worms. Like, who do we hold accountable for the reliability of those things? And then we're also asking teams to re-skill a fair amount. For historical reasons our GraphQL implementation is in Scala, using the Sangria framework, which is not the technology of choice for most front end developers, I'd say. "Why can't we just use Node?" And the answer is, "Well, it's a long story."

YB: Another story for you.

NR: Another one of those stories. So, that's where we are today.
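
To make the resolver discussion concrete: "extending the schema" means a team adds a type or field plus the resolver that knows how to fetch it. The Times' implementation is Scala with Sangria; purely as an illustration of the pattern, here is a minimal sketch using Apollo Server in TypeScript, with invented type and field names:

```typescript
import { ApolloServer, gql } from "apollo-server";

// Hypothetical slice of a content schema; the real NYT schema is far larger.
const typeDefs = gql`
  type Article {
    id: ID!
    headline: String!
    relatedArticles(limit: Int = 3): [Article!]!
  }
  type Query {
    article(id: ID!): Article
  }
`;

// A product team "extends the schema" by adding a field plus its resolver.
const resolvers = {
  Query: {
    article: (_: unknown, { id }: { id: string }) => fetchArticle(id),
  },
  Article: {
    relatedArticles: (article: { id: string }, { limit }: { limit: number }) =>
      fetchRelated(article.id, limit),
  },
};

// Stand-ins for calls to whatever content store sits behind the server.
async function fetchArticle(id: string) {
  return { id, headline: `Headline for ${id}` };
}
async function fetchRelated(id: string, limit: number) {
  return []; // placeholder
}

new ApolloServer({ typeDefs, resolvers })
  .listen(4000)
  .then(({ url }) => console.log(`GraphQL server at ${url}`));
```

The accountability question Nick raises falls out of this structure: whoever owns a resolver owns its reliability, even though the schema lives in one shared server.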

YB: Yeah, I mean, I was about to ask, the front end currently, is it just homegrown?

NR: Well, it was like a homegrown PHP framework. And now, it's React based, heavy on the Apollo client.


Making Decisions on New Tech

KR: Okay, I was gonna say, well what happens if someone wants to ... if they're making a case that everything should go to Vue.js. What team would roll out that platform? Because it's cross-product?

NR: So, that's a good question too. And the answer is, we've ended up in the place where we don't really think ... that's not what we mean by platform in the first case. What we really mean is like, services that multiple development teams, products really, are gonna rely on. That's our primary definition.

One of the first things that I did when I came in, was to think about how we made technology decisions.

When we get to frameworks that we want to use consistently across things, there's a platform team in the core news team, who kinda makes those decisions. And they did the work to do the due diligence, summarize it, and say, "Hey, we should all go in this direction." It's kinda like where the core product goes, everyone else follows, because they're the ones who really have the resources to investigate things well. But, the most important part of that process is that ... and this is one of the first things that I did when I came in, was to think about how we made technology decisions.

Before we started making a whole lot of them, I wanted to think a little bit about how we were making them. And, we ended up setting up a thing, which every time, I literally can't say the words without feeling slightly nauseous, but it's the Architecture Review Board.

So we thought a lot about what we wanted that to be and to not have it suck. And, I don't know if we've gotten it perfect, but what we said was basically, "Okay, this is a group of developers who are drawn from all of the teams," and they're still working on the teams. This isn't all they do, it's a sort of 30 to 40% time commitment.

What they do, is, first of all, they consult with any team that's trying to make a decision. And they try to look across everything. So, they try to know what's happening. Then they consult. And then we have an RFC process. So, if you're going to do something significant, you're expected to write up an RFC describing what you're going to do.

Hopefully, that's already been after some consultation, but then you write it down and you share it. And actually, the entire engineering organization can comment on it. But the ARB is a bunch of smart people who have promised to comment on it. They have to read it and they have to comment on it. They can bring other people in explicitly if they think it'll bring a good perspective. Or anyone can just jump in.

But then we have an open period of a couple of weeks, where comments are open. And then they close it. And out of that they write a response. And that response should include some prescriptive statements, like, "Do this." And then it may include many suggestions. But we're very clear about what's prescriptive and what's suggested. And are very clear that if they prescribe something, you really should do it. You can still appeal it, but really you should do it.

YB: Oh, there's an appeals process?

NR: There's always an appeals process.

YB: Okay.

NR: It's bullshit to pretend that there isn't. You know, like, people will come to me and...

YB: I was about to say, does that involve having coffee with Nick?

NR: It does, and I'm pretty good at just saying like, "I dunno, why do you think I know better than these guys? Go talk to them." So that almost never happens. But it's been tough actually, the toughest part has been getting the ARB team to really make prescriptive statements because everyone's nice and inclusive, and we all wanna be flexible and not have it become super bureaucratic. But once in a while, they have to put their foot down, so we can have a coherent kind of architecture and we don't waste a lot of time.

JC: So, obviously, there's probably been at least one of those, where the result has been some sort of disagreement.

NR: Oh, yeah. Big time.

JC: How do your teams go around generating support for ... The majority is saying, "This is the decision we've made. This is what we're going to stick with. And I need you to support me as the decision maker." Is there any sort of friction in that process? How do the naysayers become a, "Ehh, I agree."

When we've had real conflict, we've settled down to slog through it until we really got to a conclusion.

NR: I mean, I think there's no shortcut to getting there. Usually when there's a shortcut that's a sign that something's wrong. It'll either mean a team feels they can just ignore feedback from the rest of the team and go do what they want and they won't get in trouble. Or, they'll try to have their cake and eat it too, in some funny way. So, I think, when we've had real conflict, we've settled down to slog through it until we really got to a conclusion. And that's required a lot of patience on the part of my senior management, who are getting a lot of pressure to move ahead. But if we're not ready to move ahead with the plan, we won't move ahead with it. We'll stay in that uncomfortable space where we disagree until we feel like we can really come out of it.

And that happened very significantly around our GraphQL implementation, actually. That was one of the hardest things we had to do. And it took all the summer basically, to arrive at a reasonable consensus. And still, not everyone is in complete agreement, to this day. But I think we found a way to make it work and move forward and go past it. So, I think, doing the hard work of plowing through a conflict and senior management, despite everyone's desire to move forward, keeping us there until we've gotten to the bottom of the hard part.


Moving Away from LAMP

YB: Awesome. So, maybe we can circle back, briefly, to what the technology stack looked like when you got to New York Times. So, LAMP Stack. You had some stuff on AWS. Was that the bulk of it? Were there other providers in different places and was that sort of scattered as well?

NR: Well we had our data centers and a whole lot running on VMware virtualization and that was the bulk of our stuff and...

YB: Like 75%?

NR: Probably 60 to 70%. And then a bunch running in AWS, so that was the infrastructure. And then, I mean, there was a whole analytics stack, there was a whole MarTech and subscription management stack that was pretty separate from the core product. We also had the newer products, which at that time we called Beta. Which is the cooking and the crossword products, which were also basically completely different. Which I didn't mind too much.


NYT Interactive Mini Crossword

YB: Which weren't in the LAMP Stack, for those products?

NR: I'm trying to remember. I think, in the case of cooking, for example, it was, but it was a different LAMP Stack. Like a different set of PHP frameworks, etc., etc. Mongo had found its way into a place in the core publishing pipeline, some of the time, for some pathways but not others. And so, it was just pretty complicated.

YB: But the core site was PHP?

NR: Core site was PHP, yeah.

YB: Was there a framework being used on the front end?

NR: My understanding is, it was our own framework. And framework, I think, is a little bit of the wrong word for it. It was kind of a way of doing things and sort of a visual, almost more like a design framework than a real implementation framework.

But I didn't study it too closely, cause I was just like, "Okay, this is what we're not going to do anymore."

YB: Right, gotcha. And was that part of the impetus for creating ARB?

NR: Partly yeah. Basically, I came in and there was already broad dissatisfaction with that LAMP Stack and the front end framework, in particular. So, I wasn't passing judgment on it. I mean, LAMP's fine, you can do good work in LAMP. It's a little dated at this point, but it's not ... I didn't want to rip it out for its own sake, but everyone else was like, "We don't like this, it's really inflexible." And I remember from being outside the company when that was called NYT5, when it had launched. And been observing it from the outside, and I was like, you guys took so long to do that and you did it so carefully, and yet you're not happy with your decisions. Why is that? That was more the impetus. If we're going to do this again, how are we going to do it in a way that we're gonna get a better result?

YB: And so, can you talk a little bit about the major pieces of your tech stack, today? Back end is it still largely ... I mean, of course, PHP, but are you slowly moving away from that?

NR: No, it's quite different. We're moving quickly away from it, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences. But we're live on that stack for all of the core experiences, right now, at different levels of distribution.

YB: Okay, so fully React?

NR: Yeah, yeah. So, some of the old stuff is still serving pages. Particularly the home page, we've been very cautious. We changed the design when we changed the underlying implementation at the same time. So, we've been testing versions of the new design on the new stack and gradually increasing distribution. The story page is the other big experience and that, we're 100% on mobile on the new stack. On desktop we might already be at 100%; if not, we're very, very close and will be in the next few weeks, I believe. So, we're pretty far along, over the next month-and-a-half, we should complete the migration for those core experiences.

YB: Right, but you're just talking about the front end, React?

The main repository for the GraphQL server is a Bigtable repository that we call Bodega because it's a convenience store.

NR: Yeah, so far. So React is now talking to GraphQL as a primary API. There's a Node back end, to the front end, which is mainly for server-side rendering, as well. So that's a piece. And then behind there, the main repository for the GraphQL server is a Bigtable repository that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline, which is where we are publishing our content to, and keeps a composed version of each piece of content handy. So we consider it a convenience store, rather than a canonical store because we're really using the inverted log paradigm. We'll read back from the log if we want to recreate a canonical store.

YB: So MySQL is still your persistent datastore?

NR: No, actually. Kafka is. Kafka is our system of record and then we have this Bigtable-based convenience store, which is like the production system you can go to when you don't want to recreate, you just want to get the state of something. Behind that, MySQL's still a back end to Scoop, which is our newsroom CMS. Our content management system. And so it's still there - it's basically a working store. And then when a piece of content is published it goes onto the Kafka pipeline, and off it goes.
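
The pattern Nick is describing - Kafka as the system of record, with a Bigtable "convenience store" materialized from it - boils down to a consumer that keeps the latest composed version of each piece of content and can rebuild the store by replaying the log. A rough sketch of that consumer in TypeScript, using kafkajs and the Google Cloud Bigtable client with hypothetical topic and table names (not the actual Bodega code), might look like:

```typescript
import { Kafka } from "kafkajs";
import { Bigtable } from "@google-cloud/bigtable";

// Hypothetical names; the real pipeline and the "Bodega" store differ.
const kafka = new Kafka({ clientId: "bodega-sketch", brokers: ["kafka:9092"] });
const table = new Bigtable().instance("content").table("articles");

async function run() {
  const consumer = kafka.consumer({ groupId: "bodega-materializer" });
  await consumer.connect();
  await consumer.subscribe({ topic: "published-content", fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.key || !message.value) return;
      const articleId = message.key.toString();
      // Upsert the latest composed version keyed by article ID; replaying
      // the log from the beginning rebuilds the store from scratch.
      await table.insert([
        {
          key: articleId,
          data: { content: { composed: message.value.toString() } },
        },
      ]);
    },
  });
}

run().catch(console.error);
```

Because every publish also lives in the log, the Bigtable copy is disposable - which is why he calls it a convenience store rather than a canonical one.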

YB: So creation still happens in the old system? CMS, MySQL backed?

NR: Yeah, though we've done a lot to evolve that system at the same time. That thing was all a big Struts/Hibernate beast. And there's still a little bit of that in there, but we've chipped away at a lot of it. Still some work to be done.

KR: And the migration paths were more of the back end storage site, do they follow a similar path as the front end? Did the core team also lead that effort? Or was this more...

NR: It was a different team. It was part of the platform team. And that involved, basically, once we wired things up to be dumping published content onto Kafka, we could gradually move clients over to read over directly from Kafka, or come through the GraphQL server to get stuff. And that process is mostly complete. So, mostly we're reading off of the new pipeline, not the old one.

KR: And this entire migration process, how long did it roughly take? Because we're basically talking about 150 million MAU and this process...

NR: We've taken our time. And a lot of it has been ... the forcing function has been our data center leases being up. The first one was the beginning of this year, where we essentially got out of there. The next are over the next few months, and then by end of April, or May, I think, we'll be out of the last one. And we'll be done. But that was the forcing function. Everything else, we could take our time. But old systems were either gonna have to be migrated or shut down. So, overall, I'd say, the whole process, we're about two years into. But the stuff I was just describing, we didn't really get started on, it's more like 15 months or so.

The first thing we did was our data and analytics pipeline. And that was a place for us to prove out a lot of the basic ideas, before we turned to everything else.


Data and Analytics at The New York Times

YB: Can you talk about data and analytics? In terms of how you collect it and process and analyze.

I want all of that to go away and do all our augmentation in BigQuery after the data's been collected.

NR: We really drank the Google Kool-Aid on analytics. So, everything's going into BigQuery and almost everything is going straight into Pub/Sub and then doing some processing in Dataflow before ending up in BigQuery. We still do too much processing and augmentation on the front end before it goes into Pub/Sub. And that's using some stuff we pulled together using Dynamo and so on. And it's very brittle, actually. Actually, Dynamo throttling is one of our biggest headaches. So, I want all of that to go away and do all our augmentation in BigQuery after the data's been collected. And having it just go straight into Pub/Sub. So, we're working on that. And it'll happen, some time.
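
For readers unfamiliar with that pipeline: clients publish raw events to Pub/Sub, a Dataflow job optionally transforms them, and the rows land in BigQuery. A minimal, hypothetical sketch of the publishing side in TypeScript (topic name and event shape are invented for illustration):

```typescript
import { PubSub } from "@google-cloud/pubsub";

const pubsub = new PubSub();
// Hypothetical topic; a Dataflow job downstream would transform these
// events and write them into a BigQuery table.
const topic = pubsub.topic("analytics-events");

async function trackPageView(userId: string, url: string) {
  const event = {
    type: "pageview",
    userId,
    url,
    timestamp: new Date().toISOString(),
  };
  // Publish the raw event; augmentation happens after collection.
  await topic.publishMessage({ data: Buffer.from(JSON.stringify(event)) });
}

trackPageView("anon-123", "https://www.nytimes.com/section/technology").catch(
  console.error
);
```

The point Nick makes about augmentation is visible here: the client sends the bare event, and enrichment is pushed downstream where it can be changed without touching the front end.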

YB: So before we dive more into the stack, can you just describe for the listeners, what happens when someone loads a page, an article, on New York Times, right now. Just like, front to back.

NR: Yeah. I'll do this fairly crudely because the process has become pretty complicated.

YB: I think I have it in my head, but you're going to describe it way better.

NR: Well, I don't know man. So, above the fold for the page gets rendered in Node. So, first, the query gets resolved. The initial page template gets rendered in Node and pushed out. And then we make a bunch of other secondary requests, start to pull content into it, which will all go through GraphQL. That particular piece of content, or set of content, or list of articles, or whatever, whatever. And it all gets composed. And then we fire off a million tags - events that we're collecting through Pub/Sub or third-party tags, for a million other reasons. And, that's pretty much it.
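
The above-the-fold server render he mentions is, at its core, React's server renderer running in Node. A stripped-down sketch with Express and react-dom/server - the component and data loading are stand-ins, not the Times' code - looks roughly like this:

```typescript
import express from "express";
import React from "react";
import { renderToString } from "react-dom/server";

// Hypothetical above-the-fold component; the real page is far richer.
function ArticleShell({ headline }: { headline: string }) {
  return React.createElement(
    "article",
    null,
    React.createElement("h1", null, headline)
  );
}

const app = express();

app.get("/article/:slug", (req, res) => {
  // Resolve the initial query server-side (stand-in for a GraphQL call).
  const headline = `Story: ${req.params.slug}`;
  const html = renderToString(React.createElement(ArticleShell, { headline }));

  // Ship the rendered shell; the client bundle hydrates it and makes the
  // secondary GraphQL requests for the rest of the page.
  res.send(`<!doctype html>
<html>
  <body>
    <div id="root">${html}</div>
    <script src="/client.js"></script>
  </body>
</html>`);
});

app.listen(3000);
```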

YB: So GraphQL, Node, and then React?

NR: Yeah.

YB: Okay, so once that loads on the page, then let's say something dynamic happens, like, you have the ability to bookmark something.

NR: In the apps you can bookmark, and I think probably on the web too. I don't do it, so I'm not quite sure about off the top of my head. But I think so yeah, you can save stuff for later.

YB: Okay. So that would all just be happening through GraphQL?

NR: I'm not certain that the save events go through GraphQL on their way to the back end. I'm not sure. We definitely try to read through GraphQL, and always write through GraphQL. Different events take different paths, it's either synchronous or asynchronous paths for the most part. But I can't recall, off the top of my head, where the save events go.

YB: Okay. So, homepage. Similar flow?

NR: Yeah, pretty much.

YB: Very, cool. So, you basically went from that looking like PHP loading up, plain old HTML, JavaScript to now, fully Node, React, GraphQL.

Web architecture doesn't fundamentally change, like ever. It's the same as it was in 1995, basically.

NR: Yeah, and you know, web architecture doesn't fundamentally change, like ever. It's the same as it was in 1995, basically. But, the biggest difference would actually be that server-side rendering in Node. Like that's a thing, that isomorphic stuff is like something we couldn't really do so well - like in a sophisticated way in the past. Now we can do it in a fairly sophisticated way. Is it worth it, or not? I'm not always completely sure, but I think we've gotten significant performance gains. You're trying to balance the performance gains doing things server-side versus the development needs of composing things on the client side. So, I think it's good to have all of those options.


React, GraphQL, and the Front End Evolution

YB: Can you walk us through the decision to adopt React? What did that look like? Was that your idea?

Out of the 3.0 front end frameworks, React was the one that people were excited about.

NR: So, I was a fan of React. And we had adopted it at Conde Nast prior. I didn't push it at all, but we ended up in the same place. So, I couldn't really walk you through the thought process in detail because the team found their own way there, and I was like, "Sounds good." It seemed like it made sense to me, so I didn't challenge anything or go deep on it.

But I think the feeling was out of the 3.0 front end frameworks, React was the one that people were excited about. And now I sense ... I mean, I haven't gone deep on it, maybe you guys know more than I do, you probably do for sure, but there's a little bit of another wave of front end frameworks that are lighter weight. That are a lighter weight approach than React, that some people ... There's one, in particular, I keep hearing about lately.

YB: Preact?

NR: Not Preact, though that's kind of interesting.

YB: Vue.js?

NR: It might be. Might be Vue, yeah. And I think ... I'm like deeply suspicious of all front end frameworks.

YB: Disclaimer, yeah.

NR: I just ... it never seems like things actually get that much easier. That you really see the productivity gains that theoretically are the reason you're doing these things. That said, you've got to do something. And definitely, the solution is not to stay stuck in the past.

YB: Developer happiness.

NR: I try not to get overexcited when people are like, "Once we do React, it's gonna be great."

YB: So that went through ARB?

NR: Yeah.

YB: Okay. And then, you mentioned that Apollo, well GraphQL, was a particularly difficult choice and it's still in progress. Can you talk a little bit about that and the thinking behind adopting it?

When you change the schema, how do you minimize the pain of having to change the schema in many places simultaneously?

NR: Well, the choice of GraphQL wasn't super controversial. What was more controversial, was how we were gonna implement it and then who was going to own it. And it was, did the front end team own it or do we look at it as platform and have a different team own it than the people who are really in the front end. That's the way that we went, and it was partly for expediency. We had a great team in place that had been building a very similar product, and they just kind of took what they'd been doing and basically made it GraphQL.

At the heart of the matter was exactly what you were talking about, about how the places where the most work happens... how to create the most surface area so most people can do their work there. And it all comes down to the schema and schema transformations. And basically query resolution.

And so, that's the hard part always, in these kind of multi-tier architectures. When you change the schema, how do you minimize the pain of having to change the schema in many places simultaneously? Change APIs, change, you know, etc., etc. So that's the hard problem and what we sweated a lot over was sort of how schema changes would work, who was responsible for them, what the workflow would be, and also how much we would try to automate it so that you would make a change in one place and have it flow everywhere versus not try to create an incredibly complicated contraption and just deal with changing things in a few places at the same time. So that's been the hard part.

And the choice, again, to implement in Scala, was unpopular with many people on the team. But on the other hand, has been great in some respects. We had a great team, and that's how they wanted to work. And they've done great work, and it's been rock solid. So that's on the plus side. Not everybody loves Node.

JC: Yeah. I mean, touching on ... the sprinkling of automation, did you ever find, or do you maybe have a classic example, in your head, where something was over-automated and it outreached its bounds? And something that was once a convenient tool has now become a hungry beast that people have to look out for?

The graveyard of test automation frameworks is a terrible, scary place.

NR: I think that happens a lot. I'd have to think carefully about what the best example is. But I think that's a common pattern. You certainly see it a lot in testing. The graveyard of test automation frameworks is a terrible, scary place. I think it happens a lot in DevOps too, where it's easy to over-tool. But, we certainly ran the risk, I think, of going down that path for how we were managing schema changes and we pulled back from automating everything. And time will tell whether that was really the right call, but it feels like we were heading potentially towards making ... the problem with that kind of automation is it becomes very brittle. And then lots of things depend on this one thing behaving a certain way. So then it's hard to change lots of things. Maybe keep a little more give between parts of the system, and sometimes it works out better.

JC: Of course.

YB: So coming back to GraphQL, how does Apollo play into this? And how do you manage things, like someone wants to implement a schema change on the GraphQL side, what does that look like, now?

NR: So, now it's a workflow. Which is a little bit of a shame, but it is. Because we do have to make changes in a couple of places. I'm probably not going to be able to do justice to the details, and if any of my people listen to this, they'll bang on my door and yell at me.

YB: We can always make edits.

NR: But basically it's a workflow: a change to the GraphQL schema has to be made, and a change to our back end schema has to be made as well - you know, the Kafka messages and so on. I can't even remember now if we're using Protobuf; I think it's Protobuf for our message passing. So that has to be changed, and then that's pretty much it.

And then if there's a major change, another resolver has to be written for the GraphQL server. We're tinkering with opening that piece up so that more people on the product teams are able to make the GraphQL schema change and change the resolvers as necessary. But there's still kind of a worklist to make changes to the Protobuf schema.

And then, if it touches Scoop, the back end CMS, then potentially the UI has to change as well. If a field has to be added or something, sometimes it morphs into a bigger change, so that's another piece of the workflow, if that's necessary.
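
To make the shape of that workflow concrete: the Times' GraphQL layer is written in Scala, but here is a minimal sketch of what adding a field might look like, in TypeScript with Apollo Server. The type, field, and service names are hypothetical, not the Times' actual schema.

```typescript
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";

// Hypothetical schema change: adding `subheadline` to Article means touching
// the GraphQL SDL here, the resolver below, and (upstream) the Protobuf
// message that carries content over Kafka, plus the CMS UI if editors need
// to populate the new field.
const typeDefs = `#graphql
  type Article {
    id: ID!
    headline: String!
    subheadline: String   # the newly added field
  }

  type Query {
    article(id: ID!): Article
  }
`;

const resolvers = {
  Query: {
    // Hypothetical fetch against an internal content service (Node 18+ fetch).
    article: async (_parent: unknown, { id }: { id: string }) => {
      const res = await fetch(`https://content.example.internal/articles/${id}`);
      return res.json(); // assumed to already carry the new `subheadline` field
    },
  },
};

async function main() {
  const server = new ApolloServer({ typeDefs, resolvers });
  const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
  console.log(`GraphQL server ready at ${url}`);
}

main();
```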

YB: And then Apollo, how does that play into this?

NR: So, Apollo, I haven't spent a lot of time studying Apollo. I don't really know exactly what Apollo is. I guess it's an implementation on top of React? Or it's like a ... what that means exactly, I'm not totally sure. Because we started with plain React and at some point we switched to Apollo on the front end, so that's ... We have a good blog post about it, which I skimmed...

YB: Which we will absolutely link to.

NR: So, yeah, I'm not deep on that particular spot.

YB: Okay, another ARB matter, it sounds like.

NR: And actually the smart people on the front end teams made that call, and they ran it through the ARB. And I'm all about the surface. That's the way it goes. ARB doesn't typically come up with the plans, they just critique them basically.
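
For readers who, like Nick, aren't deep on this particular spot: Apollo here refers to the Apollo GraphQL tooling, and Apollo Client is the library that connects GraphQL queries to React components. A minimal sketch of what that looks like; the endpoint, query, and field names are hypothetical:

```typescript
import React from "react";
import {
  ApolloClient,
  ApolloProvider,
  InMemoryCache,
  gql,
  useQuery,
} from "@apollo/client";

// Hypothetical GraphQL endpoint; not the Times' actual URL.
const client = new ApolloClient({
  uri: "https://graphql.example.com/query",
  cache: new InMemoryCache(),
});

const GET_ARTICLE = gql`
  query GetArticle($id: ID!) {
    article(id: $id) {
      headline
      subheadline
    }
  }
`;

function Headline({ id }: { id: string }) {
  // useQuery handles fetching, caching, and re-rendering when data arrives.
  const { loading, error, data } = useQuery(GET_ARTICLE, { variables: { id } });
  if (loading) return <p>Loading…</p>;
  if (error) return <p>Something went wrong.</p>;
  return <h1>{data.article.headline}</h1>;
}

export function App() {
  return (
    <ApolloProvider client={client}>
      <Headline id="123" />
    </ApolloProvider>
  );
}
```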

KR: And then, in terms of GraphQL, do you foresee a future at all where maybe there would be some components that are more GraphQL dependent and there are still some leftover REST APIs?

NR: Yeah, I definitely do. My view is ... I mean, the whole point of doing GraphQL is to handle problems like hydration and filtering. And for a complex content schema that's going to have a lot of references, to make it easy to write queries that get just the pieces that you need. And that's not necessary for everything. So I think, absolutely. We do today, and we'll always have more straightforward REST APIs for things that aren't really integrated into that schema.
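
As a sketch of what "hydration and filtering" buys you: one GraphQL query can pull a story and the things it references in a single round trip, returning only the fields asked for, where plain REST might require several follow-up calls. The field names below are hypothetical, not the Times' actual content schema.

```typescript
import { gql } from "@apollo/client";

// One query hydrates a story plus its references in a single round trip and
// filters down to exactly the fields requested. With plain REST this might be
// GET /story/:id followed by extra calls per author and related asset.
export const STORY_WITH_REFERENCES = gql`
  query StoryWithReferences($id: ID!) {
    story(id: $id) {
      headline
      summary
      authors {        # references hydrated inline
        name
      }
      relatedAssets {  # only the fields we actually need
        url
        caption
      }
    }
  }
`;
```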

YB: So, lessons learned with, I guess, the React shift. Anything that you would caution against, or things that you wish you had known going into that migration path? About how you architected your front end and how you went about implementing it?

NR: I'll say, not yet. My fear with front end frameworks is always over-engineering. So, we might decide at some point that actually React was too heavyweight for what we really needed to do and that we should've used Vue, which hadn't really quite emerged yet. Or you know, whatever lighter weight framework that's going to come along that we're going to eye with envy, perhaps.

I kind of feel like it's never a perfect world, and that was a good case of making a solid consensus decision, with a lot of support and everything. It wasn't a bad call at the time, and it may never become a bad call. But I worry about over-engineering front end frameworks. And I kind of feel like no matter what you do, front end development is always a pain, in kind of the same ways. There's no magic bullet, anyway.

YB: Yeah, cause I mean, obviously folks can hear this and say, well the majority of the pages are static so, you know, maybe there isn't a huge need for it. But if you take into account this whole idea of enhancing the experience by showing you things that you may be interested in, right, then you understand, it's like okay, now that these dynamic components are coming in, it makes sense.

NR: Yeah, and maybe we'll look back and say, "Eh, we didn't build as much sort of dynamic stuff as we thought we would, so we didn't really need this framework." You know it's hard to say, but we'll see.


Serverless and the Future

YB: Okay. So, let's talk about serverless. You ... I have a quote here. "My goal is all new apps are serverless by 2019."

NR: Yeah, my team was really mad at me because I started saying that in interviews before I told anybody on the team.

YB: Oh, you didn't put it through ARB?

Why wouldn't we want to be starting all our new projects on serverless if we have the option to do that?

NR: I just kind of threw it out there and they were super pissed. But it's a loose goal, it's not, you know ... we've just done a lot of migration and I'm not going to turn around and say, "Hey, everyone, let's do it again." But what I am saying is ... I present it more as a theoretical argument. I think, I have a whole bunch of reasons for believing that serverless is a big deal. Tell me whether I'm wrong. If not, then why wouldn't we want to be starting all our new projects on serverless if we have the option to do that?

YB: So, before you dive into that, can you just define, for the audience, what serverless means to you? Because we were actually arguing about this earlier. Like, well, there's the framework and there's the concept.

NR: It's not the framework. I'm definitely not talking about the framework. I'm talking about the concept. And to me, what it means is any context where, as a developer, you don't need to think about scaling and architecting for scaling. You don't need to think about availability and reliability from an architectural perspective. So, you know, you don't have to think about availability zones or having redundancy. That's all managed for you. And then, ideally, you also don't interact with an OS. There are things that are close enough to serverless, where there's still a bit of a vestigial OS, that I'll say, "Okay, that's fine." But generally speaking you shouldn't be dealing with an OS. That's my definition.

A lot of things fit that definition. CDNs fit that definition. So, serverless started for me in 1999, with Akamai, and it really displaced a whole bunch of our server infrastructure. Any kind of managed service, like Cloud Spanner or BigQuery, also fits the criteria. Then obviously, platform as a service fits the criteria: Google App Engine, or Heroku.

YB: Yeah, I was just thinking of Heroku, but with auto-scaling enabled.

NR: Yeah, so I never was a Heroku user as a developer, so I can't speak to whether it fits my criteria closely or not. But Google App Engine does. Function as a service fits, but I find people reduce serverless to function as a service a lot lately. They're like, "Oh, that's done." And I'm like, "No."

YB: So you're not saying that by 2019, everything should be on cloud functions?

I think cloud functions are a useful tool, but by no means a way to build meaningful applications.

NR: I think almost nothing will be on cloud functions. I think cloud functions are a useful tool, but by no means a way to build meaningful applications.

YB: Oh, okay.

NR: I'm happy to be persuaded otherwise by someone, but I just don't see that as being a good way to build apps. At least not unless there's a lot of evolution and a lot of tooling, and it basically comes to look like something quite different.

YB: So, when you say "serverless by 2019," you're saying that there's going to be ... You're not saying that it's going to be a particular product or a particular service, you're simply saying there is going to be another layer of abstraction. Where we're not even talking about OS.

NR: Yeah, what I'm saying is, if there's a ... Actually, first thing I would say, if there's a good SaaS solution out there, use that. If there is a good managed service out there, which is almost the same thing, use that. If there isn't, but you can host your app on a serverless platform, that meets those criteria, use that. And then if not, do something instance based or container based.

YB: Gotcha. Okay. So you're saying more conceptual...

NR: And then maybe there's some cloud function stuff scattered in there to do specific things.
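
As a sketch of the kind of "specific things" a cloud function is good for, glue rather than a whole application: this uses Google's Functions Framework for Node, and the endpoint it forwards to is made up.

```typescript
import * as functions from "@google-cloud/functions-framework";

// Hypothetical glue code: accept a webhook and republish the payload to an
// internal service. Small, stateless, and event-driven; no servers, scaling,
// or OS to think about. Assumes a Node 18+ runtime with global fetch.
functions.http("forwardWebhook", async (req, res) => {
  await fetch("https://ingest.example.internal/events", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req.body),
  });
  res.status(204).send("");
});
```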

YB: Okay, so you're saying, conceptually, you're going to worry less and less about ops, and more of what you do is going to be at the application level.

NR: The goal, the benefits are, you know, first and foremost, there's a whole bunch of things we don't have to do. And that's like, think about scaling, architect for scaling, do the ops behind scaling, etc.

Number two, it ought to be more cost-effective in the long run. And that's not always true. It's not always even easy to tell today, whether one approach is more cost-effective than another. You can do serverless badly, and that's gonna have a lot more impact on your cost.

But, theoretically, I just feel like, who's going to do a better job of utilizing the underlying resources? Us? Or Amazon and Google? And the answer is Amazon and Google will. So, over time they ought to be able to run much more efficiently, and they ought to pass the savings on to us. Which I believe they will, because there's a strong competitive dynamic. So, that's the economic argument.

The third argument, which is the least proven, is that it also ought to boost developer productivity and hopefully, happiness. Because it's just more constrained. You spend less time making lots of stack choices. You make a few, and then the rest is you kinda gotta live with it. So you're more constrained, so you ought to be able to ... as long as you can work within those constraints, you ought to be able to focus more on building your features, basically.

YB: So, for a new product that you're looking to launch, are you now encouraging folks internally to say, "Alright, try out Google Cloud Functions, as a first step," or...

NR: I would say, my advice, which everyone is free to ignore, is App Engine. I'd say look at App Engine first.

YB: App Engine?

NR: I'm not keen on function as a service. I've quoted Deepak Singh, the product manager for the Elastic Container Service at AWS, who said something like, "Lambda, or cloud functions in general, are like Perl" - like what Perl was back in the day. It's a great way to solve a particular problem, or glue stuff together. But again, not the way to build big apps.

So, I say Google App Engine. But what I think is happening is that the container services are moving towards being more and more fully managed. A fully managed GKE, where we don't have to define the cluster, or manually add nodes, or even auto-scale it, where it just works, would be close enough. There's still some OS in there, but if it's super stripped down, it's like, "Okay, fine." And if someone else is patching it, then at that point I don't even really care. The OS almost just becomes a construct for managing dependencies, and that's probably fine.

YB: Right. So, App Engine or Container Engine, you're good with either one because they give a sufficient level of...

NR: Container Engine, not today, but where I think they're going. Don't quote me on that.

What I mean is this is not based on knowledge of say, Google's roadmap, but it's a hypothesis that GKE and sort of App Engine Flex will converge to some degree. And you'll have something that presents a container based interface like GKE, but is otherwise managed. So that's ... When we get there, that'll meet my criteria, but GKE doesn't today. That said, we run tons of stuff on GKE. That's our dominant way of doing things right now.
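
To make the "look at App Engine first" advice concrete, here is a minimal sketch of a Node/TypeScript service shaped the way App Engine expects: the app just listens on the port the platform provides, and scaling, availability, and the OS are handled for you. The service itself is hypothetical, and a real deployment would also need an app.yaml declaring the runtime.

```typescript
import express from "express";

// A minimal App Engine-style service: no instance counts, zones, or scaling
// policy in the code. The platform injects PORT and manages the rest.
// (Deployment would also include an app.yaml, e.g. `runtime: nodejs20`.)
const app = express();

app.get("/", (_req, res) => {
  res.send("Hello from a fully managed platform");
});

const port = Number(process.env.PORT) || 8080;
app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});
```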


Shifting to Google Cloud

YB: Gotcha, okay. So maybe you can talk a little about your shift ... I mean, there's a whole set of blog posts about this, but maybe you could just touch on your shift to Google Cloud. Because you had your own data centers, you had some stuff on AWS. What did that decision look like?

For our consumer-facing products, we're going to use GCP first.

In many cases Google is going to prove to be a better engineering company than Amazon.

NR: So, it was a long decision and in the end, it's one that we've taken with eyes open and that we reserve the right to modify at any time. And to be clear, we continue to do a lot of stuff with AWS. But, by default, the content of the decision was, for our consumer-facing products, we're going to use GCP first. And if there's some reason why we don't think that's going to work out great, then we'll happily use AWS. In practice, that hasn't really happened. We've been able to meet almost 100% of our needs in GCP.

YB: App Engine?

NR: Yeah, yeah. Well, not necessarily App Engine, but GCP in general. So it's basically mostly GKE, we're mostly running stuff on Kubernetes right now.

YB: Oh, wow. Okay.

NR: But, we haven't really run into situations where we can't do something in GCP, so we have to go to AWS. We have a lot of historical stuff running on AWS that we've not prioritized moving. We've prioritized getting out of our data centers, and we've been happy leaving stuff in AWS. The other thing is, we don't run Oracle in GCP because, basically ... well, you're not supposed to. You can't actually be licensed and supported there. So we continue to run the Oracle that we have, which I'm ashamed to say we still have some of, and that's all running in AWS.

YB: Are you lobbying Google to support that, or are you just trying to kill it off?

NR: I'd rather get rid of it. You know, I think my goal is to get rid of Oracle. And I don't mind saying so, either.

But the first thing we did with GCP was the data stuff. We looked hard at BigQuery and we loved BigQuery, so we said, "Let's go all in on the data stuff." And that's worked out fantastically. So that builds our confidence in our core theory, which is that in many cases Google is going to prove to be a better engineering company than Amazon. It's going to take them longer, but they're going to produce better solutions and more integrated solutions than Amazon will. Who famously, and by strategic choice, has said: "We're going to do a million things at once and half of them are going to be redundant, and you're going to have to figure it out, but that's how we're going to move quickly."

So, I have full respect for Amazon and what they've done. They invented the whole space. But we're willing to change the way we do things to accommodate a cloud, so we'll do that to accommodate a cloud that's better engineered and ultimately going to be superior in specific technical respects. We think that's what we're getting with Google, and so far that theory's been borne out.

YB: Yeah, and this is a common thing that, at least, we've started seeing on the StackShare side, is people will now look at Google Cloud and they'll say, "Oh, this whole BigQuery thing is really interesting." And so it starts off with a piece, and a lot of times it's BigQuery, and then it's like, "Oh, well what else you got?"

NR: Yeah, I think sometimes it's Firebase. You know people are pretty fond of Firebase. I think it will be Spanner, in the future. I think some people will be like, "I want Spanner, so I'm going to go there."

YB: Are you using Spanner?

NR: Not yet. We've just done our first kind of test, and we will be, basically.

YB: You could see just moving fully to Spanner?

NR: I don't know. I mean, the part that we don't really have our heads around is what the cost implication is, so we don't really have a feel for what it costs. But, everything else about it, we like what we see. So, yeah. We'll use it and then we'll get to understand the cost dynamics better, and then we'll see where it goes.
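
BigQuery and Spanner are good examples of the managed services Nick's serverless definition covers: you submit queries and never think about clusters, nodes, or zones. A hedged sketch with the Google Cloud Node client; the project, dataset, and table names are made up.

```typescript
import { BigQuery } from "@google-cloud/bigquery";

// Fully managed analytics: no cluster sizing, no capacity planning. You pay
// for the bytes the query scans. (Hypothetical dataset and table.)
async function topSections(): Promise<void> {
  const bigquery = new BigQuery();
  const [rows] = await bigquery.query({
    query: `
      SELECT section, COUNT(*) AS pageviews
      FROM \`my-project.analytics.events\`
      WHERE event_date = CURRENT_DATE()
      GROUP BY section
      ORDER BY pageviews DESC
      LIMIT 10
    `,
  });
  console.log(rows);
}

topSections().catch(console.error);
```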

YB: And James, I know you guys obviously did a big migration to Google Cloud. Was it a lot of similar things? I mean, you cited IO, right, as the big thing for you all.

JC: Yeah, I mean, to be very opinionated: when we did the migration, it was just straight up, turn this database on right here, turn this database on right there, give them the same data set and the same queries, and see who's going to beat out who. We did that with the exact same resources and I think, in early 2017, it was Google who came out ahead.

Just based on pure I/O: who's going to give me more megabytes faster? And we made that migration. We are still very much in instance space; there's not a lot of challenge in that, and we know how to do it very well. But there's something to be said for a marketing page that has a very accurate and exact number, saying if you're giving us this many dollars, you get this many bytes. And being able to run that yourself, being able to talk to an account rep from Google, and have them say, "Yeah, it is accurate." And it should be. And then trying the same thing with Amazon...

And again, very opinionatedly, getting a few dips, getting a few spikes, it does not instill confidence. Which is why we made the migration and until Amazon gives me something different, I'm not gonna turn my head.

YB: So, lessons learned with Google Cloud? Are there things that you wish you had known going into GCP? I mean, you're basically shutting down all of your data centers and moving over to Google Cloud; there's gotta be some things in there where you were like, "Ah, geez, there are some things here that I would've loved to know upfront."

NR: I think there are, but it's really down in the details. I think we knew when we were getting into a product early and we expected that there would be issues. And there were issues, in some cases.

We also probably underappreciated how different the networking paradigm is and how much we'd have to change the way we thought about security. That's something I think we were just sort of unprepared for and a little naïve about; we should've known. And a lot of the documentation has not been great.

So all of that, to me, comes with a business that's evolving very rapidly and that's trying to catch up and is behind. They're behind AWS in many respects. So that was a gamble that we made. There were a lot of details that we didn't know. But overall, I think we knew what we were getting into, and we've been pretty happy. It's worked out. None of the issues have been severe.

YB: Gotcha. Okay, so coming back to that sort of example, when you're hitting the homepage, that's all container engine?

NR: Yeah.

YB: Gotcha.

KR: Were there any organizational hurdles in the transition from having your own data centers to now migrating all of that?

NR: Oh, yeah. That was big and continues to be. But it was quite disruptive. And in the process, the shape and sort of character of our ops team has changed significantly. That's a big topic, maybe for another day. But, yes, it's definitely changed the way that we do ops and think about ops and the kinds of skills we needed and more.

YB: Part Two.

JC: I have a very quick sentence to put on that. The way that my team is shaped, even though we're much smaller in scale, is that you spend less time operating and more time engineering. I mean, they're two words in the same title, but constructing more architecture is much easier than, "Oh, wow. This disk is failing, let me put in a request to replace it."

NR: It's just more productive work.

JC: Absolutely.


Open Source and the Developer Community

YB: Awesome. Well, StackShare being a developer community, one question here is about how you all view your developer community. You've got developers.NYTimes.com, which houses your open source, your engineering blog, and the APIs, of course. How do you look at building your developer community? And maybe, if we have time, how do you view open source, and does that also go through the review board? How does that happen?

I'm theoretically and philosophically willing to open source just about anything.

NR: Well, I'll say first, we have some APIs out there, and we haven't been paying much attention to them. I don't feel like we have a developer community around our APIs at this point. I'd like us to, if it's something that the world wants. At some point, we'll get around to figuring out if it is something the world wants. I think we first did it sort of out of vanity, it was a thing everybody was doing, and we won't do that again. But if there's real value for people who are doing things out there...

But for open source, our filter is the same. I'm theoretically and philosophically willing to open source just about anything. I don't view anything that we do as super proprietary. I don't view us, really ... I'd be more interested in giving back to the community than protecting some competitive advantage around the tech that we do.

So, that said, I don't want us to just open source junk, or everything, or things we're not going to maintain. So the filter is: is this going to be useful to somebody? And do we believe in it in the long run? If so, we'll open source it without hesitation. We benefit enormously from open source, so if we do something that we think others can benefit from, we want to share.

The blog's been fun. I think part of being at the Times is that there are a lot of developers who care, who are interested in journalism and like to write. So for me, it's about giving everyone an outlet to express themselves a little bit and have some fun with it. And hopefully, people find it interesting. But that's what that's all about.

YB: Awesome. Alright. Well, we're definitely over time here. But this has been awesome. Thank you so much.

NR: Great, thank you.

For even more details on The New York Times tech, check out their official engineering blog.


If you liked this interview, follow us on SoundCloud or subscribe via iTunes to catch future episodes. Subscribe to StackShare Weekly to keep up with the latest tools and tech stacks.