How Codacy Analyzes 30 Billion Lines of Code Per Day

Codacy
Codacy automates code reviews and monitors code quality on every commit and pull request. It reports the impact of every commit or pull request in terms of new issues concerning code style, best practices, security and many others. It monitors changes in code coverage, code duplication and code complexity. It allows developers to save time in code reviews and tackle technical debt efficiently.

Editor's Note: Jaime Jorge is co-founder and CEO at Codacy.

Codacy helps dev teams of all sizes to automate their code quality by identifying issues through static code analysis, both in the cloud and on-premise. The product notifies users about security issues, code coverage, code duplication and code complexity in every commit and pull request, directly from their current workflow. We sat down with Jaime to learn more about the technology behind Codacy's automated code review platform.

StackShare: Why did you and your other co-founder create Codacy?

Jaime Jorge: As developers ourselves, we started the company because we wanted to help developers focus on software development instead of just fixing code. I was researching this topic for my master's thesis (working with Telcos in Europe) to understand technical debt (in terms of code duplication), and Joao (my co-founder) was leading tech teams in the financial industry in the UK. What brought us together was the mission of helping as many developers and companies as possible ship better code and increase their productivity.

Founded in 2012, Codacy now employs 40 people (more than half of whom are technical) between our offices in Lisbon and NYC.

StackShare: Out of the 28 supported languages, which one do you see used the most on your platform?

JJ: The usage distribution of our supported programming languages follows what you’d expect to see looking at indexes/ranks like the one from TIOBE. The most used language in Codacy is JavaScript. This is a result of a strong clustering of web development use cases. We then see Java, Python, Ruby and a few others close behind.

StackShare: It’s amazing how small your team is yet you support so many different languages.

JJ: When we started Codacy, we only supported Scala (on which our product is built). Following requests from new users over time, we started adding additional language support. We understood that modern development does not rely on one programming language alone, and modern tech stacks most often combine many different languages. This forced us to create a platform that would make it easy for us both to add new programming languages and to update their support. We also let our users bring their own language support by exposing our integration mechanism.

StackShare: How do you use Codacy to build Codacy?

JJ: Our team uses Codacy every day, primarily to maintain the same criteria of development (formatting, coverage, best practices) across the different dev squads. There are features we use more often than others, which mirrors what we see from our customers.

StackShare: Which features does your team use most often?

JJ: Some team members like to use the dashboards to keep track of the main quality metrics, while others like the build status we provide to make sure we’re within the defined criteria. The whole team uses the auto-comment feature, which helps our teams stay in touch.

StackShare: What platforms do you integrate with?

JJ: Our most popular integrations are with GitHub, GitLab, Bitbucket, CircleCI, Jenkins, and Slack, although we support many others.

StackShare: How does Codacy provide notifications for security issues?

JJ: As part of our code analysis, we provide security notifications via the tools we integrate with.

StackShare: Tell us about your secure development practices.

JJ: We develop following security best practices and frameworks (OWASP Top 10, SANS Top 25). Our developers participate in regular security training to learn about common vulnerabilities and threats, and we review our code for security vulnerabilities. We also regularly update our dependencies and make sure none of them has known vulnerabilities.

Our teams use Static Application Security Testing (SAST) to detect basic security vulnerabilities in our codebase, and Dynamic Application Security Testing (DAST) to scan our applications.

StackShare: What’s the biggest mistake new developers make when setting up an automated code review system?

JJ: Incorrect or incomplete configuration.

StackShare: How many automated code reviews do you process daily?

JJ: We pull about 8TB per day. Assuming 1 byte per character and 256 characters per line, that works out to roughly 3*10^10 lines (about 30 billion). Interestingly, this is about 40% of the text content in the Library of Congress (according to Wolfram Alpha).
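
For readers who want to check the arithmetic, here is a minimal sketch of the estimate in Python. It uses only the figures quoted above and assumes decimal terabytes (1 TB = 10^12 bytes); the function and constant names are ours, not Codacy's.

```python
# Back-of-the-envelope estimate using the figures quoted in the interview.
BYTES_PER_DAY = 8 * 10**12   # ~8 TB pulled per day (decimal terabytes assumed)
BYTES_PER_CHAR = 1           # assumption stated in the interview
CHARS_PER_LINE = 256         # assumption stated in the interview


def lines_per_day(total_bytes: int, bytes_per_char: int, chars_per_line: int) -> float:
    """Convert a daily byte volume into an approximate line count."""
    return total_bytes / (bytes_per_char * chars_per_line)


estimate = lines_per_day(BYTES_PER_DAY, BYTES_PER_CHAR, CHARS_PER_LINE)
print(f"{estimate:.3e} lines per day")  # 3.125e+10
```

The result, about 31 billion, is the "about 30 billion" in the headline. Real source lines are usually much shorter than 256 characters, so under these assumptions the figure is probably a conservative one.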

StackShare: How do you store all of that data?

JJ: All of our services run in the cloud on AWS. We don’t host or run our own routers, load balancers, DNS servers, or physical servers.

StackShare: What AWS services do you use specifically for getting that data processed, indexed and stored?

JJ: Data is processed using EC2 instances. We currently run our applications using Docker on Elastic Beanstalk, but we are transitioning to EKS. The data is stored on RDS, where we use both Aurora and Postgres. Although the volume of data we pull to analyze is 8TB per day, the analysis results (which are what we actually store) are significantly smaller. You don’t need the code verbatim for every source file - you just store the issues and where in the file you found them. We then leverage AWS to scale elastically (e.g. the number of active analysis servers) with the current load.
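
To illustrate the point about storing issues rather than source code, here is a hypothetical sketch in Python. The `Issue` fields, the `find_long_lines` check and the `line-too-long` identifier are invented for illustration and do not reflect Codacy's actual schema or analysis rules.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    """What gets stored: where the problem is, not the code itself."""
    path: str        # file the issue was found in
    line: int        # 1-based line number within that file
    pattern_id: str  # hypothetical rule identifier
    message: str     # human-readable description


def find_long_lines(path: str, source: str, max_len: int = 120) -> List[Issue]:
    """Toy analysis: flag every line longer than max_len characters."""
    issues = []
    for number, text in enumerate(source.splitlines(), start=1):
        if len(text) > max_len:
            message = f"line is {len(text)} characters (limit {max_len})"
            issues.append(Issue(path, number, "line-too-long", message))
    return issues


source = "short line\n" + "x" * 130 + "\n"
issues = find_long_lines("src/app.py", source)

# Only this serialized record would be persisted; the source text is discarded.
stored = json.dumps([asdict(issue) for issue in issues])
print(stored)
```

The stored record holds a path, a line number and a message, so its size depends on how many issues were found, not on how large the analyzed files were.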

StackShare: Does this process still involve Scala or another language?

JJ: Our applications are all implemented in Scala. They do all the heavy lifting regarding data processing/indexing.

StackShare: How long do you retain that data?

JJ: The repositories are cloned, analyzed and then deleted.
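
As a rough sketch of that clone, analyze and delete cycle, the Python below runs an analysis inside a temporary directory that is removed as soon as the results are returned. The `fetch` and `analyze` callables are placeholders we made up; in a real setup `fetch` might run `git clone`, and nothing here describes Codacy's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def analyze_and_discard(
    fetch: Callable[[Path], None],
    analyze: Callable[[Path], List[str]],
) -> List[str]:
    """Fetch code into a throwaway directory, analyze it, then delete it."""
    with tempfile.TemporaryDirectory() as workdir:
        checkout = Path(workdir) / "repo"
        checkout.mkdir()
        fetch(checkout)            # e.g. a git clone in a real setup
        return analyze(checkout)   # results outlive the checkout
    # The directory and everything in it is removed when the block exits.


seen = []  # remembers the checkout path so we can confirm it was removed


def fake_fetch(dest: Path) -> None:
    """Stand-in for a clone: writes a single file into the checkout."""
    (dest / "main.py").write_text("print('hi')\n")
    seen.append(dest)


def list_files(dest: Path) -> List[str]:
    """Stand-in for an analysis: returns the file names it saw."""
    return sorted(p.name for p in dest.iterdir())


results = analyze_and_discard(fake_fetch, list_files)
print(results)           # ['main.py']
print(seen[0].exists())  # False
```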


Thanks for reading! If you use Codacy you can add them to your stack here.
