AI generates over 25% of Google’s new code. Other organizations seek similar insights to mitigate the risks of this new age of AI-driven development.
AI-powered coding tools are transforming the software development landscape and becoming more essential than ever. Google, a leader in AI adoption (and creator of Gemini Code Assist), has set a benchmark: AI systems now generate over 25% of new code for Google’s products. This revelation, shared by CEO Sundar Pichai, underscores the strategic value of tracking AI’s impact on productivity, quality, and efficiency—insights that drive Google’s AI investments and decision-making.
But not every organization is Google. Most companies lack the internal infrastructure to capture such detailed metrics. As a result, they struggle to quantify how much code AI tools generate and how it may influence their codebase, both now and in the future.
Fortunately, incorporating data directly from the development environment can fill this gap, allowing a broader range of companies to track AI-generated contributions effectively.
Understanding the difference between human-written and AI-generated code isn’t just a matter of curiosity; it's crucial to navigating the modern software development landscape.
Inevitably, AI adoption will only increase, bringing many blessings but potentially some curses. Without proper tracking and understanding of AI’s role in the development process, companies could find themselves dealing with the fallout from new technical debt or vulnerabilities that accumulate silently over time.
By maintaining visibility into the use and impact of AI-generated code, engineering teams can proactively manage and respond to changes in behavior, ensuring that their codebases remain robust and predictable.
There are several reasons why being able to tell when code is AI-generated is important.
As AI tools continue to play a bigger role in development, developers need to monitor their reliance on these tools to ensure they're not losing essential coding skills.
Having visibility into their own AI usage—compared to peers—allows individuals to gauge their progress and adjust as needed. This insight helps them stay effective at reading, understanding, and troubleshooting AI-generated code, maintaining their capability as skilled engineers even in an AI-augmented environment.
Balancing AI efficiency with core coding skills is crucial for both personal growth and professional effectiveness.
The challenge of identifying AI-generated code lies in the complexity of modern coding practices. Developers are no longer limited to manually typing every line of code; instead, they draw on a variety of tools and resources:
- AI coding assistants that suggest snippets or entire blocks of code
- Traditional IDE autocomplete and refactoring tools
- Code examples found through online searches
- Open-source code incorporated into the project
The prevalence of these tools and resources creates a challenge for accurately determining how much of the codebase is AI-generated.
Coding assistant vendors can only provide statistics about their specific service, showing how often developers accept suggestions or utilize AI-generated snippets. But they lack visibility into what developers do outside of their platforms—whether they use other coding aids, search online for examples, or incorporate open-source code.
Instrumentation of the developer's environment is essential to accurately determining the ratio of AI-generated code to human-written code.
By capturing data directly from the development process, it's possible to get a holistic view of all code contributions, whether they come from coding assistants, traditional autocomplete tools, manual typing, or external sources. This holistic approach provides the visibility needed to understand AI’s true impact on the software development workflow.
Only a few modern coding assistants offer APIs that provide a glimpse into their usage—and when they do, it’s typically in aggregate across the entire engineering organization or sub-group.
Coding assistants typically provide:
- Counts of suggestions shown and accepted
- Acceptance rates for AI-generated snippets
- Usage figures aggregated across the engineering organization or a sub-group
While these statistics are useful, they leave significant gaps in understanding how AI is transforming software development:
- They only cover that vendor’s tool, with no visibility into other coding aids, online examples, or open-source code
- They are aggregated, offering little granularity at the file, pull request, or individual developer level
- They don’t distinguish trivial suggestions, such as formatting or documentation, from critical code logic
- They are typically delayed and retrospective rather than real-time
These limitations mean that relying solely on coding assistant APIs gives an incomplete view of AI’s role in software development. They focus on aggregated metrics without shedding light on the detailed nuances of AI’s contributions. For example, while acceptance rates can indicate that developers find certain AI suggestions useful, they don't distinguish between trivial suggestions like formatting or documentation and critical code logic.
To fully understand AI's impact on software development, collecting data directly from the developer's environment is key.
Gathering data in the IDE with a VSCode extension can fill these gaps and offer a more comprehensive view of how AI is being integrated into coding workflows. Here's how tracking AI usage in the IDE can overcome the limitations of coding assistant APIs:
Data collected directly in the IDE allows organizations to capture how code is being written as it happens. Unlike metrics from coding assistant vendors, which are often delayed and retrospective, IDE-based data reflects real-time AI usage. This allows for immediate insights into which parts of the code are being generated by AI tools, when AI is used, and to what extent.
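As a rough illustration of what IDE-side capture can look like, here is a minimal TypeScript sketch of a VSCode extension that observes edits as they happen. It relies on a simple heuristic that is purely an assumption for this sketch (large multi-line insertions are treated as accepted AI completions, small edits as manual typing); the command name and threshold are hypothetical, and this is not necessarily how the Faros AI extension works.

```typescript
import * as vscode from 'vscode';

// Running tally of inserted characters attributed to AI-style vs. manual edits (illustrative only).
const tally = { aiChars: 0, manualChars: 0 };

export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.workspace.onDidChangeTextDocument((event) => {
      for (const change of event.contentChanges) {
        const inserted = change.text;
        // Heuristic (assumption): a single change event that inserts a multi-line block
        // is counted as an accepted completion; small edits are counted as manual typing.
        if (inserted.includes('\n') && inserted.length > 20) {
          tally.aiChars += inserted.length;
        } else {
          tally.manualChars += inserted.length;
        }
      }
    })
  );

  // Expose the running ratio on demand via a command (the command ID is hypothetical).
  context.subscriptions.push(
    vscode.commands.registerCommand('aiUsage.showRatio', () => {
      const total = tally.aiChars + tally.manualChars;
      const pct = total === 0 ? 0 : Math.round((tally.aiChars / total) * 100);
      vscode.window.showInformationMessage(
        `~${pct}% of inserted characters came from large block insertions.`
      );
    })
  );
}

export function deactivate() {}
```

A production implementation would also correlate these events with the specific coding assistant in use and attach file- and project-level context, but the core idea of capturing edits in real time is the same.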
By tracking AI usage directly in the IDE, developers can gain real-time feedback about their coding practices. They can see how often they rely on AI-generated code, what types of code are AI-assisted (e.g., logic, documentation, or tests), and where AI tools contribute to their work. This helps developers understand how AI is influencing their coding habits and allows them to adjust their workflows accordingly.
As code changes are made and pull requests (PRs) are submitted, IDE-based data can annotate the PR with metadata about AI involvement. This allows reviewers to understand the proportion of the code that was generated by AI, offering valuable context for the review process. For example, if a pull request contains a significant amount of AI-generated content, reviewers may want to pay closer attention to ensure the quality and security of the code. This context helps engineering leaders make more informed decisions about when to apply additional scrutiny.
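To make the PR-annotation step concrete, here is a small sketch that posts an AI-involvement summary as a pull request comment through GitHub's REST API. The metadata shape and function name are hypothetical, and it assumes GitHub-hosted repositories and a token with permission to comment; the actual annotation mechanism may differ.

```typescript
// Illustrative payload shape; the fields an IDE-based tool would attach are assumptions here.
interface AiInvolvement {
  prNumber: number;
  aiGeneratedLines: number;
  totalChangedLines: number;
}

// Posts the AI-involvement summary as a PR comment using GitHub's REST API.
// Requires Node 18+ (built-in fetch) and a token with permission to comment on the repo.
async function annotatePullRequest(repo: string, meta: AiInvolvement, token: string): Promise<void> {
  const pct = Math.round((meta.aiGeneratedLines / meta.totalChangedLines) * 100);
  const res = await fetch(`https://api.github.com/repos/${repo}/issues/${meta.prNumber}/comments`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: 'application/vnd.github+json',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      body: `AI involvement: ~${pct}% of changed lines (${meta.aiGeneratedLines}/${meta.totalChangedLines}) were AI-assisted.`,
    }),
  });
  if (!res.ok) {
    throw new Error(`Failed to annotate PR: ${res.status} ${res.statusText}`);
  }
}
```

In practice, this summary would be computed from the same IDE-side events described above and attached when the pull request is opened or updated, giving reviewers the context they need at a glance.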
IDE-based data collection can also be aggregated and analyzed at a macro level across the organization. This allows for insights into broader trends, such as:
- The overall share of new code that is AI-generated, and how it changes over time
- Which teams or projects rely most heavily on AI assistance
- What types of code (logic, tests, or documentation) AI contributes to most often
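As a sketch of the kind of roll-up this enables, the snippet below aggregates per-edit events into a per-team AI share. The event shape is illustrative, not a defined schema.

```typescript
// Hypothetical event shape emitted by the IDE extension for each captured edit.
interface UsageEvent {
  team: string;
  aiChars: number;
  manualChars: number;
}

// Rolls events up into a per-team AI share, the kind of macro-level view described above.
function aiShareByTeam(events: UsageEvent[]): Record<string, number> {
  const totals = new Map<string, { ai: number; all: number }>();
  for (const e of events) {
    const t = totals.get(e.team) ?? { ai: 0, all: 0 };
    t.ai += e.aiChars;
    t.all += e.aiChars + e.manualChars;
    totals.set(e.team, t);
  }
  const result: Record<string, number> = {};
  for (const [team, t] of totals) {
    result[team] = t.all === 0 ? 0 : t.ai / t.all;
  }
  return result;
}
```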
Gathering data directly in the IDE makes it far easier to tell when code is AI-generated. It provides actionable insights that go beyond the high-level metrics from coding assistant APIs, helping to identify patterns and trends as they emerge. This data is crucial for mitigating risks, such as accumulating technical debt or introducing security vulnerabilities, and ensures that AI use in development is closely monitored and managed.
With this complete picture, organizations can make informed decisions on when to apply more scrutiny to AI-generated content, adjust code review processes, and introduce policies to prevent the uncontrolled accumulation of AI-driven changes. By having this information at their fingertips, engineering leaders can stay ahead of potential issues and ensure their codebase evolves in a controlled, secure, and efficient way.
If you're ready to gain deeper insights into AI's role in your development process, anticipate risks, and avoid surprises in your codebase, the Faros AI VSCode extension is a great place to start.
Bonus: If you use Faros AI to visualize AI's impact on productivity, you can also centralize this data as part of a more holistic analytics picture.
Get started with the Faros AI VSCode copilot extension.
Global enterprises trust Faros AI to accelerate their engineering operations. Give us 30 minutes of your time and see it for yourself.