Our partners at GitGuardian have been scanning every single public commit made on GitHub for secrets since 2017, and they are now releasing their findings in the most comprehensive study on secrets sprawl ever conducted.
The community that has been built around GitHub, the Octoverse as it has come to be known, has been fundamental in changing how we use and build open-source components and software. Today there are more than 50 million developers using GitHub, 60 million repositories created in a single year, and over 2 billion commits. The size of the Octoverse is outstanding.
GitHub has become a place for developers to showcase their work and contribute to the millions of projects that form many of the building blocks of modern software development. With such a vast amount of data publicly available, there is, as you may imagine, also a huge amount of sensitive data that is unknowingly or accidentally pushed to the platform, namely secrets such as API keys, credentials and other digital authentication strings. Attackers can use these secrets to gain access to infrastructure, systems and personally identifiable information (PII). When these secrets are distributed across multiple systems and services, it creates a problem we collectively call secrets sprawl. Because code is so widely distributed through GitHub and because git keeps a complete record of a repository’s history, a public repository is arguably the worst place for a secret to end up.
How big a problem is secrets sprawl on public GitHub? This has been very difficult to quantify accurately, until now!