GitHub Reverses Course, Will Train AI on User Data
Technology

2026-03-26T00:13:00Z

As of April 24 you'll be feeding the Octocat unless you opt out

GitHub: We're going to train on your data after all

In a move that has reignited tensions between open-source developers and the tech giants that depend on them, GitHub announced that beginning April 24, public repository data will be used to train its artificial intelligence models unless users explicitly opt out. The Microsoft-owned platform, home to over 100 million developers worldwide, quietly updated its terms of service to reflect the change, which reverses earlier assurances that user code would not be harvested for AI training without clear consent. The decision has sent shockwaves through the developer community, with many calling it a betrayal of trust.

Under the new policy, any code, documentation, issues, and discussions hosted in public repositories will be fair game for training GitHub's AI products, including its popular Copilot coding assistant. Users who wish to prevent their work from being used must navigate to their account settings and manually toggle a new opt-out switch. Privacy advocates have pointed out that the opt-out mechanism places the burden squarely on developers, many of whom may not even be aware of the policy change. Critics argue that an opt-in model would have been far more respectful of the community that built the platform into what it is today.

The backlash has been swift and fierce. Prominent open-source maintainers have taken to social media to voice their frustration, with some threatening to migrate their projects to competing platforms such as GitLab and Codeberg. Several developers have noted that even permissive open-source licenses were never intended to grant blanket permission for AI training, and legal experts say the move could face challenges under copyright law in multiple jurisdictions. The Free Software Foundation issued a statement calling the policy "a fundamental violation of developer autonomy" and urged GitHub to reconsider.

GitHub, for its part, defended the decision in a blog post, arguing that training on public code is essential to building the next generation of developer tools and that the opt-out mechanism provides adequate control. A spokesperson said the company is committed to transparency and will continue to engage with the community on AI governance. Nevertheless, the controversy underscores a growing rift in the tech industry over who truly owns the data that powers the AI revolution, and whether the developers who write the world's code will have any meaningful say in how it is used.