GitHub Copilot has changed how developers write their code. However, it can also cause problems when it generates code similar to what's already available in a public repository. In 2022, GitHub launched a feature that let users automatically block suggestions that match public code. According to a GitHub spokesperson, this filter triggers in less than 1% of cases. But sometimes developers may want to see what these code fragments are: either to use them (within the licensing restrictions set by their companies) or perhaps to adopt the entire library the snippet came from.
So to find a middle ground, GitHub today launched a private beta of a code referencing feature for GitHub Copilot that will give developers this choice. With code referencing turned on, Copilot won’t automatically block any matching code it generates but instead shows it to developers in a sidebar and lets them decide what to do with it. Over time, this feature will also come to Copilot Chat.
GitHub previewed this feature last November, but it clearly took a while to ship it.
As GitHub CEO Thomas Dohmke told me, Microsoft, GitHub and most Copilot enterprise customers were using the original blocking feature, but he also noted that it’s a bit of a blunt tool. “It gives you little control to decide for yourself whether you actually want to take that code and attribute it back to an open source license. It doesn’t actually let you discover that there might be a library that you could use instead of synthesizing code,” he told me. “It prevents you from exploring these libraries and submitting pull requests. You might be reproducing everything that already exists in some open source repo.”
Dohmke pointed out that this often applies to common algorithms, like sorting, which tend to exist in many different places. Now developers can reject the code, use it directly (assuming the original library's license permits that), or have Copilot rewrite the code so it no longer matches the original.
For now, it's not possible to filter the results to show only matches under specific licenses, but the team is actively seeking feedback to gauge whether users want that feature.
“We’re letting people understand the match and then go on and explore or go and make the right decision,” Dohmke said. “I think it fills the gap that the original solution had.”
The code referencing feature also tends to fire more often when there isn’t a lot of context for Copilot to work with. When Copilot can see a lot of context from the existing code you are working on, it’s unlikely to produce a suggestion that matches public code. But when you’re just getting started, it’s significantly more likely to generate matching code.
At the core of this is a very fast search engine (GitHub says it wants to keep latency down to 10 to 20 milliseconds) that finds the matching code and its license. For now, the matching snippets are listed in the order the search engine finds them. In its original announcement last year, GitHub said that developers should have the "ability to sort that inventory by repository license, commit date, etc.," so I expect it'll add this functionality later.