Good first issues arrived on GitHub's web interface back in May 2019. The feature was designed to surface recommendations based on labels that project maintainers apply to issues. The company says it updated the tool in December with more powerful AI algorithms, and the good first issues feature now surfaces issues in 70% of recommended repositories.
In a blog post, GitHub senior machine learning engineer Tiferet Gazit explains that the company developed a list of 300 labels used across popular open source projects. However, this list of label names surfaced issues from only 40% of recommended repositories, and the maintainers of those projects still had to label issues manually. The new AI recommendation algorithm works mostly automatically, without input from project maintainers. Gazit describes the tradeoff involved: “There is a tradeoff between coverage and accuracy, which is the typical precision and recall tradeoff found in any ML product. To prevent the feed from being swamped with false positive detections, we aim for extremely high precision at the cost of recall. This is necessary because only a tiny minority of all issues are good first issues.”
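The precision and recall tradeoff in the quote can be made concrete with a small Python sketch. The scores, ground-truth labels, and the two thresholds below are invented for illustration and are not GitHub's data or settings; the point is only that raising the threshold removes false positives while also dropping some genuine good first issues.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when every score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    # Convention used here: if nothing is flagged, precision counts as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores and ground truth (True = good first issue).
SCORES = [0.97, 0.91, 0.85, 0.70, 0.62, 0.40, 0.33, 0.20, 0.10, 0.05]
LABELS = [True, True, False, True, False, False, True, False, False, False]

low = precision_recall(SCORES, LABELS, 0.30)   # permissive threshold
high = precision_recall(SCORES, LABELS, 0.90)  # strict threshold

print(f"threshold 0.30: precision={low[0]:.2f}, recall={low[1]:.2f}")
print(f"threshold 0.90: precision={high[0]:.2f}, recall={high[1]:.2f}")
```

On this toy data the strict threshold flags only correct issues but finds half of the good ones, while the permissive threshold finds all of them at the price of several false positives.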
Weighted Measurement
In use, the new AI tool predicts, for each issue, the probability that it is a good first issue. An issue whose probability clears the required threshold is recommended, and that probability doubles as its confidence score. “To surface issue recommendations given a trained classifier, we run inference on all qualifying open issues from non-archived public repositories. Each issue for which the classifier predicts a probability above the required threshold is slated for recommendation, with a confidence score equal to its predicted probability.” Issues detected through their labels are scored differently: they are given a confidence score based on the relevance of their labels, with synonyms of “good first issue” given higher confidence than synonyms of “documentation”.
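The scoring described above could be sketched roughly as follows in Python. Everything concrete here is an assumption made for illustration: the `THRESHOLD` value, the `LABEL_CONFIDENCE` table and its numbers, the `Issue` class, and the choice to let a label-based score take precedence over the classifier's. The source does not say how GitHub combines the two kinds of detection.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.9  # hypothetical; the real threshold is not given in the source

# Hypothetical label-relevance scores: "good first issue" outranks "documentation".
LABEL_CONFIDENCE = {
    "good first issue": 1.0,
    "documentation": 0.6,
}


@dataclass
class Issue:
    title: str
    probability: float  # classifier output, assumed to lie in [0, 1]
    labels: list = field(default_factory=list)


def confidence(issue):
    """Return a confidence score, or None if the issue is not recommended."""
    label_scores = [
        LABEL_CONFIDENCE[name.lower()]
        for name in issue.labels
        if name.lower() in LABEL_CONFIDENCE
    ]
    if label_scores:
        # Assumption: a recognised label decides the score on its own.
        return max(label_scores)
    if issue.probability > THRESHOLD:
        # ML-based detection: confidence equals the predicted probability.
        return issue.probability
    return None


def recommend(issues):
    """Return (confidence, issue) pairs, highest confidence first."""
    scored = [(confidence(issue), issue) for issue in issues]
    kept = [pair for pair in scored if pair[0] is not None]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


ISSUES = [
    Issue("Fix typo in README", 0.40, ["documentation"]),
    Issue("Add missing null check", 0.95),
    Issue("Rewrite scheduler", 0.30),
    Issue("Improve error message", 0.50, ["Good First Issue"]),
]

RANKED = recommend(ISSUES)
for score, issue in RANKED:
    print(f"{score:.2f}  {issue.title}")
```

In this sketch the unlabeled issue with a low predicted probability is dropped, and the remaining issues are ordered by confidence regardless of whether the score came from a label or from the classifier.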