Platform: GitHub
GitHub is a hosted Git platform providing repository hosting, issue tracking, pull requests, code review, and CI/CD. It exposes a comprehensive REST API for reading public repository data.
Available signals
- Commits authored
- Pull requests opened
- Pull request reviews submitted
- Issue comments
How to collect
GitHub REST API (v3). Public repository data requires no authentication.
Base URL: https://api.github.com
Example endpoints:
- /repos/{owner}/{repo}/commits?author={username} (commits by user)
- /repos/{owner}/{repo}/pulls?state=all (pull requests)
- /repos/{owner}/{repo}/pulls/{pull_number}/reviews (PR reviews)
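As a minimal sketch, the endpoint patterns above can be assembled into full URLs with a few small helpers. The function names (repo_url, commits_url, pulls_url, reviews_url) are hypothetical and chosen only for illustration; the base URL and paths are the ones listed in this document.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.github.com"


def repo_url(owner, repo, resource, **query):
    """Build a URL under /repos/{owner}/{repo}/ with optional query parameters.

    Hypothetical helper: owner and repo are percent-encoded so that unusual
    characters cannot alter the path.
    """
    path = f"/repos/{quote(owner, safe='')}/{quote(repo, safe='')}/{resource}"
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return url


def commits_url(owner, repo, username):
    # Commits in one repository filtered by author.
    return repo_url(owner, repo, "commits", author=username)


def pulls_url(owner, repo):
    # Pull requests in every state (open and closed).
    return repo_url(owner, repo, "pulls", state="all")


def reviews_url(owner, repo, pull_number):
    # Reviews submitted on a single pull request.
    return repo_url(owner, repo, f"pulls/{int(pull_number)}/reviews")
```

Keeping URL construction separate from the HTTP call makes it possible to check the request targets without spending any of the hourly request budget.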
Rate limits: 60 requests/hour unauthenticated, counted per originating IP address. Authenticated requests (personal access token) raise this to 5,000/hour. The public-data-only principle constrains us to unauthenticated access, so the collector must pace itself: spread evenly, 60 requests/hour is about one request per minute.
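One way to pace the collector, sketched here under assumptions, is to read the rate-limit headers GitHub returns with each response (x-ratelimit-remaining and x-ratelimit-reset, the latter in UTC epoch seconds) and spread the remaining budget over the time left in the window. The function name seconds_to_wait and the fallback of a full hour when headers are missing are illustrative choices, not part of the original design.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to sleep before the next request.

    headers: any mapping with .items(), e.g. a dict or the headers object of
             an HTTP response. Keys are matched case-insensitively.
    now:     current time in epoch seconds (injectable for testing).
    """
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}

    # Assumption: if the headers are missing, behave conservatively and
    # act as if the budget is spent for a full hour.
    remaining = int(lowered.get("x-ratelimit-remaining", 0))
    reset_at = float(lowered.get("x-ratelimit-reset", now + 3600))

    window = max(reset_at - now, 0.0)
    if remaining <= 0:
        # Budget exhausted: wait until the window resets.
        return window
    # Spread the remaining requests evenly over the rest of the window.
    return window / remaining
```

With a fresh budget of 60 requests and an hour until reset, this yields a 60-second delay, matching the one-request-per-minute pace implied by the unauthenticated limit.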
General concerns
- The same person may use different GitHub usernames across organisations
- 60 requests/hour unauthenticated is a practical constraint at scale
- Commits vary widely in size; raw commit count is a blunt signal
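To illustrate the last concern, the sketch below reports lines changed alongside the raw commit count. It assumes each input is the JSON payload of the single-commit endpoint (/repos/{owner}/{repo}/commits/{ref}), which includes a stats object with additions and deletions; the list endpoint does not include it, so every commit inspected this way costs one extra request against the hourly limit. The function name summarize_commits is hypothetical.

```python
def summarize_commits(commits):
    """Return (commit_count, total_lines_changed) for single-commit payloads.

    Payloads without a "stats" object still count as commits but
    contribute zero lines changed.
    """
    count = 0
    lines_changed = 0
    for commit in commits:
        stats = commit.get("stats") or {}
        count += 1
        lines_changed += stats.get("additions", 0) + stats.get("deletions", 0)
    return count, lines_changed
```

Two contributors with the same commit count can differ by orders of magnitude in lines changed, which is why the count alone is a blunt signal; lines changed is itself only a rough proxy for effort.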