# Platform: GitHub

GitHub is a hosted Git platform providing repository hosting, issue tracking, pull requests, code review, and CI/CD. It exposes a comprehensive REST API for reading public repository data.

---

## Available signals

- Commits authored
- Pull requests opened
- Pull request reviews submitted
- Issue comments

---

## How to collect

Use the GitHub REST API (v3). Reading public repository data requires no authentication.

Base URL: `https://api.github.com`

Example endpoints:

- `/repos/{owner}/{repo}/commits?author={username}`: commits by user
- `/repos/{owner}/{repo}/pulls?state=all`: pull requests in any state
- `/repos/{owner}/{repo}/pulls/{pull_number}/reviews`: PR reviews
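As a minimal sketch, the example endpoints above could be assembled and fetched from Python using only the standard library. The helper names (`endpoint_url`, `commits_url`, `pulls_url`, `reviews_url`, `fetch_json`) and the `User-Agent` string are illustrative assumptions, not part of any existing collector.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.github.com"


def endpoint_url(path, **query):
    """Join the base URL, a path, and optional query parameters."""
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return url


def commits_url(owner, repo, username):
    """URL listing commits in a repository authored by `username`."""
    return endpoint_url(f"/repos/{quote(owner)}/{quote(repo)}/commits", author=username)


def pulls_url(owner, repo):
    """URL listing pull requests in any state."""
    return endpoint_url(f"/repos/{quote(owner)}/{quote(repo)}/pulls", state="all")


def reviews_url(owner, repo, pull_number):
    """URL listing the reviews submitted on one pull request."""
    return endpoint_url(
        f"/repos/{quote(owner)}/{quote(repo)}/pulls/{int(pull_number)}/reviews"
    )


def fetch_json(url):
    """Unauthenticated GET; every call counts against the hourly limit."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "signal-collector-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping URL construction separate from `fetch_json` means the URLs can be checked without spending any of the request budget.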

**Rate limits:** Unauthenticated requests are limited to 60 per hour per IP address. Authenticating with a personal access token raises the limit to 5,000 per hour, but the [public-data-only principle](../../decisions/003-public-data-only.md) constrains us to unauthenticated access, so design the collector with appropriate delays between requests.
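One possible way to choose those delays is sketched below. It spreads requests evenly over the hour by default and, when a response carries GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` headers, paces against the remaining budget instead. The function names and the even-spacing policy are assumptions made for illustration, not a prescribed design.

```python
import time

UNAUTHENTICATED_LIMIT = 60  # requests per hour, as stated above
WINDOW_SECONDS = 3600


def steady_delay(limit=UNAUTHENTICATED_LIMIT, window=WINDOW_SECONDS):
    """Seconds between requests if they are spread evenly over the window."""
    return window / limit


def delay_from_headers(headers, now=None):
    """Seconds to wait before the next request, based on rate-limit headers.

    `headers` is any mapping of response headers. `x-ratelimit-reset` is
    read as a UTC epoch timestamp in seconds.
    """
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}
    if "x-ratelimit-remaining" not in lowered or "x-ratelimit-reset" not in lowered:
        # No budget information available: fall back to even spacing.
        return steady_delay()
    remaining = int(lowered["x-ratelimit-remaining"])
    until_reset = max(0.0, int(lowered["x-ratelimit-reset"]) - now)
    if remaining <= 0:
        # Budget exhausted: wait for the window to reset.
        return until_reset
    # Spread the remaining requests over the time left in the window.
    return until_reset / remaining
```

In a collector loop, this might be used as `time.sleep(delay_from_headers(response.headers))` after each request.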

---

## General concerns

- The same person may use different GitHub usernames across organisations
- 60 requests/hour unauthenticated is a practical constraint at scale
- Commits vary widely in size; raw commit count is a blunt signal