Behavioral Dataset
Based on the GS segmentation report, showing where value concentrates across Steam players.
Overview
A dataset of 46,157 Steam player profiles. Each record includes behavioral signals, quartile rankings, a deterministic segment assignment, and a standardized value proxy.
This is not only engagement data. It is structure: which players are economically different and why.
10 player-level fields
5 behavioral segments
46,157 player profiles
3 core signals
Explorer + Collector = 27% of players → 62.5% of value
Structure
Ten player-level fields per record. All fields are typed, documented, and ready for direct integration.
| Field | Type | Description | Example |
|---|---|---|---|
| user_id | string | Hashed unique identifier per player | a3f8c1... |
| library_size | int | Total number of title records owned on Steam | 247 |
| total_hours | float | Total hours played across all owned title records | 4832.5 |
| top_game_hours | float | Hours played on the single most-played game | 1247.3 |
| top_game_ratio | float | Share of total hours on most-played title (0.0 to 1.0) | 0.258 |
| estimated_total_spend | float | Estimated ownership value (library_size × $20 proxy) | 4940.0 |
| engagement_quartile | int | Quartile rank by total hours (1 to 4) | 3 |
| ownership_quartile | int | Quartile rank by library size (1 to 4) | 4 |
| focus_quartile | int | Quartile rank by top game ratio (1 to 4) | 2 |
| segment | string | Deterministic behavioral segment label | Explorer |
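The field table above can be mirrored as a typed record. The sketch below is a minimal illustration, not the publisher's own loader: the `PlayerRecord` class and `consistency_errors` helper are hypothetical names, and the check that `top_game_ratio` equals `top_game_hours / total_hours` is inferred from the field descriptions rather than stated outright.

```python
from dataclasses import dataclass

SPEND_PER_TITLE = 20.0  # the $20 proxy named in the field table


@dataclass(frozen=True)
class PlayerRecord:
    """One row of the dataset, with the ten documented fields."""
    user_id: str
    library_size: int
    total_hours: float
    top_game_hours: float
    top_game_ratio: float
    estimated_total_spend: float
    engagement_quartile: int
    ownership_quartile: int
    focus_quartile: int
    segment: str


def consistency_errors(rec: PlayerRecord, tol: float = 1e-3) -> list[str]:
    """Return a list of problems; an empty list means the record is self-consistent."""
    errors = []
    if not 0.0 <= rec.top_game_ratio <= 1.0:
        errors.append("top_game_ratio outside 0.0 to 1.0")
    if rec.total_hours > 0:
        # Assumption: the ratio is top_game_hours divided by total_hours.
        expected = rec.top_game_hours / rec.total_hours
        if abs(expected - rec.top_game_ratio) > tol:
            errors.append("top_game_ratio != top_game_hours / total_hours")
    if abs(rec.estimated_total_spend - rec.library_size * SPEND_PER_TITLE) > tol:
        errors.append("estimated_total_spend != library_size * 20")
    for name in ("engagement_quartile", "ownership_quartile", "focus_quartile"):
        if getattr(rec, name) not in (1, 2, 3, 4):
            errors.append(f"{name} outside 1 to 4")
    return errors


# The example column of the field table, entered as one record.
example = PlayerRecord("a3f8c1...", 247, 4832.5, 1247.3, 0.258, 4940.0, 3, 4, 2, "Explorer")
print(consistency_errors(example))
```

A check like this is cheap to run at ingestion time and catches rows where a derived field has drifted from the fields it is derived from.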
Preview
Five representative rows, one per segment, showing six of the ten fields.
| user_id | library_size | total_hours | top_game_ratio | estimated_total_spend | segment |
|---|---|---|---|---|---|
| a3f8c1... | 162 | 3841 | 0.312 | 3240 | Core |
| b7d2e4... | 483 | 12023 | 0.128 | 9660 | Explorer |
| c1a9f3... | 42 | 3346 | 0.639 | 840 | Focused |
| d4e6b8... | 648 | 2891 | 0.127 | 12960 | Collector |
| e9c3a7... | 147 | 0 | 0 | 2940 | Dormant |
Taxonomy
Core: Moderate breadth, moderate depth, balanced attention. The visible baseline of a Steam audience. Reliable, but not where outsized value concentrates.
Explorer: High breadth, high depth, distributed attention. Invests broadly across many titles. Largest concentration of ownership value in the dataset.
Focused: Low breadth, high depth, concentrated attention. Commits intensely to few titles. High engagement density, low economic breadth.
Collector: Very high breadth, variable depth. Ownership-driven behavior. Highest per-player value. Buys broadly regardless of play intensity.
Dormant: Moderate to large libraries, zero recent hours. Previously active, currently inactive. Stored ownership value without recent engagement.
Definitions
Each behavioral signal documented with its measurement, range, and interpretation guidance.
| Signal | Type | Range | What It Measures | Interpretation |
|---|---|---|---|---|
| library_size | int | 1 to 15,000+ | Total title records owned on Steam | Ownership breadth. Higher values indicate broader investment across the ecosystem. |
| total_hours | float | 0 to 50,000+ | Cumulative hours played across all title records | Engagement depth. Total time invested, not recency. |
| top_game_ratio | float | 0.0 to 1.0 | Share of total hours on single most-played game | Attention concentration. Low values indicate distributed attention. High values indicate focus on one title. |
| engagement_quartile | int | 1 to 4 | Quartile rank by total hours | Relative engagement position within the dataset |
| ownership_quartile | int | 1 to 4 | Quartile rank by library size | Relative ownership position within the dataset |
| focus_quartile | int | 1 to 4 | Quartile rank by top game ratio | Relative attention concentration position |
| estimated_total_spend | float | $20 to $300,000+ | Ownership-value proxy (library_size × $20) | Relative ownership value. Not actual revenue. |
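As a rough illustration of how the three primary signals relate to raw play data, the sketch below derives them from a per-title hours mapping. The input shape (`hours_by_title`) and the function name are assumptions made for this example. The source does not describe its raw input format, and returning a ratio of 0.0 for a player with zero hours is a choice made here to match the Dormant preview row.

```python
def primary_signals(hours_by_title: dict[str, float]) -> dict[str, float]:
    """Derive library_size, total_hours, top_game_hours and top_game_ratio.

    hours_by_title maps each owned title to hours played (hypothetical input shape).
    """
    library_size = len(hours_by_title)                       # ownership breadth
    total_hours = sum(hours_by_title.values())               # engagement depth
    top_game_hours = max(hours_by_title.values(), default=0.0)
    # Attention concentration; defined as 0.0 when nothing has been played.
    top_game_ratio = top_game_hours / total_hours if total_hours > 0 else 0.0
    return {
        "library_size": library_size,
        "total_hours": total_hours,
        "top_game_hours": top_game_hours,
        "top_game_ratio": top_game_ratio,
    }


# Synthetic library: four owned titles, one of them never played.
signals = primary_signals({"title_a": 60.0, "title_b": 30.0, "title_c": 10.0, "title_d": 0.0})
print(signals)
```

Note that an unplayed title still counts toward `library_size`, which is why ownership breadth and engagement depth can diverge for the same player.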
Applications
Six concrete B2B analysis workflows the dataset supports, each tied to a specific signal pattern and measurable research outcome.
| Action | Signal Pattern | Outcome |
|---|---|---|
| Rank high-value player segments for launch-readiness analysis | segment = Explorer or Collector | Identify players who invest broadly across the ecosystem |
| Identify dormant ownership value | segment = Dormant, library_size > 100 | Quantify stored value without treating inactivity as zero market relevance |
| Separate high-engagement players from high-ownership players | engagement_quartile = 4, top_game_ratio > 0.5 | Avoid using engagement depth as a proxy for ownership value |
| Analyze cross-title bundle adjacency | library_size > 200, focus_quartile ≤ 2 | Locate breadth-driven players who already spread ownership across adjacent titles |
| Score audiences against the segment taxonomy in your warehouse | join(player_id) → user_id | Replace heuristic player tiers with deterministic, reproducible segments |
| Validate retention and LTV models on cross-library features | library_size, total_hours, top_game_ratio | Reduce blind spots in models built only on first-party engagement |
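The signal patterns in the table translate directly into row filters. The sketch below expresses four of them as named predicates over plain dictionaries. The pattern names, the `select` helper, and the four sample players are all hypothetical and exist only to show the filters working; none of them are rows from the dataset.

```python
# Each predicate mirrors one "Signal Pattern" cell from the table above.
PATTERNS = {
    "high_value_segments": lambda p: p["segment"] in {"Explorer", "Collector"},
    "dormant_ownership": lambda p: p["segment"] == "Dormant" and p["library_size"] > 100,
    "depth_without_breadth": lambda p: p["engagement_quartile"] == 4 and p["top_game_ratio"] > 0.5,
    "bundle_adjacency": lambda p: p["library_size"] > 200 and p["focus_quartile"] <= 2,
}


def select(players, pattern_name):
    """Return the players matching one named signal pattern."""
    rule = PATTERNS[pattern_name]
    return [p for p in players if rule(p)]


# Synthetic players for illustration only.
players = [
    {"user_id": "p1", "segment": "Explorer", "library_size": 480,
     "top_game_ratio": 0.13, "engagement_quartile": 4, "focus_quartile": 1},
    {"user_id": "p2", "segment": "Focused", "library_size": 40,
     "top_game_ratio": 0.64, "engagement_quartile": 4, "focus_quartile": 4},
    {"user_id": "p3", "segment": "Dormant", "library_size": 150,
     "top_game_ratio": 0.0, "engagement_quartile": 1, "focus_quartile": 1},
    {"user_id": "p4", "segment": "Core", "library_size": 160,
     "top_game_ratio": 0.31, "engagement_quartile": 3, "focus_quartile": 2},
]

for name in PATTERNS:
    print(name, [p["user_id"] for p in select(players, name)])
```

Keeping the patterns in one dictionary makes it easy to audit which thresholds are in use and to add the remaining workflows as further entries.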
Methodology
Three primary signals are computed per player: ownership breadth (library_size), engagement depth (total_hours), and attention concentration (top_game_ratio). Each signal is then ranked into quartiles within the dataset.
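Quartile ranking within a dataset can be sketched as follows. This is one simple position-based scheme, offered as an assumption: the source does not say how quartile boundaries are cut or how ties are handled. Here equal values always share a quartile, taken from the lowest sorted position among them.

```python
def quartile_ranks(values):
    """Rank each value into a quartile from 1 (lowest) to 4 (highest).

    Position-based cut: a value at sorted position pos (0-indexed) out of n
    lands in quartile pos * 4 // n + 1. Tied values share the quartile of
    their first sorted position. Both choices are assumptions of this sketch.
    """
    n = len(values)
    first_position = {}
    for pos, value in enumerate(sorted(values)):
        first_position.setdefault(value, pos)
    return [first_position[v] * 4 // n + 1 for v in values]


# Eight synthetic library sizes, two per quartile.
library_sizes = [10, 80, 30, 60, 20, 70, 40, 50]
print(quartile_ranks(library_sizes))
```

Because the ranks depend on the whole population, a player's quartile can change when the dataset is refreshed even if that player's own signals do not.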
Segment assignment is rule-based: deterministic combinations of the three quartile ranks map to one of five named segments. No clustering. No statistical model. The same input always produces the same output.
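To make the idea of a deterministic rule table concrete, here is a minimal sketch. The thresholds below are placeholders chosen to echo the taxonomy descriptions; they are not the rules used to build the dataset, which this page does not publish. Treating `total_hours == 0` as the Dormant test is likewise an assumption based on the preview row.

```python
def assign_segment(ownership_q: int, engagement_q: int, focus_q: int, total_hours: float) -> str:
    """Map quartile ranks to a segment label using HYPOTHETICAL rules.

    Rules are evaluated top to bottom and the first match wins, so the same
    input always yields the same label.
    """
    if total_hours == 0:
        return "Dormant"      # assumption: no recorded play time
    if ownership_q == 4 and engagement_q >= 3 and focus_q <= 2:
        return "Explorer"     # high breadth, high depth, distributed attention
    if ownership_q == 4:
        return "Collector"    # very high breadth, variable depth
    if ownership_q <= 2 and engagement_q >= 3 and focus_q >= 3:
        return "Focused"      # low breadth, high depth, concentrated attention
    return "Core"             # everything else: the balanced baseline


print(assign_segment(ownership_q=4, engagement_q=4, focus_q=1, total_hours=12023.0))
print(assign_segment(ownership_q=1, engagement_q=4, focus_q=4, total_hours=3346.0))
```

An ordered rule list like this has no random state and no fitted parameters, which is what makes the assignment reproducible. The trade-off is that rule order matters and should be documented alongside the thresholds.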
The estimated value proxy is library_size × $20, a conservative ownership baseline. It is not a revenue prediction. It is a comparative anchor for relative ownership value across players.
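The proxy and a segment-level value share can be computed in a few lines. The sketch below uses only the segment and library_size columns of the five preview rows, so the share it prints describes that toy sample and is not expected to reproduce the dataset-wide 27% and 62.5% figures. The function names are illustrative.

```python
PROXY_PER_TITLE = 20  # dollars per owned title, as stated in the methodology

# (segment, library_size) pairs taken from the five preview rows.
preview_rows = [
    ("Core", 162),
    ("Explorer", 483),
    ("Focused", 42),
    ("Collector", 648),
    ("Dormant", 147),
]


def estimated_total_spend(library_size: int) -> int:
    """Ownership-value proxy: library_size times $20. Not actual revenue."""
    return library_size * PROXY_PER_TITLE


def value_share(rows, segments) -> float:
    """Fraction of total proxy value held by the given segments."""
    total = sum(estimated_total_spend(size) for _, size in rows)
    selected = sum(estimated_total_spend(size) for seg, size in rows if seg in segments)
    return selected / total if total else 0.0


share = value_share(preview_rows, {"Explorer", "Collector"})
print(f"Explorer + Collector hold {share:.1%} of proxy value in the preview sample")
```

Since the proxy is a constant multiple of library_size, value shares computed this way are identical to shares of total titles owned. That is why the figure works as a comparative anchor and not as a revenue estimate.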
FAQ
No. Engagement is one of three inputs. The dataset is a structural classification of who players are, not a real-time activity feed.
No. estimated_total_spend is a $20-per-game proxy. Use it for relative comparisons across players, not as a revenue forecast.
Yes. The rule-based mapping is deterministic. Re-running the same player against the same signals always produces the same segment.
Yes. The schema is open and joinable on user_id. Most buyers blend the GS segments with their first-party features.
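A join on user_id might look like the sketch below. It assumes the first-party table carries a `player_id` column hashed the same way as the dataset's `user_id`, which would need to be confirmed before relying on the match. The `attach_segments` helper, the `d7_retained` feature, and the sample rows are hypothetical.

```python
def attach_segments(first_party_rows, gs_records, key="player_id"):
    """Left-join GS segment labels onto first-party rows.

    Rows with no matching user_id keep all their fields and get segment=None.
    """
    segment_by_user = {rec["user_id"]: rec["segment"] for rec in gs_records}
    return [
        {**row, "segment": segment_by_user.get(row[key])}
        for row in first_party_rows
    ]


# Synthetic inputs for illustration only.
gs_records = [
    {"user_id": "u1", "segment": "Explorer"},
    {"user_id": "u2", "segment": "Dormant"},
]
first_party = [
    {"player_id": "u1", "d7_retained": True},
    {"player_id": "u9", "d7_retained": False},  # not present in the GS data
]

for row in attach_segments(first_party, gs_records):
    print(row)
```

A left join keeps every first-party row, so unmatched players surface as `None` and the match rate can be measured before any segment-level analysis is run.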
This dataset is the layer behind the GS report.
27% of players. 62.5% of value.