Behavioral Dataset

The Value Distribution Dataset

Based on the GS segmentation report, this dataset shows where value concentrates across Steam players.

46,157 players|5 segments|Deterministic|Schema-ready

Overview

What You Get

A dataset of 46,157 Steam player profiles. Each record includes behavioral signals, quartile rankings, a deterministic segment assignment, and a standardized value proxy.

This is not only engagement data. It is structure: which players are economically different and why.

10 player-level fields

5 behavioral segments

46K+ player profiles

3 core signals

Player share ≠ Value share

Explorer + Collector = 27% of players → 62.5% of value

[Chart: player share vs. value share by segment. The majority of value sits in Explorer + Collector.]
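The comparison between player share and value share can be reproduced from the records themselves. The following is a minimal sketch, assuming each record is a dict carrying the `segment` and `estimated_total_spend` fields; the function name `share_by_segment` and the toy rows are hypothetical and are not drawn from the dataset.

```python
from collections import defaultdict

def share_by_segment(records):
    """Return {segment: (player_share, value_share)} as fractions of the totals."""
    players = defaultdict(int)
    value = defaultdict(float)
    for r in records:
        players[r["segment"]] += 1
        value[r["segment"]] += r["estimated_total_spend"]
    total_players = sum(players.values())
    total_value = sum(value.values())
    return {
        seg: (
            players[seg] / total_players,
            value[seg] / total_value if total_value else 0.0,
        )
        for seg in players
    }

# Toy rows for illustration only; these are NOT dataset values.
toy = [
    {"segment": "Collector", "estimated_total_spend": 6000.0},
    {"segment": "Core", "estimated_total_spend": 2000.0},
    {"segment": "Core", "estimated_total_spend": 1000.0},
    {"segment": "Focused", "estimated_total_spend": 1000.0},
]
shares = share_by_segment(toy)
```

In the toy rows, one player in four holds more than half of the summed proxy value, which is the same kind of gap the headline figure describes.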

Structure

Schema

Ten player-level fields per record. All fields are typed, documented, and ready for direct integration.

| Field | Type | Description | Example |
|---|---|---|---|
| user_id | string | Hashed unique identifier per player | a3f8c1... |
| library_size | int | Total number of title records owned on Steam | 247 |
| total_hours | float | Total hours played across all owned title records | 4832.5 |
| top_game_hours | float | Hours played on the single most-played game | 1247.3 |
| top_game_ratio | float | Share of total hours on most-played title (0.0 to 1.0) | 0.258 |
| estimated_total_spend | float | Estimated ownership value (library_size × $20 proxy) | 4940.0 |
| engagement_quartile | int | Quartile rank by total hours (1 to 4) | 3 |
| ownership_quartile | int | Quartile rank by library size (1 to 4) | 4 |
| focus_quartile | int | Quartile rank by top title ratio (1 to 4) | 2 |
| segment | string | Deterministic behavioral segment label | Explorer |
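For integration, the ten fields map naturally onto a typed record. The sketch below is one possible representation, not an official client: the class name `PlayerRecord`, the `problems` helper, and the choice of checks are assumptions, while the field names, types, and ranges follow the schema table.

```python
from dataclasses import dataclass

# The five segment labels named in the taxonomy.
SEGMENTS = {"Core", "Explorer", "Focused", "Collector", "Dormant"}

@dataclass(frozen=True)
class PlayerRecord:
    user_id: str
    library_size: int
    total_hours: float
    top_game_hours: float
    top_game_ratio: float
    estimated_total_spend: float
    engagement_quartile: int
    ownership_quartile: int
    focus_quartile: int
    segment: str

    def problems(self):
        """Return a list of range violations (empty if the record looks valid)."""
        issues = []
        if not 0.0 <= self.top_game_ratio <= 1.0:
            issues.append("top_game_ratio outside 0.0 to 1.0")
        for name in ("engagement_quartile", "ownership_quartile", "focus_quartile"):
            if getattr(self, name) not in (1, 2, 3, 4):
                issues.append(f"{name} outside 1 to 4")
        if self.segment not in SEGMENTS:
            issues.append("unknown segment label")
        return issues

# Values taken from the Example column of the schema table.
example = PlayerRecord("a3f8c1...", 247, 4832.5, 1247.3, 0.258, 4940.0, 3, 4, 2, "Explorer")
```

Calling `example.problems()` returns an empty list for the schema's example values; a record with a quartile of 5 or an unlisted segment label would be reported.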

Preview

Sample Data (Representative Rows)

Five representative rows, one per segment. All key fields included.

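| user_id | library_size | total_hours | top_game_ratio | estimated_total_spend | segment |
|---|---|---|---|---|---|
| a3f8c1... | 162 | 3841 | 0.312 | 3240 | Core |
| b7d2e4... | 483 | 12023 | 0.128 | 9660 | Explorer |
| c1a9f3... | 42 | 3346 | 0.639 | 840 | Focused |
| d4e6b8... | 648 | 2891 | 0.127 | 12960 | Collector |
| e9c3a7... | 147 | 0 | 0 | 2940 | Dormant |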
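The sample rows can be checked against the documented value proxy (library_size × $20). This is a small sanity-check sketch; the names `PRICE_PROXY`, `SAMPLE_ROWS`, and `spend_proxy` are illustrative, and the row values are copied from the preview above.

```python
PRICE_PROXY = 20  # dollars per owned title, per the schema description

# (user_id, library_size, total_hours, top_game_ratio, estimated_total_spend, segment)
SAMPLE_ROWS = [
    ("a3f8c1...", 162, 3841, 0.312, 3240, "Core"),
    ("b7d2e4...", 483, 12023, 0.128, 9660, "Explorer"),
    ("c1a9f3...", 42, 3346, 0.639, 840, "Focused"),
    ("d4e6b8...", 648, 2891, 0.127, 12960, "Collector"),
    ("e9c3a7...", 147, 0, 0, 2940, "Dormant"),
]

def spend_proxy(library_size):
    """Ownership-value proxy: number of owned titles times the flat price proxy."""
    return library_size * PRICE_PROXY

# Collect the ids of any rows whose spend field disagrees with the proxy.
mismatches = [row[0] for row in SAMPLE_ROWS if spend_proxy(row[1]) != row[4]]
```

An empty `mismatches` list means every previewed row is consistent with the proxy definition.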

Taxonomy

Segment Definitions

Core

Moderate breadth, moderate depth, balanced attention. The visible baseline of a Steam audience. Reliable, but not where outsized value concentrates.

Explorer

High breadth, high depth, distributed attention. Invests broadly across many titles. Largest concentration of ownership value in the dataset.

Focused

Low breadth, high depth, concentrated attention. Commits intensely to few titles. High engagement density, low economic breadth.

Collector

Very high breadth, variable depth. Ownership-driven behavior. Highest per-player value. Buys broadly regardless of play intensity.

Dormant

Moderate to large libraries, zero recent hours. Previously active, currently inactive. Stored ownership value without recent engagement.

Definitions

Signal Definitions

Each behavioral signal documented with its measurement, range, and interpretation guidance.

| Signal | Type | Range | What It Measures | Interpretation |
|---|---|---|---|---|
| library_size | int | 1 to 15,000+ | Total title records owned on Steam | Ownership breadth. Higher values indicate broader investment across the ecosystem. |
| total_hours | float | 0 to 50,000+ | Cumulative hours played across all title records | Engagement depth. Total time invested, not recency. |
| top_game_ratio | float | 0.0 to 1.0 | Share of total hours on single most-played game | Attention concentration. Low values indicate distributed attention. High values indicate focus on one title. |
| engagement_quartile | int | 1 to 4 | Quartile rank by total hours | Relative engagement position within the dataset |
| ownership_quartile | int | 1 to 4 | Quartile rank by library size | Relative ownership position within the dataset |
| focus_quartile | int | 1 to 4 | Quartile rank by top game ratio | Relative attention concentration position |
| estimated_total_spend | float | $20 to $300,000+ | Ownership-value proxy (library_size × $20) | Relative ownership value. Not actual revenue. |
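As a worked example of the attention-concentration signal, the ratio is the top title's hours divided by total hours. The sketch below assumes that players with zero total hours receive a ratio of 0.0; that convention is an assumption, chosen because it matches the Dormant sample row, and the dataset's actual handling is not documented here.

```python
def top_game_ratio(top_game_hours, total_hours):
    """Share of total hours spent on the most-played title.

    Returns 0.0 when no hours are logged (assumed convention, see lead-in).
    """
    if total_hours <= 0:
        return 0.0
    return top_game_hours / total_hours

# Values from the schema table's Example column: 1247.3 of 4832.5 hours.
ratio = round(top_game_ratio(1247.3, 4832.5), 3)
```

With the schema's example values this reproduces the documented 0.258.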

Applications

Buyer Applications

Six concrete B2B analysis workflows the dataset supports, each tied to a specific signal pattern and measurable research outcome.

| Action | Signal Pattern | Outcome |
|---|---|---|
| Rank high-value player segments for launch-readiness analysis | segment = Explorer or Collector | Identify players who invest broadly across the ecosystem |
| Identify dormant ownership value | segment = Dormant, library_size > 100 | Quantify stored value without treating inactivity as zero market relevance |
| Separate high-engagement players from high-ownership players | engagement_quartile = 4, top_game_ratio > 0.5 | Avoid using engagement depth as a proxy for ownership value |
| Analyze cross-title bundle adjacency | library_size > 200, focus_quartile ≤ 2 | Locate breadth-driven players who already concentrate ownership across adjacent titles |
| Score audiences against the segment taxonomy in your warehouse | join(player_id) → user_id | Replace heuristic player tiers with deterministic, reproducible segments |
| Validate retention and LTV models on cross-library features | library_size, total_hours, top_game_ratio | Reduce blind spots in models built only on first-party engagement |
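The filterable signal patterns in the table translate directly into predicates. The following sketch assumes records are dicts keyed by the schema field names; the pattern names and the `match_patterns` helper are hypothetical labels introduced for illustration.

```python
# Each predicate mirrors one Signal Pattern cell from the table above.
PATTERNS = {
    "high_value_segment": lambda r: r["segment"] in {"Explorer", "Collector"},
    "dormant_value": lambda r: r["segment"] == "Dormant" and r["library_size"] > 100,
    "engaged_and_focused": lambda r: r["engagement_quartile"] == 4
    and r["top_game_ratio"] > 0.5,
    "bundle_adjacency": lambda r: r["library_size"] > 200 and r["focus_quartile"] <= 2,
}

def match_patterns(record):
    """Return the names of every pattern the record satisfies, in table order."""
    return [name for name, test in PATTERNS.items() if test(record)]

# Record built from the Example column of the schema table.
example = {
    "user_id": "a3f8c1...",
    "library_size": 247,
    "total_hours": 4832.5,
    "top_game_ratio": 0.258,
    "engagement_quartile": 3,
    "ownership_quartile": 4,
    "focus_quartile": 2,
    "segment": "Explorer",
}
matched = match_patterns(example)
```

The schema's example player matches the high-value and bundle-adjacency patterns but not the engagement-focused one, which illustrates why engagement depth alone would miss this player.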

Methodology

How The Segments Are Built

Three primary signals are computed per player: ownership breadth (library_size), engagement depth (total_hours), and attention concentration (top_game_ratio). Each signal is then ranked into quartiles within the dataset.
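The quartile-ranking step could be sketched as follows. The exact cut-point method and tie handling used for the dataset are not stated, so this example assumes Python's `statistics.quantiles` with its default "exclusive" method, and assigns values that fall exactly on a cut point to the lower quartile. The function name and the toy values are illustrative.

```python
import bisect
import statistics

def quartile_ranks(values):
    """Rank each value 1 to 4 against cut points computed from the same values."""
    cuts = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    # bisect_left counts the cut points strictly below the value, so equal
    # values always receive the same rank.
    return [bisect.bisect_left(cuts, v) + 1 for v in values]

# Toy library sizes for illustration only; NOT dataset values.
toy_library_sizes = [10, 20, 30, 40, 50, 60, 70, 80]
ranks = quartile_ranks(toy_library_sizes)
```

Because the cut points depend only on the input values, re-running the function on the same data yields the same ranks, which is what makes the downstream segment assignment reproducible.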

Segment assignment is rule-based: deterministic combinations of the three quartile ranks map to one of five named segments. No clustering. No statistical model. The same input always produces the same output.
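To show what a deterministic rule-based mapping looks like in practice, here is a sketch with purely illustrative rules. The actual quartile combinations behind the five segments are not published in this document; the thresholds below are hypothetical, loosely derived from the qualitative segment definitions, and should not be read as the dataset's real logic.

```python
def assign_segment(engagement_q, ownership_q, focus_q, total_hours):
    """Map quartile ranks to a segment label using HYPOTHETICAL rules.

    Rules are evaluated top to bottom; the first match wins, so the same
    inputs always return the same label.
    """
    if total_hours == 0:
        return "Dormant"      # assumed: no recorded hours
    if ownership_q == 4 and engagement_q >= 3 and focus_q <= 2:
        return "Explorer"     # assumed: broad, deep, distributed attention
    if ownership_q == 4:
        return "Collector"    # assumed: very broad, depth varies
    if ownership_q <= 2 and engagement_q >= 3 and focus_q >= 3:
        return "Focused"      # assumed: narrow, deep, concentrated attention
    return "Core"             # everything else

# Quartile values from the schema table's Example column.
label = assign_segment(engagement_q=3, ownership_q=4, focus_q=2, total_hours=4832.5)
```

An ordered rule list like this involves no randomness and no fitted parameters, which is the property the methodology emphasizes.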

The estimated value proxy is library_size × $20, a conservative ownership baseline. It is not a revenue prediction. It is a comparative anchor for relative ownership value across players.

FAQ

Common Questions

Is this engagement data?

No. Engagement is one of three inputs. The dataset is a structural classification of who players are, not a real-time activity feed.

Is the value field actual revenue?

No. estimated_total_spend is a $20-per-game proxy. Use it for relative comparisons across players, not as a revenue forecast.

Are segments stable over time?

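Yes, for fixed inputs. The rule-based mapping is deterministic: re-running the same player against the same signals always produces the same segment. Because quartiles are ranked within the dataset, a segment changes only when a player's signals or the dataset's quartile boundaries change.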

Can we extend with our own signals?

Yes. The schema is open and joinable on user_id. Most buyers blend the GS segments with their first-party features.
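Blending might look like the left join sketched below. It assumes your first-party rows already carry a `user_id` hashed the same way as the dataset's; the hashing scheme is not described here, so that alignment is an assumption. The `blend` function, the `sessions_30d` field, and the ids are hypothetical.

```python
def blend(first_party_rows, gs_rows):
    """Left-join first-party rows to GS segments on user_id.

    Rows with no match keep their own fields and get segment=None.
    """
    gs_by_id = {r["user_id"]: r for r in gs_rows}
    blended = []
    for row in first_party_rows:
        merged = dict(row)  # copy so the caller's rows are not mutated
        match = gs_by_id.get(row["user_id"])
        merged["segment"] = match["segment"] if match else None
        blended.append(merged)
    return blended

# Hypothetical ids and first-party feature, for illustration only.
gs_rows = [{"user_id": "u1", "segment": "Explorer"}]
first_party = [
    {"user_id": "u1", "sessions_30d": 12},
    {"user_id": "u2", "sessions_30d": 3},
]
blended = blend(first_party, gs_rows)
```

Keeping unmatched rows with a `None` segment makes coverage gaps visible instead of silently dropping players.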

This dataset is the layer behind the GS report.

27% of players. 62.5% of value.

Access is granted under a written data-use agreement. Samples may include a representative slice across all four layers depending on buyer use case, coverage requirements, and approval status.
