Welcome to NWSL Notebook
A new way to explore women's soccer. Ask questions in plain English and get real answers about the National Women's Soccer League.
What Is This?
NWSL Notebook lets you have a conversation with data. Instead of digging through spreadsheets or learning complex tools, you simply ask questions like you'd ask a friend who happens to know everything about women's soccer.
Want to know who's been the best midfielder this season? Curious how your favorite team stacks up against the rest of the league? Just ask.
The AI assistant understands your question, finds the answer in our database, and shows you the results. No coding required. No statistics degree needed.
Who Is This For?
- Fans: Settle debates, discover new players, and understand the game on a deeper level.
- Journalists: Find stories in the data. Get stats for your articles without filing requests or building spreadsheets.
- Fantasy Players: Make smarter picks with real performance data beyond goals and assists.
- Coaches and Analysts: Quick answers when you need them, with the option to see the underlying work.
- Anyone Curious: If you've ever wondered "who's really the best?" or "how does she compare to...?", this is for you.
How It Works
- Ask anything: Type your question in the chat sidebar. Use natural language, like "Who has the most assists this season?" or "Compare Sophia Smith to Mallory Swanson."
- Get answers: Results appear as cards on your canvas. Tables, charts, and insights: whatever best answers your question.
- Go deeper: Ask follow-up questions. "Now show me just Portland players" or "What about last season?" The conversation continues.
Understanding the Numbers
Soccer analytics can feel overwhelming, but the core ideas are simple. Here's what the main metrics actually mean:
VAEP (Player Value)
VAEP answers the question: "How much did this player help their team score (or stop the other team from scoring)?"
Every touch of the ball either helps or hurts your team's chances. A pass that opens up space? Helpful. Losing the ball in a dangerous area? Harmful. VAEP adds up all these moments to show a player's true impact — not just goals and assists, but everything.
Why it matters: Two players might both score 5 goals, but one might be creating chances for teammates, winning the ball back, and moving play forward. VAEP captures the full picture.
Expected Goals (xG)
When a player takes a shot, how likely is it to go in? A shot from 6 yards out with no defenders is much easier than a long-range effort through traffic.
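xG measures shot quality. A player with 5 goals from 3 xG is scoring more than the quality of their chances predicts: they may be a clinical finisher, or they may be on a hot streak. A player with 5 goals from 8 xG is scoring less than expected: they might be getting unlucky or missing good chances.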
Why it matters: Goals can be fluky. xG shows whether a player is truly dangerous or just riding a hot streak.
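For readers who like to see the arithmetic, here is a minimal Python sketch of the comparison above. The function names and the per-shot probabilities are made up for illustration; they are not taken from the NWSL Notebook database.

```python
def total_xg(shot_probabilities):
    """Season xG is the sum of each shot's chance of going in."""
    return sum(shot_probabilities)


def goals_minus_xg(goals, xg):
    """Positive: scoring more than shot quality predicts. Negative: scoring less."""
    return goals - xg


# Hypothetical shots: one tap-in, two decent looks, one long-range effort
print(total_xg([0.5, 0.25, 0.25, 0.125]))  # 1.125

# The two players described above
print(goals_minus_xg(5, 3.0))  # 2.0
print(goals_minus_xg(5, 8.0))  # -3.0
```

A gap that stays positive over several seasons is better evidence of finishing skill than a gap over a handful of games.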
Expected Assists (xA)
xA gives credit to playmakers. If a player makes a perfect pass that sets up an easy chance, they deserve recognition — even if the shooter misses.
Why it matters: Great passers don't always show up in the assist column because assists depend on teammates finishing. xA reveals who's really creating chances.
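As a rough sketch of the idea, one common simplified way to compute xA is to credit the passer with the xG of the shot that follows, whether or not it goes in. The records and player labels below are hypothetical, and the exact calculation used by NWSL Notebook may differ.

```python
from collections import defaultdict

# Hypothetical shot records: (passer, shooter, shot_xg, scored)
shots = [
    ("Player A", "Player B", 0.5, False),   # great chance created, shooter missed
    ("Player A", "Player C", 0.25, True),
    ("Player D", "Player B", 0.125, True),  # low-quality chance that happened to go in
]


def expected_assists(shot_records):
    """Return (xA per passer, actual assists per passer)."""
    xa = defaultdict(float)
    assists = defaultdict(int)
    for passer, _shooter, shot_xg, scored in shot_records:
        xa[passer] += shot_xg          # credit the chance, regardless of outcome
        if scored:
            assists[passer] += 1       # the traditional stat needs a goal
    return dict(xa), dict(assists)


xa, assists = expected_assists(shots)
print(xa)       # {'Player A': 0.75, 'Player D': 0.125}
print(assists)  # {'Player A': 1, 'Player D': 1}
```

Both passers have one assist, but Player A created six times as much expected scoring.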
Progressive Actions
These are passes and carries that move the ball significantly toward the opponent's goal. They identify the players who make things happen — advancing the ball into dangerous areas.
Why it matters: Some players rack up stats by playing it safe. Progressive actions show who's actually pushing the team forward.
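What counts as "significantly" depends on the definition a data provider chooses. The sketch below assumes one plausible rule: an action is progressive if it cuts the distance to the opponent's goal by at least 25%. The threshold, the pitch size, and the function names are assumptions for illustration, not the rule NWSL Notebook necessarily applies.

```python
import math

GOAL = (105.0, 34.0)  # assumed: attacking left to right on a 105 x 68 m pitch
MIN_GAIN = 0.25       # assumed threshold: 25% closer to goal


def distance_to_goal(x, y):
    return math.hypot(GOAL[0] - x, GOAL[1] - y)


def is_progressive(start, end, min_gain=MIN_GAIN):
    """True if the pass or carry moves the ball enough closer to goal."""
    before = distance_to_goal(*start)
    after = distance_to_goal(*end)
    return before > 0 and (before - after) / before >= min_gain


print(is_progressive((50, 34), (80, 34)))  # True: 55 m from goal down to 25 m
print(is_progressive((50, 34), (45, 34)))  # False: a backward pass
print(is_progressive((50, 34), (50, 10)))  # False: sideways, slightly farther from goal
```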
The Data
We track every meaningful moment from NWSL matches: every pass, shot, tackle, dribble, and interception. Each event includes where it happened on the field and the game situation at the time.
This isn't a sample or estimate — it's comprehensive match data processed through professional-grade analytics models.
Coverage includes recent NWSL seasons with data updated regularly throughout the current season.
Under the Hood: Our Models
The numbers you see aren't simple counts — they come from machine learning models trained specifically on NWSL data. Here's a peek at what powers the analysis:
VAEP Model
What it does: Predicts how each action on the ball changes the probability of a goal being scored.
How it learns: We fed the model over 2 million actions from every NWSL match going back to 2016. For each action, it learned what happened next — did the team eventually score? Did they concede? Over time, the model figured out which types of actions in which situations tend to lead to goals.
The algorithm: CatBoost, a gradient boosting library. Think of it as a long sequence of small decision trees, each one trained to correct the mistakes of the ones before it. Gradient boosting is widely used in industry for prediction tasks on tabular data.
Why it's special: The model looks at around 100 different factors for each action — where it happened on the field, what type of action it was, what happened in the previous few seconds, and more. This lets it understand context. A pass in your own penalty area means something very different from the same pass near the opponent's goal.
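Once the model has produced scoring and conceding probabilities, turning them into a value for an action is simple arithmetic. The sketch below follows the formula from the published VAEP research: the change in the chance of scoring minus the change in the chance of conceding. The probabilities are typed in by hand and are hypothetical; in the real pipeline they would come from the trained model.

```python
def vaep_value(p_score_before, p_score_after, p_concede_before, p_concede_after):
    """Value of one action: gain in scoring chance minus gain in conceding chance."""
    offensive = p_score_after - p_score_before
    defensive = p_concede_after - p_concede_before
    return offensive - defensive


# Hypothetical through ball: scoring chance rises from 2% to 5%, risk unchanged
through_ball = vaep_value(0.02, 0.05, 0.01, 0.01)

# Hypothetical giveaway near your own box: conceding chance rises from 1% to 6%
giveaway = vaep_value(0.02, 0.01, 0.01, 0.06)

print(round(through_ball, 3))  # 0.03
print(round(giveaway, 3))      # -0.06

# A player's total is the sum over all of their actions
print(round(through_ball + giveaway, 3))  # -0.03
```

Because every action gets a value, a player's rating can reflect passes, carries, and defensive work rather than only the final touch.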
Expected Goals (xG) Model
What it does: Predicts the probability that a shot will result in a goal.
How it learns: Every shot in our database has an outcome — goal or no goal. The model learned which shot characteristics (distance, angle, body part, game situation) correlate with scoring.
The algorithm: A Bayesian statistical model that accounts for individual differences. Some players are better finishers than others. Some teams create better chances. The model learns these patterns and adjusts accordingly.
Why it's special: Unlike simple distance-based calculations, our xG model incorporates player and team effects. It knows that a shot from Sophia Smith isn't the same as the identical shot from a different player.
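To give a feel for how a player effect changes a prediction, here is a toy sketch. It assumes a logistic form with a single shot feature (distance) plus a per-player adjustment. Every coefficient and label is invented for illustration; this shows only the shape of the prediction step, not the fitted Bayesian model or its actual inputs.

```python
import math

# Hypothetical coefficients, for illustration only
INTERCEPT = 0.5
PER_METER = -0.15  # each extra meter from goal lowers the odds
PLAYER_EFFECT = {
    "strong finisher": 0.3,
    "average finisher": 0.0,
}


def shot_probability(distance_m, player):
    """Chance of a goal: shot features plus a per-player adjustment."""
    # Players the model has not seen fall back to the league-average effect (0.0)
    logit = INTERCEPT + PER_METER * distance_m + PLAYER_EFFECT.get(player, 0.0)
    return 1.0 / (1.0 + math.exp(-logit))


same_shot = 10.0  # meters from goal
print(shot_probability(same_shot, "strong finisher"))
print(shot_probability(same_shot, "average finisher"))
```

The same shot gets a higher probability for the strong finisher. In a Bayesian model, the size of each player's adjustment would be learned from data and pulled toward zero for players with few shots.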
Expected Threat (xT) Grid
What it does: Assigns a "danger value" to every zone on the pitch based on how likely possessions in that area lead to goals.
How it learns: By tracking millions of ball movements, the model maps out the probability of scoring from each of 96 zones on the field. Zones near the goal carry the most danger. Zones in the corners carry less.
The algorithm: A Markov chain — a mathematical model that tracks how the ball moves from zone to zone and what typically happens next. It's the same type of model used in everything from Google's search rankings to predicting weather patterns.
Why it's special: xT captures the value of moving the ball forward even when it doesn't directly lead to a shot. A midfielder who consistently advances play into dangerous areas shows up in xT even if the final pass comes from someone else.
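The Markov chain idea fits in a few lines of Python. In the usual xT formulation, a zone's value is the chance of scoring by shooting from it, plus the chance of moving the ball elsewhere times the value of wherever it lands. The sketch below shrinks the 96-zone grid to three zones, and every probability in it is invented for illustration.

```python
ZONES = ["defensive third", "middle third", "final third"]

# Hypothetical per-zone behavior (not real NWSL numbers)
shoot_prob = [0.0, 0.05, 0.25]     # how often a team shoots from the zone
goal_if_shot = [0.0, 0.02, 0.12]   # chance a shot from the zone scores
move_prob = [0.8, 0.75, 0.5]       # how often the ball is passed or carried on
transition = [                      # where the ball goes when it is moved
    [0.5, 0.45, 0.05],
    [0.2, 0.5, 0.3],
    [0.05, 0.35, 0.6],
]


def expected_threat(iterations=50):
    """Repeat the update until the zone values settle."""
    n = len(ZONES)
    xt = [0.0] * n
    for _ in range(iterations):
        xt = [
            shoot_prob[z] * goal_if_shot[z]
            + move_prob[z] * sum(transition[z][nxt] * xt[nxt] for nxt in range(n))
            for z in range(n)
        ]
    return xt


xt = expected_threat()
for zone, value in zip(ZONES, xt):
    print(f"{zone}: {value:.3f}")

# Credit for a pass from the middle third into the final third
print(f"pass value: {xt[2] - xt[1]:.3f}")
```

The value of a pass or carry is the difference between the zone it ends in and the zone it started from, which is how a midfielder earns credit without taking the shot.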
Tactical Pattern Recognition
What it does: Groups similar attacking sequences together to identify playing styles.
How it learns: The model breaks each match into "phases" — continuous stretches of possession — and looks for patterns. Teams that play lots of short passes in the opponent's half look different from teams that launch long balls forward.
The algorithm: K-means clustering, which groups similar things together. Given 20 buckets, the model figures out the 20 most distinct types of attacking plays and assigns each possession phase to one of them.
Why it's special: This lets us answer questions like "Which teams play most like Portland?" or "How has this team's style changed over the season?"
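Here is a small, self-contained sketch of how k-means sorts possession phases into buckets. It uses 2 buckets instead of 20 and describes each phase with two hypothetical features (number of passes and average pass length in meters). The features, the data, and the fixed starting centroids are assumptions chosen to keep the example short and repeatable.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels


# Hypothetical phases: (passes in the phase, average pass length in meters)
phases = [(8, 12.0), (9, 10.0), (7, 11.0), (2, 35.0), (3, 40.0), (2, 38.0)]

# Fixed starting points; real implementations usually pick these at random
centroids, labels = kmeans(phases, centroids=[phases[0], phases[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three phases (many short passes) land in one bucket and the last three (a few long balls) in the other. With real data, features are usually put on a common scale first so that no single one dominates the distance.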
The Training Data
All of these models are trained on comprehensive NWSL match data:
- 2.9 million individual actions (passes, shots, tackles, dribbles)
- 1,300+ matches across 13 seasons (2013-2025)
- 895 unique players
- Every action tagged with location, timing, and outcome
The methodology follows academic research from KU Leuven's sports analytics group, particularly the work of Tom Decroos on valuing actions in soccer. We've adapted their frameworks specifically for the women's game.
Tips for Better Questions
Be specific about time:
- Who led the league in assists in 2024?
- Show me Sophia Smith's stats from the last 5 games
Set minimums for fair comparisons:
- Top 10 scorers with at least 500 minutes played
- Best passers among players with 10+ appearances
Ask for visuals:
- Plot goals vs xG for all forwards
- Chart the top 10 teams by possession percentage
Build on your results:
- Now break that down by team
- Add assists to that table
Example Questions to Try
- Who are the top 5 goal scorers this season?
- Which team has the best defense?
- Show me Naomi Girma's stats
- Compare the top midfielders by VAEP
- What teams create the most chances per game?
- Who's overperforming their xG?
- Plot minutes vs goals for forwards with 500+ minutes
About This Project
NWSL Notebook is an independent project built to make soccer analytics accessible. The goal is simple: anyone should be able to explore the data, not just people with technical backgrounds.
Women's soccer deserves the same analytical attention as the men's game. This is a small step toward that.
Questions, feedback, or ideas? Join the conversation on Discord.