K-12 Data Literacy and Data Science Learning Progressions

How to Prioritize the Learning Progressions



Ready to Bring Data Science Skills Into Your Classroom? Start Here.

Whether you're a classroom teacher looking to prepare students for our data-driven world, or you're working on standards revision at the state level, we know you're asking the same question: Where do I even begin with data science education?

Teaching data science is like making pizza from scratch, but you don’t need to be an expert chef to get cooking. The Phase 1 K–12 Data Literacy and Data Science Model Learning Progressions (DSLPs) serve as your recipe book: begin with a simple cheese-pizza lesson, then add toppings with more complex projects until you’ve built a deep-dish analysis.

And just like pizza-making, this isn’t a completely linear process. You’ll mix, taste, adjust, and sometimes circle back before moving forward. The strands of the DSLPs mirror that same creative process: curating ingredients, preparing the dough, layering flavors, baking, and finally presenting the dish. Each step can be revisited or reshuffled, but together they move naturally from curation to visualization—helping your students cook up the critical data skills they’ll need for whatever comes next.

In our AI-powered world, students who can ask good questions about data, spot patterns, and think critically about information sources will have a huge advantage. These progressions give you a clear path to build those skills, no matter what subject you teach or what grade level you work with.

Designed to Fit Your Teaching Reality

We've intentionally created these learning progressions to be subject-neutral—meaning they can form the base for work in math, social studies, English, science, or any other subject. You won't need to overhaul your entire curriculum or become an expert in coding.

The progressions include 83 concepts organized into five strands, but here's the key: you're not responsible for teaching all 83 concepts tomorrow. Instead, these are designed to:

Give you choice and flexibility - Pick the concepts that make sense for your students' current level and your subject area. The research-based ranking included below can help you identify where your time is likely to have the most impact.
Help you identify crucial gaps - See where your students might be missing key data literacy skills that you can easily integrate.
Provide a clear progression - Know what comes in the next grade span as students develop these abilities over time.

Think of these progressions as sitting right between broad state standards (which can feel too vague) and specific curriculum materials (which might not fit your context). This "sweet spot" gives you enough detail to plan meaningful lessons, while still having the flexibility to adapt to your classroom's unique needs.

In our upcoming Phase 2 work, we'll be creating subject-specific versions that show exactly how these concepts can live within math, social studies, English, and other content areas. But right now, this subject-neutral version is perfect for getting started wherever you see the biggest opportunities in your teaching.

Built with Educators in Mind

These learning priorities emerged from a purposeful, year-long research process involving educators, students, employers, higher education faculty, researchers, and policy leaders across all 50 states. Over 800 stakeholders participated in systematic voting on essential data skills for graduation, akin to a “Portrait of a Graduate” for the Data & AI era. Structured focus groups with K-12 practitioners then translated community priorities into developmentally appropriate learning targets. Finally, a comprehensive review process incorporated over 250 expert comments from the national K-12 data science education community to ensure age-appropriate learning trajectories.

The result? A research-validated and community-informed priority framework that focuses on the "high-power" concepts—the ideas that will give your students the biggest return on learning time.

Ranking Concepts by Priority

Once the grade-level learning goals were set, we used the community voting results to “audit” all 83 of the data science learning progression concepts. The resulting ranking can guide you as you look for the highest-value areas, or aim to cover the greatest number of critical learning outcomes by a student’s graduation.

Data science is a cyclical investigation process, similar to the scientific method but with even more iteration between steps. We therefore strongly recommend against treating the concepts as a list of separable ideas in isolation. Still, just as you don't need to know every possible topping to make a delicious pizza, you don't need to master every data concept to get started. The prioritized list gives you the essential ingredients (the crust, sauce, and cheese) so you can start cooking right away.

Whether you are on a committee informing state standards, a district coach planning next year’s offerings, or a classroom educator, we hope this will help.

While the full ranking provides comprehensive guidance, we've identified a few foundational concepts that represent particularly tangible stepping stones in student development. These concepts—data cleanliness, variable and structural complexity, and dataset size—form the backbone of data literacy progression and deserve special attention as you plan across grade levels. Rather than being buried in the larger ranking, we've pulled them out to show you exactly how students can grow in these core areas:

Concept	K-2	3-5	6-8	9-12
B4.1 - Cleanliness	Clean datasets with no missing data or errors	Datasets requiring some cleaning (missing data, blank cells)	Datasets with missing values marked by codes (-99, blank cells)	Multiple cleaning issues: missing values, errors, anomalies, outliers
B4.2 - Complexity of Variables	Only numerical OR only categorical variables	Both numerical AND categorical variables	Rates and derived variables combining multiple measurements	Time-series data, expected value models, multiple variable types
B4.3 - Size	1-2 variables, 10-30 observations (class size)	Up to 4 variables, up to 50 observations	Up to 20 variables, 100+ observations	20+ variables, 1000+ observations (very large datasets)
B4.4 - Complexity of Structure	Pre-formatted datasets ready for analysis	Combining two simple datasets about same objects/events	Complex row structures (averages, nested cases, not single observations)	Multiple dataset merging, longitudinal data, multi-level aggregation
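
To make the 6-8 cleanliness stage in the table above more concrete, here is a minimal Python sketch of what "missing values marked by codes" can look like in practice. Everything in it is an assumption made for illustration: the temperature readings are invented, and the names `RAW_CSV`, `MISSING_CODES`, `load_temps`, and `mean_ignoring_missing` are hypothetical rather than part of the DSLPs. It uses only Python's standard library.

```python
import csv
import io
from statistics import mean

# Hypothetical classroom dataset (invented for illustration): daily
# temperature readings where -99 or a blank cell marks a missing value.
RAW_CSV = """day,temp_c
Mon,18.5
Tue,-99
Wed,
Thu,21.0
Fri,19.5
"""

# Assumption: these are the only two ways a missing value is recorded.
MISSING_CODES = {"", "-99"}


def load_temps(csv_text):
    """Return a list of (day, temperature or None) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row["temp_c"].strip()
        value = None if cell in MISSING_CODES else float(cell)
        rows.append((row["day"], value))
    return rows


def mean_ignoring_missing(rows):
    """Average only the readings that are actually present."""
    present = [value for _, value in rows if value is not None]
    return mean(present)


rows = load_temps(RAW_CSV)
missing_days = [day for day, value in rows if value is None]
print("Days with missing readings:", missing_days)
print("Mean of the remaining readings:", round(mean_ignoring_missing(rows), 2))
```

A useful classroom discussion prompt is what would happen to the average if the -99 code were treated as a real temperature instead of being set aside.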


And finally, here it is! The top 25 ranked concepts:

1. Tool Application (C4.1) - Use digital tools to summarize data and create visualizations. Apply these tools to identify patterns, clean and prepare data, perform analysis, and build models for simulations to explore relationships and trends.
2. Correlation versus Causation (D1.6) - Comfortably separate correlation from causation in a wide variety of situations, building a “first-reaction” thinking habit over time.
3. Biases in Data (A2.2) - Recognize that all data contains bias, but that data collection and analysis methods can increase or mitigate its effects.
4. Representational Fluency (E1.5) - Identify how layout (ordering, scale, and axes) choices increase clarity or potentially mislead an audience.
5. Probabilistic Language (D1.1) - When communicating with others, employ both plain language and clear vocabulary to regularly describe degrees of uncertainty, both formally and informally as a thinking habit.
6. Application Fitness (D3.1) - Regularly identify generalization issues, with frequent comparisons between significant real-world examples and a current analysis.
7. Iteration, Validation, and Multiple Explanations (D2.2) - Regularly practice identifying alternative explanations for a result from data, both for interim steps and post-analysis conclusions.
8. Multivariable Decision-Making (D1.8) - Clearly describe how to leverage additional variables or additional outside data to make a logical argument, and identify potential risks of overdoing it.
9. Tool Accessibility for Diverse Learners (C4.6) - Understand how digital tools can support a broad range of diverse learners. Evaluate their effectiveness and impact, and explore inclusive data representations.
10. Data Use Risks and Benefits (A2.1) - Recognize that data can pose risks but also benefits for individuals and groups, and understand its potential uses, limitations, and risks, including unintended consequences.
11. Priors and Updates (D1.2) - When encountering new data, integrate probabilistic thinking into everyday situations by explicating prior assumptions and the impact of new data or evidence on those assumptions.
12. Explaining Significance (D1.4) - Clearly describe the basic logic of statistical significance to others, differentiating between significance, the size of an effect, and the statistical power of an analysis. Recognize what statistical significance can reveal and cannot reveal about a phenomenon.
13. Planning for Data Collection (B2.3) - Develop systematic plans that specify what data to collect, how to collect it, and from what sources to answer investigation questions.
14. Verifiable Questions and Statements (D2.1) - Identify and create the type of questions that can be answered by data, and are eventually verifiable using a combination of modeling and experimentation.
15. The Investigative Process (A3.1) - Recognize that making sense with data requires engaging with it in a particular way that includes combinations of the concepts and practices in the other four strands.
16. Meta-Analysis and Facts (D3.7) - Recognize the relationship between many trials, uncertainty, and whether a claim is a “fact.”
17. Uncertainty Statements & Limitations (D2.3) - Clearly explain the limitations and caveats of a conclusion from data, including the risks of extending the conclusion to another group or situation.
18. Intent & Authorship of Analyses (E3.1) - Regularly interrogate the point of view of a data author, and transparently share your own.
19. Analyzing Non-Traditional Data (C2.4) - Examine data beyond numbers, including sounds, textures, and text. Categorize sensory inputs, track word frequencies, and analyze data from sensors and IoT devices to identify patterns and trends.
20. Data Types and Forms (A1.1) - Recognize that data can exist as quantitative, ordinal, categorical, and other values. Data can also take “nontraditional” forms such as graphical or other media.
21. The Role of Code in Data Analysis (C4.5) - Explore how block coding and computer code automate and enhance data analysis. Understand how coding enables reproducible processes and compare its advantages and limitations to no-code and low-code tools.
22. Expected Value (D1.3) - When making a decision about uncertain outcomes in the future, integrate probabilistic thinking into everyday decisions by applying expected value (magnitude × probability) to appropriate situations.
23. Student Data Agency (A3.5) - Cultivate the motivation to engage with data in all areas of life and understand how data impacts your own experiences.
24. Advocacy with Data Arguments (E3.2) - Recognize how data can provide evidence for, and persuade others toward, positive change and how it can benefit society.
25. Civic Data Practices (E3.3) - Engage in civic practice and dispositions through recognition of the role data plays in civic society.
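
For readers who want to see one of these concepts worked through, the Expected Value (D1.3) idea of "magnitude × probability" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raffle scenario and its numbers are invented, and the `expected_value` helper is a hypothetical name, not something defined by the DSLPs.

```python
def expected_value(outcomes):
    """Add up magnitude x probability over every possible outcome.

    `outcomes` is a list of (magnitude, probability) pairs whose
    probabilities should add up to 1.
    """
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities should add up to 1")
    return sum(magnitude * p for magnitude, p in outcomes)


# Hypothetical scenario (numbers invented for illustration):
# a school raffle ticket costs $2, and 1 ticket in 100 wins a $50 prize.
raffle = [
    (50 - 2, 0.01),  # win: prize minus the ticket price
    (-2, 0.99),      # lose: out the ticket price
]

ev = expected_value(raffle)
print(f"Expected value per ticket: {ev:.2f} dollars")
```

In this invented scenario the expected value is negative, which opens a conversation about why someone might still choose to buy a ticket (for example, to support the school).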

We believe in showing our work. If you would like to explore the methodology we used to surface these priorities, as well as the entire list of ranked concepts from top to bottom, you can view the full “audit” comparing the Top 25 Learning Outcomes for our Portrait of a Graduate with the DSLP Concepts here.