Standards of evidence help us make consistent and transparent judgements when assessing evidence about the effectiveness of a particular education policy, practice or program.
Why we need standards of evidence
Teachers, educators, leaders and policymakers make dozens of decisions every day aimed at improving outcomes for children and students.
How can they be confident that what they choose to do will work? How do they decide between one approach and another?
To answer these questions in a consistent and transparent way, we need a way to evaluate the strength of research evidence on the effectiveness of a particular approach. We need standards of evidence.
AERO’s Standards of evidence
AERO’s standards of evidence establish our view on what constitutes rigorous and relevant evidence. When evidence is rigorous and relevant, it provides confidence that a particular approach is effective in a particular context.
The standards can apply to all forms of education evidence – whether generated through academic research or by teachers and educators through their daily practice.
AERO will primarily use the standards of evidence in projects based on research syntheses and causal (or evaluative) research.
In developing the standards of evidence, AERO has sought to build upon existing policies and research on evidence use across Australia and the world. AERO hopes that the Standards of evidence and associated evidence tools can help enhance quality conversations about generating and using evidence across the Australian education community.
Rigour and relevance
There are many criteria that can be used to evaluate education evidence. AERO’s Standards of evidence prioritise 2 criteria: rigour and relevance. These criteria have been prioritised because they are the most important considerations when deciding whether a piece of evidence can give someone confidence that a particular educational approach will be effective in their context.
Rigorous evidence is defined as evidence produced using research methods (whether qualitative, quantitative or mixed methods) that isolate the specific impact of a particular educational approach.
Relevant evidence is defined as evidence produced in contexts that are similar to one’s own. Evidence is also relevant when it is derived from a large number of studies conducted over a wide range of contexts, as this suggests that the educational approach is not dependent on any particular contextual factor.
Although the Standards of evidence clearly differentiate between 4 levels of confidence that evidence can provide, the standards should be viewed as a continuum along which rigour and relevance gradually increase. Evidence at each level builds on the evidence from preceding levels.
How AERO uses the Standards of evidence
When conducting research syntheses, we use the standards to make consistent and transparent judgements in selecting the best available research evidence. When conducting causal (or evaluative) research, we use the standards to guide the design and implementation of the research, so that it generates high-quality evidence for the Australian education community.
There may be occasions when research evidence does not clearly fit within a particular level of confidence. When this is the case, we draw on expert research guidance to make an assessment about how confident we are in the effectiveness of an approach.
How educators, teachers, leaders, policymakers and researchers can use the standards
The standards of evidence can be used to determine the strength of existing evidence for a particular approach, in your particular context.
To help you use the standards in your context, we've developed Evidence decision-making tools. These tools help educators, teachers and policymakers evaluate their confidence in the effectiveness of a new or existing approach and provide implementation guidance appropriate to that level of confidence.
The standards can also be used by policymakers and researchers when designing evaluations. By designing evaluations aligned to the Standards of evidence, policymakers and researchers can try to generate evidence that meets a desired level of confidence.

Level 1 Low confidence
What types of research fit within this level?
Research that presents a hypothesis for why the approach should have positive effects on outcomes.
This research does not provide data (whether qualitative or quantitative) to substantiate its claims that the approach is effective.
What features of research studies increase my confidence within this level?
The study provides an explanation that is based on well-established theories of learning and development.
The study clearly explains step-by-step how the approach is hypothesised to have positive effects.
Level 2 Medium confidence
What types of research fit within this level?
Research that demonstrates a correlation between the approach and positive effects on outcomes; for example:
- small-scale studies, such as case studies, and/or large-scale studies, such as cross-national surveys
- studies using qualitative (for example, observations and/or interviews), quantitative (for example, statistical techniques) or mixed methods.
This research does not necessarily show that the approach causes positive effects as there could be other potential explanations.
What features of research studies increase my confidence within this level?
The study has been conducted in my own context or in contexts similar to my own.
The study corroborates findings from other studies conducted in many different contexts.
The study measures change in outcomes over time.
The study has a large sample size that is spread across more than one site.
The study uses strategies that discount the possibility that effects are due to chance.
The study compares one group that has been subject to the approach to another group that has not been subject to the approach.
The study has been conducted by people or organisations independent of the developer of the approach.
The study has been conducted recently.
Level 3 High confidence
What types of research fit within this level?
Research that meets the following criteria:
- uses rigorous qualitative, quantitative or mixed methods that address issues like selection bias, history effects and maturation effects
- uses outcome measures validated for the purposes of the study.
This research does not necessarily prove the approach causes positive effects in my context. This is because there may be other factors in my context that mean the approach will not work as intended.
What features of research studies increase my confidence within this level?
The study corroborates findings from other studies conducted in many different contexts.
The study measures change in outcomes over time.
The study has a large sample size that is spread across more than one site.
The study uses strategies that discount the possibility that effects are due to chance.
The study compares one group that has been subject to the approach to another group that has not been subject to the approach.
The study has been conducted by people or organisations independent of the developer of the approach.
The study has been conducted recently.
The study mitigates the likelihood that effects are simply due to the particular characteristics of those that participate in the study.
The study discusses and/or tests the key contextual factors that may influence the effectiveness of the approach.
Level 4 Very high confidence
What types of research fit within this level?
Research that meets the following criteria:
- uses rigorous qualitative, quantitative or mixed methods that address concerns like selection bias, history effects and maturation effects
- uses outcome measures validated for the purposes of the study
- is conducted in my context or in contexts similar to mine
- synthesises the findings of rigorous research through a systematic review or meta-analysis of studies conducted in a range of contexts or in contexts similar to mine.
What features of research studies increase my confidence within this level?
The study corroborates findings from other studies conducted in many different contexts.
The study identifies the factors that lead to the approach working, and the conditions that are necessary for the approach to be implemented on a larger scale.
The study assesses the effectiveness of the approach on different subgroups and explains reasons for any differences in effectiveness between subgroups.
The study monitors outcomes for different groups over time to ensure continued effectiveness.
effective/ness - An educational approach is effective if it causes (see causation above) a desired change in a particular outcome. This desired change can be an increase in an outcome (for example, increases in student achievement) or it can be a decrease in an outcome (for example, reduction in student absenteeism).
research (or types of research) - Research is ‘the creation of new knowledge and/or the use of existing knowledge in a new and creative way so as to generate new concepts, methodologies, inventions and understandings’ (Australian Research Council, 2015). There are many types of research. For example:
- exploratory research involves investigating an issue or problem. It aims to better understand this problem and sometimes leads to the formation of hypotheses or theories about the problem.
- descriptive research describes a population, situation or event that is being studied. It focuses on developing knowledge about what exists and what is happening.
- causal research (also known as ‘evaluative research’) uses experimentation to determine whether a cause-and-effect relationship exists between two or more elements, features or factors.
- synthesis research combines, compares and links existing information to provide a summary and/or new insights or information about a given topic.
approach - An approach is the term AERO uses to refer to a practice, program or policy.
evidence (or education evidence) - Evidence is any type of information that supports an assertion, hypothesis or claim. There are many types of evidence in education, including insights drawn from child or student assessments, classroom observations, recommendations from popular education books and findings from research studies and syntheses. AERO refers to two types of evidence in its work:
- research evidence: This is academic research, such as causal research or synthesis research, which uses rigorous methods to provide insights into educational practice.
- practitioner-generated evidence: This is evidence generated by practitioners in their daily practice (for example, teacher observations, information gained from formative assessments or insights from student feedback on teacher practice).
rigour (or rigorous research or rigorous evidence) - Evidence is considered rigorous when it proves that a particular approach causes a particular outcome. Rigorous evidence is produced by using specialised research methods that can identify the impact of one particular influence. The most common research method used to produce rigorous evidence is the randomised controlled trial. However, there are many other methods that can produce rigorous evidence, whether qualitative, quantitative or mixed methods. What is important in producing rigorous evidence is that the research method can rule out the effects of as many other influences as possible.
relevant evidence - Relevant evidence is evidence produced in contexts that are similar to one’s own context. Evidence can also be considered relevant when it is derived from a large number of studies conducted over a wide range of contexts.
research methods - Research methods are the methods used to conduct research. Research methods are generally classified as ‘qualitative’ or ‘quantitative’. When both are used, the research is referred to as ‘mixed methods’ research. Qualitative methods involve collecting and analysing non-numerical data (such as observations, interviews, questionnaires, focus groups, documents and artifacts). Qualitative methods can be used to understand concepts, opinions or experiences as well as to gather in-depth insights into a problem or generate new ideas. Quantitative methods involve collecting and analysing numerical data. Quantitative methods are generally used to find patterns and averages, make predictions, test causal relationships and generalise results to wider populations.
quantitative methods - Quantitative methods involve collecting and analysing numerical data. Quantitative methods are generally used to find patterns and averages, make predictions, test causal relationships and generalise results to wider populations.
qualitative methods - Qualitative methods involve collecting and analysing non-numerical data, and may include observations, interviews, questionnaires, focus groups, and document and artifact analysis. Qualitative methods can be used to understand concepts, opinions or experiences as well as to gather in-depth insights into a problem or generate new ideas.
mixed-methods research - Mixed-methods research is research that uses both qualitative (non-numerical data) and quantitative (numerical data) research methods.
context (or contextual factors) - Context is the social, cultural and environmental factors found in research settings. Taking context into account in research studies is important because context can affect the outcomes of research (i.e. evidence generated in one context may not necessarily apply to a different context). Evidence is most relevant when it has been generated in a context similar to the context in which it will be applied. Examples of ‘context’ may include location, demographics of research participants, or the level of organisational support for the particular approach being researched.
evaluation - Evaluation is the systematic and objective assessment of an approach. Evaluation provides evidence of what has been done well, what could be done better, the extent to which objectives have been achieved and/or the impact of the approach. This evidence can then be used to inform ongoing decision-making regarding the approach.
AERO has defined some common education research terms. Read our list of key concepts explained.