When designing an assessment strategy and when selecting and evaluating assessment tools, it is important to consider a number of factors such as:
The term reliability refers to consistency. Assessment reliability is demonstrated by the consistency of scores obtained when the same applicants are reexamined with the same or equivalent form of an assessment (e.g., a test of keyboarding skills). No assessment procedure is perfectly consistent. If an applicant's keyboarding skills are measured on two separate occasions, the two scores (e.g., net words per minute) are likely to differ.
Reliability reflects the extent to which these individual score differences are due to "true" differences in the competency being assessed and the extent to which they are due to chance, or random, errors. Common sources of such error include variations in:
A goal of good assessment is to minimize random sources of error. As a general rule, the smaller the amount of error, the higher the reliability.
Reliability is expressed as a decimal number ranging from 0 to 1.00, where 0 means the scores consist entirely of error and 1.00 means the scores are free of any random error. In practice, scores always contain some amount of error, so their reliabilities are less than 1.00. For most assessment applications, reliabilities above .70 are likely to be regarded as acceptable.
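Under the classical test theory model (an assumption here; the section does not name a specific model), an observed score is a true score plus random error, and reliability is the share of observed-score variance that comes from true-score variance. The minimal Python sketch below uses made-up variance figures to show why shrinking the error pushes reliability toward 1.00 and past the .70 level mentioned above.

```python
def reliability(true_var: float, error_var: float) -> float:
    """Classical test theory reliability: true-score variance / observed-score variance.

    Observed-score variance is modeled as true-score variance plus random-error
    variance (true scores and errors are assumed to be uncorrelated).
    """
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    observed_var = true_var + error_var
    if observed_var == 0:
        raise ValueError("observed-score variance must be positive")
    return true_var / observed_var


ACCEPTABLE = 0.70  # rule-of-thumb level quoted in the text ("above .70")

# Hypothetical true-score variance of 80 paired with shrinking error variances.
for error_var in (40.0, 20.0, 5.0):
    r = reliability(80.0, error_var)
    print(f"error variance {error_var:5.1f} -> reliability {r:.2f} (acceptable: {r > ACCEPTABLE})")
```

With the true-score variance held fixed, each reduction in error variance raises the reliability, which is the "smaller error, higher reliability" rule stated above.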
The practical importance of consistency in assessment scores is that the scores are used to make important decisions about people. As an example, assume two agencies use similar versions of a writing skills test to hire entry-level technical writers. Imagine the consequences if the test scores were so inconsistent (unreliable) that applicants who applied at both agencies received a low score on one test but a much higher score on the other. The decision to hire an applicant might then depend more on random error in the assessments than on his or her actual writing skills.
Reliability is also important when deciding which assessment to use for a given purpose. The test manual or other documentation supporting the use of an assessment should report details of reliability and how it was computed. The potential user should review the reliability information available for each prospective assessment before deciding which to implement. Reliability is also a key factor in evaluating the validity of an assessment. An assessment that fails to produce consistent scores for the same individuals examined under near-identical conditions cannot be expected to make useful predictions of other measures (e.g., job performance). Reliability is critically important because it places a limit on validity.
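One standard way to make "places a limit on validity" concrete is the psychometric result that an observed validity coefficient cannot exceed the square root of the product of the assessment's reliability and the reliability of the job performance measure. The sketch below assumes that result; the reliability values are hypothetical, and treating job performance as perfectly measured (criterion reliability of 1.00) is a simplifying default.

```python
import math


def max_validity(predictor_reliability: float, criterion_reliability: float = 1.0) -> float:
    """Upper bound on an observed validity coefficient, given the two reliabilities."""
    for rel in (predictor_reliability, criterion_reliability):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(predictor_reliability * criterion_reliability)


# Hypothetical assessments: a less reliable test has a lower ceiling on validity.
print(f"{max_validity(0.81):.2f}")        # test reliability .81 -> ceiling 0.90
print(f"{max_validity(0.49):.2f}")        # test reliability .49 -> ceiling 0.70
print(f"{max_validity(0.81, 0.64):.2f}")  # also allowing for unreliable ratings -> 0.72
```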
Validity refers to the relationship between performance on an assessment and performance on the job. Validity is the most important issue to consider when deciding whether to use a particular assessment tool because an assessment that does not provide useful information about how an individual will perform on the job is of no value to the organization.
There are different types of validity evidence. Which type is most appropriate will depend on how the assessment method is used in making an employment decision. For example, if a work sample test is designed to mimic the actual tasks performed on the job, then a content validity approach may be needed to establish that the content of the test convincingly matches the content of the job, as identified by a job analysis. If a personality test is intended to forecast the job success of applicants for a customer service position, then evidence of predictive validity may be needed to show that scores on the personality test are related to subsequent performance on the job.
The most commonly used measure of predictive validity is a correlation (or validity) coefficient. Correlation coefficients range in absolute value from 0 to 1.00. A correlation of 1.00 (or -1.00) indicates two measures (e.g., test scores and job performance ratings) are perfectly related. In such a case, you could perfectly predict the actual job performance of each applicant based on a single assessment score. A correlation of 0 indicates two measures are unrelated. In practice, validity coefficients for a single assessment rarely exceed .50. A validity coefficient of .30 or higher is generally considered useful for most circumstances (Biddle, 2005). 1
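A validity coefficient of this kind is usually a Pearson correlation between assessment scores and a later measure such as performance ratings. The Python sketch below computes it for five hypothetical applicants; the numbers are toy values chosen for easy arithmetic, and a real validation study would need far more cases.

```python
import math
from typing import Sequence


def validity_coefficient(scores: Sequence[float], ratings: Sequence[float]) -> float:
    """Pearson correlation between assessment scores and job performance ratings."""
    if len(scores) != len(ratings) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mean_x = sum(scores) / len(scores)
    mean_y = sum(ratings) / len(ratings)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, ratings))
    ss_x = sum((x - mean_x) ** 2 for x in scores)
    ss_y = sum((y - mean_y) ** 2 for y in ratings)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when either measure has no variance")
    return cov / math.sqrt(ss_x * ss_y)


USEFUL = 0.30  # rule-of-thumb level cited in the text (Biddle, 2005)

# Toy data: test scores and later performance ratings for five hypothetical applicants.
test_scores = [1, 2, 3, 4, 5]
performance = [2, 4, 1, 3, 5]
r = validity_coefficient(test_scores, performance)
# The sign only shows direction, so usefulness is judged on the absolute value.
print(f"validity = {r:.2f}, useful = {abs(r) >= USEFUL}")
```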
When multiple selection tools are used, you can consider the combined validity of the tools. To the extent the assessment tools measure different job-related factors (e.g., reasoning ability and honesty) each tool will provide unique information about the applicant's ability to perform the job. Used together, the tools can more accurately predict the applicant's job performance than either tool used alone. The amount of predictive validity one tool adds relative to another is often referred to as the incremental validity of the tool. The incremental validity of an assessment is important to know because even if an assessment has low validity by itself, it has the potential to add significantly to the prediction of job performance when joined with another measure.
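How much a second tool adds depends on how much it overlaps with the first. For two predictors, the standard multiple-correlation formula combines each tool's validity with the correlation between the tools. The sketch below applies it to the cognitive ability (.51) and work sample (.54) validities quoted in this section; the intercorrelations between the two tools are hypothetical values chosen for illustration, since the section does not report them, and this is not necessarily how the table's figures were derived.

```python
import math


def combined_validity(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of two predictors with job performance.

    r1, r2: validity of each predictor; r12: correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    if r_squared < 0:
        raise ValueError("these correlations cannot occur together")
    return math.sqrt(r_squared)


COGNITIVE, WORK_SAMPLE = 0.51, 0.54  # validities quoted in the text

# Hypothetical intercorrelations: the less the tools overlap, the more the second adds.
for r12 in (0.0, 0.38, 0.80):
    combined = combined_validity(COGNITIVE, WORK_SAMPLE, r12)
    print(f"r12={r12:.2f}: combined={combined:.2f}, incremental={combined - COGNITIVE:.2f}")
```

An assumed intercorrelation of .38 happens to give a combined validity of about .63, while heavily overlapping tools (.80) add only a few hundredths, which is why tools measuring different job-related factors are the ones worth combining.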
Just as assessment tools differ with respect to reliability, they also differ with respect to validity. The following table provides the estimated validities of various assessment methods for predicting job performance (represented by the validity coefficient), as well as the incremental validity gained from combining each with a test of general cognitive ability. Cognitive ability tests are used as the baseline because they are among the least expensive measures to administer and the most valid for the greatest variety of jobs. The second column is the correlation of the combined tools with job performance, or how well they collectively relate to job performance. The last column shows the percent increase in validity from combining the tool with a measure of general cognitive ability. For example, cognitive ability tests have an estimated validity of .51 and work sample tests have an estimated validity of .54. When combined, the two methods have an estimated validity of .63, an increase of 24% above and beyond what a cognitive ability test used alone could provide.
Table adapted from Schmidt & Hunter (1998). Copyright © 1998 by the American Psychological Association. Adapted with permission. 2
* Referred to as the training & experience behavioral consistency method in Schmidt & Hunter (1998).
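The percent increase quoted for the work sample example is the gain in validity divided by the cognitive ability baseline:

```latex
\[
\frac{R_{\text{combined}} - r_{\text{cognitive}}}{r_{\text{cognitive}}}
= \frac{0.63 - 0.51}{0.51}
= \frac{0.12}{0.51}
\approx 0.235 \approx 24\%
\]
```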
The technology available is another factor in determining the appropriate assessment tool. Agencies that receive a large volume of applicants for position announcements may benefit from using technology to narrow down the applicant pool, such as online screening of resumes or online biographical data (biodata) tests. Technology can also overcome distance challenges and enable agencies to reach and interview a larger population of applicants.
However, because technology removes the human element from the assessment process, it may be perceived as "cold" by applicants, and is probably best used in situations that do not rely heavily on human intervention, such as collecting applications or conducting applicant screening. Technology should not be used for final selection decisions, as these traditionally require a more individualized and in-depth evaluation of the candidate (Chapman and Webster, 2003). 3
Any assessment procedure used to make an employment decision (e.g., selection, promotion, pay increase) can be open to claims of adverse impact based on subgroup differences. Adverse impact is a legal concept used to determine whether there is a "substantially different" passing rate (or selection rate) between two groups on an assessment procedure (see www.uniformguidelines.com for a more detailed discussion). Groups are typically defined on the basis of race (e.g., Blacks compared to Whites), gender (i.e., males compared to females), or ethnicity (e.g., Hispanics compared to Non-Hispanics). Assessment procedures having an adverse impact on any group must be shown to be job-related (i.e., valid).
What is a "substantially different" passing rate? The Uniform Guidelines provide a variety of statistical approaches for evaluating adverse impact. The most widely used method is referred to as the 80% (or four-fifths) rule-of-thumb. The following is an example where the passing rate for females is 40% and the passing rate for males is 50%. The Uniform Guidelines lay out the following steps for computing adverse impact:
According to the 80% rule, adverse impact is not indicated as long as the ratio is 80% or higher. In this case, the ratio of the two passing rates is 40% divided by 50%, or 80%, so evidence of adverse impact is not found and the passing rate of females is not considered substantially different from that of males.
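The four-fifths check can be sketched in Python: compute each group's passing rate, divide it by the highest group's rate, and flag any group whose ratio falls below 80%. The counts of 100 applicants per group are hypothetical; only the 40% and 50% rates come from the example. Exact fractions are used so that a ratio landing exactly on 80% is not misjudged by floating-point rounding. This covers only the rule of thumb, not the other statistical approaches the Uniform Guidelines allow.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

FOUR_FIFTHS = Fraction(4, 5)  # the 80% rule of thumb


def impact_ratios(results: Dict[str, Tuple[int, int]]) -> Dict[str, Fraction]:
    """Each group's passing rate divided by the highest group's passing rate.

    results maps a group name to (number who passed, number assessed).
    """
    rates: Dict[str, Fraction] = {}
    for group, (passed, assessed) in results.items():
        if assessed <= 0 or not 0 <= passed <= assessed:
            raise ValueError(f"invalid counts for {group!r}")
        rates[group] = Fraction(passed, assessed)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any passing applicants")
    return {group: rate / highest for group, rate in rates.items()}


def groups_below_four_fifths(results: Dict[str, Tuple[int, int]]) -> List[str]:
    """Groups whose impact ratio is under 80%, where the rule of thumb indicates adverse impact."""
    return [g for g, ratio in impact_ratios(results).items() if ratio < FOUR_FIFTHS]


# Hypothetical counts reproducing the example: 40% of females and 50% of males pass.
example = {"female": (40, 100), "male": (50, 100)}
print({g: f"{float(r):.0%}" for g, r in impact_ratios(example).items()})
print(groups_below_four_fifths(example))
```

If the female passing rate dropped to 35%, the ratio would be 70% and the group would be flagged.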
Agencies are encouraged to consider assessment strategies to minimize adverse impact. When adverse impact is discovered, the assessment procedure must be shown to be job-related and valid for its intended purpose.
When applicants participate in an assessment process, they are not the only ones being evaluated; the agency is being evaluated as well. Applicants who complete an assessment process leave with impressions about the face validity and overall fairness of the assessment procedure. Their impressions can also be affected by whether they believe they had a sufficient opportunity to display their job-related competencies. The quality of the interactions between the applicant and agency representatives can also affect applicant reactions. Agencies using grueling assessment procedures may end up alienating applicants. It is important to recognize that applicants use the assessment process as one means to gather information about the agency. Failing to act on this can be very costly to agencies, particularly if top candidates are driven to look elsewhere for employment opportunities.
The design of an assessment strategy should begin with a review of the critical competencies identified from the job analysis results. Once you decide what to assess, you must then determine how to structure the personnel assessment process. In designing a selection process, a number of practical questions must be addressed, such as:
For example, if your budget is tight, you will need to rule out some of the more expensive methods such as assessment centers or work simulation tests. If you are expecting to receive thousands of applications (based on projections from similar postings), you will need to develop an effective screening mechanism ahead of time. If you need to fill a vacancy and only have a few weeks to do so, then a multi-stage process will probably not be feasible. In working out answers to these questions, it is usually helpful to think in terms of the entire selection process, from beginning to end.
One key consideration is the number of assessment tools to include in the process. Using a variety of assessments tends to improve the validity of the process and provides information on different aspects of an applicant's likely job performance. Using a single measure will tend to identify applicants who have strengths in a specific area but may overlook applicants who have high potential in other areas. Assessing applicants using multiple methods will reduce errors because people may respond differently to different methods of assessment. For example, some applicants who excel at written tests may be too nervous to do well in interviews, while others who suffer from test anxiety may give impressive interviews. Another advantage of using a variety of assessment methods is that a multiple-hurdle approach can be taken. The least expensive assessments can be used first to pare down the applicant pool. More labor-intensive and time-consuming procedures can be introduced at a later stage when there are fewer candidates to evaluate.
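The multiple-hurdle idea can be sketched in Python. The stage names, per-applicant costs, cutoffs, and applicant scores are all hypothetical; the sketch only shows that running the inexpensive screen first means the labor-intensive interview is given to fewer people.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Applicant:
    name: str
    biodata: float    # hypothetical online screening score (0-100)
    interview: float  # hypothetical structured-interview rating (1-5)


@dataclass
class Hurdle:
    name: str
    cost_per_applicant: float            # hypothetical cost units per person assessed
    passes: Callable[[Applicant], bool]  # cutoff rule for this stage


def run_hurdles(pool: List[Applicant], hurdles: List[Hurdle]) -> Tuple[List[Applicant], float]:
    """Apply hurdles in order; only applicants who pass a stage go on to the next."""
    remaining = list(pool)
    total_cost = 0.0
    for hurdle in hurdles:
        total_cost += hurdle.cost_per_applicant * len(remaining)
        remaining = [a for a in remaining if hurdle.passes(a)]
    return remaining, total_cost


# Cheapest assessment first, labor-intensive one last.
hurdles = [
    Hurdle("online biodata screen", 5.0, lambda a: a.biodata >= 60),
    Hurdle("structured interview", 200.0, lambda a: a.interview >= 3.5),
]
pool = [
    Applicant("A", 72, 4.0),
    Applicant("B", 55, 4.5),
    Applicant("C", 81, 3.0),
    Applicant("D", 64, 3.8),
]

finalists, cost = run_hurdles(pool, hurdles)
everyone_cost = sum(h.cost_per_applicant for h in hurdles) * len(pool)
print([a.name for a in finalists], cost, everyone_cost)
```

Here the staged process costs 620 units against 820 for giving everyone both assessments. Applicant B, who would have cleared the interview cutoff, is screened out at the first stage, so the savings depend on the early, inexpensive hurdles being sound.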
Considering which assessment methods best measure which competencies at which stage in the process should help you develop a process well suited to your agency's hiring needs.
Agencies are encouraged to standardize and document the assessment process through the following steps:
(Information adapted from Gilliland, S.W., & Cherry, B., 2000). 4
For a more in-depth introduction to personnel assessment practices, including measurement techniques and related considerations (e.g., reliability, validity, job analysis, and legal requirements), refer to Essentials of Personnel Assessment and Selection by Guion and Highhouse (2006). 5
For a non-technical summary of the research literature on the value of commonly used assessment methods, see Selection Methods: A Guide to Implementing Formal Assessments to Build a High Quality Workforce (Pulakos, 2005). 6
More information about designing and implementing a selection process can be found in Competency-based Recruitment and Selection: A Practical Guide by Wood and Payne (1998). 7
1 Biddle, D. (2005). Adverse Impact and Test Validation: A Practitioner's Guide to Valid and Defensible Employment Testing. Burlington, VT: Gower Publishing.
2 Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.
3 Chapman, D. S., & Webster, J. (2003). The use of technologies in the recruiting, screening, and selection processes for job candidates. International Journal of Selection and Assessment, 11, 113-120.
4 Gilliland, S. W., & Cherry, B. (2000). Managing customers of selection. In J. K. Kehoe (Ed.), Managing Selection in Changing Organizations (pp. 158-196). San Francisco: Jossey-Bass.
5 Guion, R. M., & Highhouse, S. (2006). Essentials of Personnel Assessment and Selection. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
6 Pulakos, E. D. (2005). Selection Methods: A Guide to Implementing Formal Assessments to Build a High Quality Workforce. Alexandria, VA: SHRM Foundation.
7 Wood, R., & Payne, T. (1998). Competency-based Recruitment and Selection: A Practical Guide. Hoboken, NJ: Wiley.
The Assessment Decision Tool (ADT) is designed to help human resources professionals and hiring supervisors/managers develop assessment strategies for their specific hiring situation (e.g., volume of applicants, level of available resources).
The basic steps are:
If you experience any problems while using the ADT or have questions about what to do, please submit a query at the technical support page. If you have questions or comments on the content of the ADT, please send an e-mail to Assessment_Information@opm.gov.