
A CRITIQUE OF PSYCHOLOGICAL TESTS COMMONLY USED WITH CHRONIC PAIN PATIENTS

by William W. Deardorff, Ph.D, ABPP.


4 Credit Hours - $79
Last revised: 10/02/2017

Course content © Copyright 2011 - 2017 by William W. Deardorff, Ph.D, ABPP. All rights reserved.





 

Course Outline

 

Introduction

Learning Objectives

The Basics of Statistics in Psychological Testing

Reliability

Test-Retest Reliability

Inter-rater Reliability

Parallel Forms Reliability

Internal Consistency Reliability

Validity

Content Validity

Face Validity

Criterion-related Validity

Concurrent validity

Predictive validity

Convergent validity

Discriminant validity

Construct validity

Generalizability

Standardization and Normative Samples

Overview of Psychological Testing of Chronic Pain Patients

Types of Instruments

Broadband-General

Broadband-Health

Narrow Focus

Narrow Focus – Health

Examples of Psychometric Tests Used with Chronic Pain Patients

Psychological Tests Commonly Used in the Assessment of Chronic Pain Patients

Attributes, Strengths and Weaknesses –

Published by the Colorado Division of Workers’ Compensation

 

Introduction

 

The various Practice Guides relating to assessment of chronic pain (including ACOEM and ODG) discuss the use of psychological testing. One of the most objective components of assessing a chronic pain patient is psychological testing. However, to achieve valid results, the clinician must have an understanding of the use of psychological tests with this special population.  As will be discussed in this course, there are many psychological tests that have been specifically designed to be used with chronic pain patients. However, the vast majority of tests used with chronic pain patients (e.g. the MMPI-2) were not originally designed for this purpose.  Being aware of the strengths and weaknesses of various psychological tests when used with chronic pain patients is extremely important to be able to assess the validity of the results. Having an understanding of these issues is important for all practitioners who evaluate and treat chronic pain patients.

 

Please note that for this course, the individual Help-Feature for each question on the test is available only for the material that is presented on the web site.  It is not available for the document to be reviewed which constitutes the second part of the course.  However, you can take the test as many times as you like, until you pass.  You will receive feedback each time you submit the test for scoring, until you pass. We hope you enjoy the course and reviewing this valuable information.

 

 

Learning Objectives

 

 

List the four types of reliability

Explain the 6 types of validity

Discuss two types of Broadband Health tests

Discuss the strengths/weaknesses of three tests

 

 

The Basics of Statistics in Psychological Testing

 

To evaluate a psychological test, either in general or for use with pain patients, one must have some familiarity with statistics related to psychometrics.  For psychologists and others who are trained to do testing, this will be review material.  However, anyone who works with chronic pain patients who have undergone psychological testing will be better able to judge the validity of the results presented in a report by understanding these concepts.

 

 

Table 1. Important Statistical Concepts in Psychometrics

 

 

Reliability

 

Test-Retest Reliability

Inter-rater Reliability

Parallel Forms Reliability

Internal Consistency Reliability

 

Validity

 

Content Validity

Face Validity

Criterion-related Validity

Concurrent validity

Predictive validity

Convergent validity

Discriminant validity

Construct validity

 

Generalizability

 

Standardization and Normative Samples

 

 

Reliability

 

Reliability refers to the consistency of a measure: the extent to which a test is repeatable and yields consistent scores. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but there are several different ways to estimate it.

 

A perfectly reliable test is one that is completely accurate and free from error. In other words, the same test, given to the same individual, in the same way, should always yield the same value from moment to moment, assuming the thing measured itself has not changed.  It also assumes that if there is a change in the test results, it is due only to a change in the thing being measured, and not to the imperfection of the test.  In such a test, all variation in scores would reflect real differences in the thing being measured (true variance).  This is the reliability that one strives for, but never achieves.  All psychological tests have some degree of measurement error (error variance): variation in test scores that is not related to the thing being measured.  It is the imperfection of the test.  Every test tries to maximize true variance and minimize error variance.
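The relationship between true variance and error variance can be written compactly in the notation of classical test theory. The symbols below (X for the observed score, T for the true score, E for the error, and r_XX for the reliability coefficient) are conventional notation assumed here for illustration; they do not appear in the course text.

```latex
% Observed score = true score + measurement error
X = T + E

% If error is unrelated to the true score, the variances add
\sigma_X^2 = \sigma_T^2 + \sigma_E^2

% Reliability is the share of observed-score variance that is true variance
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under these assumptions, a perfectly reliable test has no error variance and a reliability of 1, while a reliability of .80 means that 20% of the variability in scores is attributable to error.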

 

Test-Retest Reliability

 

The test-retest method of estimating a test's reliability involves administering the test to the same group of people at least twice.  Then the first set of scores is correlated with the second set of scores.  Test-retest correlations range between 0 (low reliability) and 1 (high reliability). This kind of reliability is used to assess the consistency of a test across time. This type of reliability assumes that there will be no change in the quality or construct being measured. Test-retest reliability is best used for things that are stable over time, such as intelligence. Generally, reliability will be higher when little time has passed between tests and lower if a greater amount of time has passed.  Reliability is negatively impacted by measurement error.  One desires a test to have a low measurement error.  Change due to measurement error is not related to actual changes in the variable being measured (e.g. if you use a tape measure to measure a room on two different days, any difference in the results is likely due to measurement error rather than a change in the room size).  For tests that assess variables expected to change over time (e.g. level of depression), test-retest reliability is assessed using short test-retest intervals, and reliability is also estimated using other methods.
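As a rough illustration of the correlation step described above, the following sketch computes a Pearson correlation between two administrations of the same test. The scores and the function name `pearson_r` are made up for the example and do not come from any real test.

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    # Sums of squared deviations for each administration
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / math.sqrt(ss_1 * ss_2)

# Hypothetical scores for the same five people tested on two occasions
time_1 = [12, 18, 25, 31, 40]
time_2 = [14, 17, 27, 30, 41]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.2f}")
```

Because the people in this made-up sample keep nearly the same rank order and spacing on both occasions, the coefficient comes out close to 1.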

 

Inter-rater Reliability

 

This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters’ estimates. One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two ratings to determine the level of inter-rater reliability. Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate.  Inter-rater reliability is certainly important for such measures as the Global Assessment of Functioning (GAF).  For instance, if two practitioners independently assessed a patient and assigned a GAF, how closely would those results correlate?  If the practitioners are well trained in the use of the GAF, the inter-rater reliability should be high.
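The percentage-of-agreement calculation described above can be sketched in a few lines. The category labels and the two raters' judgments are hypothetical; they are arranged so that the raters agree on 8 of 10 observations, matching the example in the text.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical category judgments for ten observations
rater_a = ["mild", "moderate", "severe", "mild", "mild",
           "moderate", "severe", "mild", "moderate", "mild"]
rater_b = ["mild", "moderate", "severe", "mild", "moderate",
           "moderate", "severe", "mild", "severe", "mild"]

agreement = percent_agreement(rater_a, rater_b)
print(f"inter-rater agreement = {agreement:.0f}%")  # 8 of 10 match: 80%
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read as a rough first look rather than a full reliability analysis.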

 

Parallel-Forms Reliability

 

Parallel-forms (or alternate forms) reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time.
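The random division of an item pool into two forms can be sketched as follows. The item names, the pool size, and the function name are assumptions made for illustration; a real item pool would contain the actual test questions.

```python
import random

def split_into_parallel_forms(item_pool, seed=None):
    """Randomly divide an item pool into two forms of equal length."""
    items = list(item_pool)
    if len(items) % 2 != 0:
        raise ValueError("item pool must contain an even number of items")
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    rng.shuffle(items)
    half = len(items) // 2
    return items[:half], items[half:]

# Hypothetical pool of 20 items that all measure the same quality
item_pool = [f"item_{n:02d}" for n in range(1, 21)]
form_a, form_b = split_into_parallel_forms(item_pool, seed=42)

print("Form A:", sorted(form_a))
print("Form B:", sorted(form_b))
```

Once both forms have been administered to the same subjects, the two sets of total scores would be compared (for example, by correlating them) to estimate parallel-forms reliability.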

 

Internal Consistency Reliability

 

This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test’s internal consistency. When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same, which would indicate that the test has internal consistency.
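The course text does not name a specific statistic for internal consistency. One widely used option is Cronbach's alpha, which is assumed here purely for illustration. The sketch below computes it from a small, made-up table of item responses (one row per respondent, one column per item).

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents
    item_variance_sum = sum(variance(item) for item in zip(*responses))
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical answers from five respondents to four similar items
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

In this made-up data, respondents who score high on one item tend to score high on the others, so alpha comes out high. Items that did not move together would pull the value down.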

 

Interpreting Reliability Correlations

 

So what do reliability figures indicate?  Test manuals and independent reviews of tests will provide information about reliability.  The reliability of a test is indicated by a reliability coefficient.  It is denoted by the letter “r” and is expressed by a number ranging from 0 (no reliability or correlation) to 1 (perfect reliability or correlation).  Generally reliability coefficients are expressed as a decimal (e.g. r = .75) and the larger the reliability coefficient, the more repeatable or reliable the test score. However this does not indicate the test’s validity, which will be discussed later.  For a test to be valid it MUST have reasonable reliability but a highly reliable test is not necessarily valid.  Some general guidelines for interpreting test reliability can be found in Table 2. 

 

 

Table 2. General Guidelines for Interpreting Reliability Coefficients

 

 

.90 and higher: Good to Excellent

 

High reliability is required when tests are used to make important decisions and individuals are sorted into many different categories based upon relatively small individual differences such as intelligence (Most standardized tests of intelligence report reliability estimates around .90).

 

.80 - .89:  Good

 

Reliability estimates of .80 or higher are typically regarded as moderate to high (at .80, approximately 20% of the variability in test scores is attributable to error).

 

.70 - .79:  Adequate to Low depending on purpose

 

Lower reliability is acceptable when tests are used for preliminary decisions and to sort people into smaller groups based upon larger individual differences.  For most testing applications, reliability estimates around .70 are usually regarded as low.

 

.60 and below:  Low

 

Reliability estimates below .60 are usually regarded as unacceptably low.
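The guidelines in Table 2 can be expressed as a simple lookup, sketched below. The function name is hypothetical, and the bands are copied from the table as written. Table 2 does not assign a label to coefficients between .60 and .70, so the sketch reports that gap rather than guessing.

```python
def interpret_reliability(r):
    """Return the Table 2 label for a reliability coefficient between 0 and 1."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficients range from 0 to 1")
    if r >= 0.90:
        return "Good to Excellent"
    if r >= 0.80:
        return "Good"
    if r >= 0.70:
        return "Adequate to Low depending on purpose"
    if r <= 0.60:
        return "Low"
    # Table 2 gives no label for values above .60 and below .70
    return "Not covered by Table 2"

for coefficient in (0.92, 0.85, 0.75, 0.65, 0.55):
    print(f"r = {coefficient:.2f}: {interpret_reliability(coefficient)}")
```

These labels are general guidelines only. As the table notes, how much reliability is enough depends on the purpose for which the test is being used.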

 

 

Validity

 

Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.  In order to be valid, a test must be reliable; but reliability does not guarantee validity. Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. Several types of validity are discussed below:

 

Content Validity

 

When a test has content validity, the items on the test represent the entire range of possible items the test should cover (in other words, the test items adequately and representatively sample the content area to be measured). Individual test questions may be drawn from a large pool of items that cover a broad range of topics. In some instances where a test measures a trait that is difficult to define, an expert judge may rate each item’s relevance. Because each judge is basing their rating on opinion, two independent judges rate the test separately. Items that are rated as strongly relevant by both judges will be included in the final test.  Content validity is primarily an issue for educational tests, certain industrial tests, and other tests of content knowledge like the Psychology Licensing Exam. Expert judgment (not statistics) is the primary method used to determine whether a test has content validity. Nevertheless, the test should have a high correlation with other tests that purport to sample the same content domain.

 

Face Validity

 

Face validity is the least important aspect of validity, because validity still needs to be directly checked through other methods. All that face validity means is: "Does the measure, on the face of it, seem to measure what is intended?"  Some commonly used measures have very high face validity, such as the Beck Depression Inventory (BDI).  Anyone looking at the items on the test can tell exactly what is being measured (symptoms of depression).  It may have excellent validity for measuring depression in a person who is committed to answering the questions in a truthful manner.  However, the test is easily manipulated, and a person who is not depressed can easily produce results on the BDI indicating high levels of depression (it is easily faked).  For that reason, researchers will sometimes purposely try to obscure a measure’s face validity in an effort to attain improved validity elsewhere.  In other words, a test does not always strive for high face validity.  The MMPI-2 is probably the best example: low face validity was a deliberate part of its construction.  The items on the MMPI-2 have been found to be associated with the traits being measured, but one often cannot tell what a given question is measuring.  In these cases, the other types of validity (criterion and construct) are more important. Face validity is not validity in the technical sense: a test that has face validity is not necessarily valid in the technical sense of the word.

 

Criterion-related Validity

 

A test is said to have criterion-related validity when it has demonstrated its effectiveness in predicting a criterion or indicators of a construct. There are two different types of criterion validity:

 

Concurrent Validity.  Concurrent validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion. For example, a test that measures levels of depression would be said to have concurrent validity if it measured the actual current levels of depression experienced by the test taker.  Often, criterion validity is established by correlating the test findings (e.g. level of depression) with some “gold-standard” (e.g. a structured interview that assesses depression).


Predictive Validity. Predictive validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.  In the medical realm, related to psychological testing, predictive validity might be something like the MMPI’s ability to predict a patient’s response to spine surgery.

 

Convergent Validity

 

It is important to know whether the test being used (or developed) returns similar results to other tests which purport to measure the same or related constructs.  Questions to be addressed include: Does the measure match with an external 'criterion' such as a behavior or another, well-established, test? Does it measure it concurrently and can it predict this behavior? Examples might include a self-report measure of pain level compared to trained observers’ ratings of a patient’s pain behavior.

 

Discriminant Validity

 

Just as it is important to show that a test returns results similar to other tests of the same trait, it is also important to show that a measure doesn't measure what it isn't meant to measure (i.e. it discriminates). For example, discriminant validity would be evidenced by a low correlation between a depression measure and one of self-efficacy or self-esteem (one would expect depression and self-esteem to be inversely related).  Also, a test of depression that correlates highly with an anxiety test will not have good discriminant validity (it cannot discriminate between depression and anxiety).
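Convergent and discriminant validity are both typically examined through correlations, and the contrast can be sketched with made-up data. In the example below, every score and the choice of an "unrelated" measure are hypothetical. A new depression measure is correlated with an established depression measure (where a high correlation is hoped for) and with a measure of a construct it is not meant to capture (where a low correlation is hoped for).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on three measures
new_depression_test = [5, 12, 20, 27, 35, 8]
established_depression_test = [7, 14, 18, 30, 33, 10]
unrelated_measure = [25, 30, 18, 22, 28, 20]

convergent_r = pearson_r(new_depression_test, established_depression_test)
discriminant_r = pearson_r(new_depression_test, unrelated_measure)

print(f"convergent r   = {convergent_r:.2f}")    # high value supports convergent validity
print(f"discriminant r = {discriminant_r:.2f}")  # low value supports discriminant validity
```

In practice these comparisons are made across many measures and much larger samples; the sketch only shows the direction of the evidence each type of validity looks for.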

 

Construct Validity

 

Construct Validity is the most important kind of validity.  A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. If a measure has construct validity it measures what it purports to measure. Establishing construct validity is a long and complex process. The various qualities that contribute to construct validity include: criterion validity (includes predictive and concurrent), convergent validity, and discriminant validity.

 

Generalizability

 

Reliability and validity are often discussed separately but sometimes you will see them both referred to as aspects of generalizability. Often we want to know whether the results of a measure or a test used with a particular group can be generalized to other tests or other groups.  This is especially important relative to the topic being discussed.  For instance, are the results of the MMPI (that was developed using non-pain patients) generalizable when given to a patient with chronic pain?  A test may be reliable and it may be valid but its results may not be generalizable to other tests measuring the same construct nor to populations other than the one sampled.

 

Standardization and the Normative Sample

 

To understand norms and statistical assessment one first needs to understand standardization. Standardization is the process of testing a group of people to see the scores that are typically attained on the test. With a standardized test (such as the MMPI), the patient’s raw score is compared with the standardization group's performance to determine where it falls. This results in the standardized score.  With standardization, the normative group upon which the test was developed must reflect the population on which the test is being used. Most commonly used major psychological measures are norm-based (again, meaning that the score for an individual is interpreted by comparing his/her score with the scores of a group of people who define the norms for the test).  Often the test manual, or subsequent publications, will provide data about different results with different “normative” or standardization groups.  For instance, there may be community norms, medical patient norms, psychiatric patient norms, etc.  The standardized score for an individual patient will change depending upon which set of normative data is used. Often, if multiple norms are available, they will be reported as part of a computerized scoring report (NCS-Pearson is the largest provider of such reports).  The concepts of norms and standardization are very important relative to test interpretation but are often ignored when common psychological tests are used with chronic pain patients (who were not included in the normative sample).
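The effect of the normative group on a standardized score can be sketched with a linear T-score (mean 50, standard deviation 10), which is one common convention for standardized scores and is assumed here for illustration. The raw score and the means and standard deviations of the two normative groups are invented and do not come from any published norms.

```python
def t_score(raw_score, norm_mean, norm_sd):
    """Linear T-score: where a raw score falls relative to a normative group."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50 + 10 * z

# Hypothetical normative data for the same scale: (mean, standard deviation)
norms = {
    "community": (12, 6),
    "chronic pain": (20, 8),
}

raw_score = 24
for group, (mean, sd) in norms.items():
    print(f"{group} norms: T = {t_score(raw_score, mean, sd):.0f}")
```

With these invented numbers the same raw score of 24 yields T = 70 against the community norms but T = 55 against the chronic pain norms, which illustrates why the choice of normative sample matters for interpretation.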

 

In summary, standardized tests are:

 

Administered under uniform conditions (no matter where, when, by whom or to whom it is given, the test is administered in a similar way).

 

Scored objectively (the procedures for scoring the test are specified in detail so that any number of trained scorers will arrive at the same score for the same set of responses). Tests containing questions that need subjective evaluation (e.g. essay questions, responses to open-ended questions) are generally not considered standardized tests.

 

Designed to measure relative results on the test as compared with the normative sample (as discussed above).  In order to measure relative results, standardized tests are interpreted with reference to a comparable group of people (the standardization or normative sample).  One example is a test of depression.  The cut-off scores for a depression test (e.g. not depressed, mildly depressed, severely depressed) are determined during the test development phase using the standardization group (or normative sample). The normative sample should be representative of the target population - however this is not always the case.  In that case, the test needs to be interpreted with appropriate caution.  This is one of the weaknesses of most tests that were not originally designed for use with pain patients but are commonly used with this population (e.g. MMPI, BDI, etc.).  This “off-label” use of the test can still be done effectively if this is taken into account in the interpretation of results. In most cases, there is substantial research on the use of these tests with pain patients and, in some cases, special normative data is available (e.g. MMPI standardized on a chronic pain population).

 

Overview of Psychological Testing of Chronic Pain Patients

 

Psychometric instruments used in the assessment of chronic pain might be categorized as four general types as can be seen in Table 3.   

 

 

Table 3. Four Types of Psychometric Instruments for use in Chronic Pain

 

 

Broadband - General

Broadband - Health/Pain

Narrow Focus

Narrow Focus – Health/Pain

 

 

Broadband-General

 

The broadband-general measures include those that were not originally designed to assess medical patients, including those with pain. These measures often assess a number of personality, behavioral or other variables.  Although these assessments were not originally designed to assess medical issues, normative data for specific populations has often been developed to help with generalizability.  The Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Graham, Ben-Porath et al., 2001) is the most widely used and researched personality inventory.  The MMPI-2 was designed to identify psychopathology and personality features; however, it is also one of the most commonly used measures in areas such as chronic pain and pre-surgical screening.  When using broadband-general measures, the clinician must be well-versed in validity, standardization, and interpretation issues to avoid misuse of the test.  Example excerpts of three different interpretations of a 1-3/3-1 codetype on the MMPI-2 illustrate this point and can be seen in Table 4:

 

 

Table 4. Differing MMPI-2 Interpretations: Traditional, Chronic Pain, Surgical Screening

 

 

Traditional (often computer generated as part of an interpretative report).  Classic conversion symptoms may be present, particularly if scale 2 is considerably lower than scales 1 and 3 (the so called conversion-V pattern).  Whereas some tension may be reported, severe anxiety and depression usually are absent.  The somatic complaints include headaches, chest pain, back pain, and numbness or tremors in the extremities.  Other physical complaints include weakness, fatigue, dizziness, and sleep disturbance.  The physical symptoms increase in times of stress, and there is clear secondary gain associated with symptoms.  These individuals present themselves as normal, responsible, and without fault.  They make excessive use of denial, projection, and rationalization, and they blame others for their difficulties.  They tend to be rather immature, egocentric, and selfish.  They are insecure and have a strong need for attention, affection, and sympathy.   They are very dependent, but they are uncomfortable with the dependency and experience conflict because of it. Although they are outgoing and socially extroverted, their social relationships tend to be shallow and superficial, and they lack genuine involvement with other people.

 

Chronic pain. Patients with similar profiles present with a wide variety of vague and diffuse somatic complaints.  In these cases, there is often a very low correlation between subjective and objective findings.  These patients show pain behaviors and somatic complaints far beyond what would be expected due to nociceptive input and objective findings.  These patients show a high readiness to emit pain behaviors, but very little emotional distress associated with their reports of pain and other symptoms (low Scale 2).  From a positive reinforcement perspective, given the “readiness to emit pain behaviors” there is an increased chance of the patient “using” the pain behaviors to influence his or her environment or of pursuing reinforcing social consequences.  From a negative reinforcement perspective, these patients will often use complaints of pain to extricate themselves from stressful situations.  Extreme elevations on Scales 1 and 3, in conjunction with a non-clinical elevation on Scale 2 (depression) suggest that this patient is not uncomfortable in the sick role and may find aspects of it reinforcing.  As such, the patient is showing a high readiness to emit pain behaviors, along with multiple somatic complaints, in conjunction with minimal distress regarding these symptoms.  (Note.  If Scale 2 is elevated at or above Scales 1 and 3, then the patient is expressing distress about being in the sick role or being unhappy and uncomfortable with his/her pain behaviors and, hence, not as likely to find them reinforcing.  This profile suggests less of an influence of environmental contingencies.)

 

Spine Pre-Surgical Screening.  As discussed by Block, Gatchel, Deardorff, and Guyer (2003, page 83), elevations on Scales 1 and 3 reflect excessive sensitivity to pain rather than the cause of the pain.  In other words, in the face of a certain level of nociception, individuals who have high scores on Scales 1 and 3 are more likely to experience high pain levels, and to be more functionally disabled than those with low scores on these scales.  As such, pain sensitivity as assessed by Scales 1 and 3 seems to predispose patients towards negative spine surgery results even when the surgery corrects the underlying pathology (Block et al., 2003, page 84).  Individuals with this profile tend to respond very poorly to interventional and invasive pain management techniques aimed at identifying and “fixing” a physical pain generator.  The reason they do so poorly is that the other non-physical factors continue to impact their perception of pain and suffering.

 

 

Broadband-Health

 

The broadband-health measures have been specifically developed to assess a number of issues related to health and medical care, without necessarily focusing on one particular health problem.  Examples can be seen in Table 5.  These tests will often assess psychological and behavioral issues that are intimately related to medical treatment.  For instance, the Battery for Health Improvement-2 (BHI-2; Bruns and Disorbio, 2003) is designed “for the psychological assessment of medical patient” and includes scales organized into five domains: validity, physical symptoms, affective, character, and psychosocial variables.  Similarly, the Millon Behavioral Medicine Diagnostic (MBMD; Millon, Antoni, Millon, Minor and Grossman, 2001) includes domains of response patterns, psychiatric indications, coping styles, stress moderators, treatment prognostics, management guides and negative health habits.  The MBMD now has normative data for general medical patients, chronic pain and bariatric surgery candidates.

 

Narrow Focus

 

The narrow focus measures assess a particular psychological issue such as depression, anxiety, suicidality, stress and coping.  Probably two of the most commonly used measures in this category are the Beck Depression Inventory (BDI-2; Beck, Steer, & Brown, 1996) and the Beck Anxiety Inventory (BAI; Beck and Steer, 1993).  As with the MMPI-2, when these measures are used with medical patients one must be very cautious with interpretation.  For instance, the BDI-2 is a measure of self-rated depression that contains a number of physical (e.g. weight, sleep, energy) and cognitive (concentration, memory) symptoms, all of which can be differentially affected by depression, pain or some other medical condition, or both. Therefore, the clinician should always be aware of the impact of the actual medical problem on the narrow focus psychological instrument.

 

Narrow Focus-Health

 

The narrow focus-health test is designed to be a brief measure of a specific medical or health condition (See Table 5).  These tests are valuable for assessing and treating a specific condition.  Examples of these tests have been developed for the assessment of chronic pain (often used in conjunction with some of the broad based measures).  For instance, the Multidimensional Pain Inventory (MPI; Kerns, Turk, & Rudy, 1985) includes 13 scales that yield assignment to one of three profiles based on cluster analysis: Dysfunctional, Interpersonally Distressed, and Adaptive Coper.  Some examples of psychometric tests of each type that are used with chronic pain patients can be found in Table 5.

 

 

Table 5. Examples of Psychometric Tests used with Chronic Pain Patients

 

 

Broadband-General

 

Minnesota Multiphasic Personality Inventory – 2 (MMPI-2; Butcher et al., 2001)

 

Personality Assessment Inventory (PAI; Morey, 1991)

 

Symptom Checklist - 90R (SCL-90R; Derogatis, 1983)

 

Millon Clinical Multiaxial Inventory (MCMI-III; Millon, Davis, & Millon, 1997)

 

Broadband-Health

 

Millon Behavioral Medicine Diagnostic (MBMD; Millon, Antoni, et al., 2001)

 

Battery for Health Improvement – 2 (BHI-2; Bruns & Disorbio, 2003)

 

Sickness Impact Profile (SIP; Bergner, Bobbitt, Carter, & Gilson, 1981)

 

Health Locus of Control (HLC; Wallston, Wallston, & Devellis, 1978)

 

Wahler Physical Symptoms Inventory (Wahler, 1983)

 

Narrow Focus-General

 

Beck Depression Inventory – 2 (BDI-2; Beck, Steer, & Brown, 1996)

 

Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988)

 

Post-Traumatic Stress Diagnostic Scale (PDS; Foa, 1995)

 

Narrow Focus-Health

 

Multidimensional Pain Inventory (MPI; Kerns, Turk, & Rudy, 1985)

 

Survey of Pain Attitudes (SOPA; Jensen and Karoly)

 

 

One of the primary pitfalls in psychometric assessment of the chronic pain patient is not paying attention to the validity of the test instrument relative to the problem being assessed along with concomitant interpretation issues.  It is always important to keep in mind standardization and basic psychometric issues when using any test on a medical patient population including chronic pain.

 

Psychological Tests Commonly Used in the Assessment of Chronic Pain Patients

 

The following document entitled, “Psychological Tests Commonly Used in the Assessment of Chronic Pain Patients” was published by the Colorado Division of Workers’ Compensation.  It is an excellent review of these tests including:

 

Test Characteristics

Attributes of the Tests

Strengths and Weaknesses of Each Test

 

The review includes categories that are listed in Table 6.  These roughly correspond to the types of tests as discussed in Table 3 with slightly more specificity. 

 

 

Table 6. Categories of Tests Reviewed

 

 

Comprehensive Inventories for Medical Patients (Broadband – Health)

Comprehensive Psychological Inventories (Broadband – General)

Brief Multidimensional Screens for Medical Patients (Broadband – Health)

Brief Multidimensional Screens for Psychiatric Patients (Narrow Focus)

Brief Specialized Psychiatric Screening Measures (Narrow Focus)

Brief Specialized Medical Screening Measures (Narrow Focus – Health/Pain)

 

 

 

The Colorado review is referenced in many other Practice guidelines.  The remainder of the course consists of the material contained in the document.  The document is available in pdf format and can either be reviewed online or printed.  The questions on the post-course test refer to the previous material and the following document.

 

To review the document click here

 

 

 

REFERENCES

 

Beck, A.T. & Steer, R.A. (1993). BAI, Beck Anxiety Inventory Manual.  San Antonio, TX: Psychological Corporation.

 

Beck, A.T., Steer, R.A., & Brown, G.K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.

 

Bergner, M., Bobbitt, R.A., Carter, W.B., & Gilson, B.S. (1981). The Sickness Impact Profile: Development and final revision of a health status measure.  Medical Care, 19, 787-806.

 

Block, A.R., Gatchel, R.J., Deardorff, W.W., & Guyer, R.D. (2003). The Psychology of Spine Surgery.  Washington, DC: American Psychological Association.

 

Bruns, D. & Disorbio, J.M. (2003). Battery for Health Improvement – 2. Minneapolis, MN: Pearson.

 

Butcher, J.N., Graham, J.R., Ben-Porath, Y.S. et al. (2001). MMPI-2: Manual for administration, scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.

 

Derogatis, L.R. (1983). SCL-90-R: Administration, scoring and procedures manual-II. Towson, MD: Clinical Psychometric Research.

 

Foa, E.B. (1995). Posttraumatic Stress Diagnostic Scale Manual.  National Computer Systems Inc.

 

Kerns, R.D., Turk, D.C., & Rudy, T.E. (1985).  The West Haven-Yale Multidimensional Pain Inventory (WHYMPI). Pain, 23, 345-356.

 

Millon, T., Davis, R.D., & Millon, C. (1997). MCMI-III manual (2nd ed.). Minneapolis, MN: National Computer Systems.

 

Millon, T., Antoni, M., Millon, C., Minor, S., & Grossman, S.  (2001). Millon Behavioral Medicine Diagnostic.  Bloomington, MN: Pearson Assessments.

 

Morey, L.C. (1991). Personality Assessment Inventory: Professional Manual. Tampa, FL: Psychological Assessment Resources.

 

Wahler, H.J. (1983).  Wahler Physical Symptoms Inventory Manual. Los Angeles, CA: Western Psychological Services. 

 

Wallston, K.A., Wallston, B.S., & Devellis, R. (1978). Development of Multidimensional Health Locus of Control (MHLC) Scales. Health Education Monographs, 6, 160-170.

 

 

 

 

 


