QME Introduction to the GAF and Rating Psychiatric Impairment
by William W. Deardorff, Ph.D., ABPP.
Course content © Copyright 2014 - 2024 by William W. Deardorff, Ph.D., ABPP. All rights reserved.
IMPORTANT NOTE: In My Account, please be sure to add QME to your Degree List and your QME Provider Number to your licenses. All degrees and license numbers will be printed on the CE certificate. The certificates cannot be changed once the course is purchased.
COURSE OUTLINE
Introduction and Course Overview
Learning Objectives
DSM-IV-TR Multiaxial System
Axis V: The Global Assessment of Functioning
Summary of the GAF
DSM-IV-TR Definitions
General Categories of GAF Ratings
The Four Step GAF Scoring Method
The GAF, AMA Impairment Guide, and the SRPD
History of the GAF
Current Uses of the GAF
Problems with the GAF
The Global Score Problem
Reliability of the GAF
Validity of the GAF
Improving Your Use of the GAF
Use of the GAF in a Forensic Setting
Use the GAF According to the Instructions
The “Split Method” of GAF Scoring
The MIRECC GAF Scale
Information upon which the GAF is Based
Review Structured GAF Vignettes and Research
Training Vignettes
Research by Clinicians Trained in the GAF
Summary and Conclusions
References
INTRODUCTION AND COURSE OVERVIEW
The Global Assessment of Functioning Scale (GAF) is a standard method for a clinician to judge a patient’s overall level of psychosocial functioning. The GAF requires a clinician to develop an overall judgment about the patient’s current psychological, social, and occupational functioning. This global rating is made on a scale from 1-100, with 1 being the lowest level of functioning and 100 being the highest level of functioning. The primary purpose of the GAF is to provide a quick and efficient method of assessing and summarizing a patient’s current psychiatric status (symptoms and functioning) and of assessing change. It is most commonly used to assess patients at the beginning of psychiatric treatment, to monitor their progress throughout the intervention, and to provide a status at discharge.
Currently, the GAF is the most widely used method for assessing impairment among patients with psychiatric disorders (Moos et al., 2002). The GAF was introduced as a new rating scale of overall psychiatric disturbance as Axis V in the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R, American Psychiatric Association, 1987). The GAF Scale was retained in the DSM-IV (1994). Interestingly, Goldman (1992) stated that the GAF was “not widely used” as of the early 1990s. As will be discussed subsequently, mandates for its use by such organizations as the VA Medical Center system and managed care companies have no doubt contributed to its current popularity. The GAF was retained in the DSM-IV-TR (Text Revision, 2000) with a slight modification in the instructions. The modification was necessary due to confusion regarding the time frame for the GAF rating (how “current” was defined) and how the clinician is to appropriately integrate the contributions of a patient’s psychiatric symptoms and functioning to the final GAF score (First and Pincus, 2002). In 2005, the GAF was adopted by the State of California as the primary method for determining permanent psychiatric disabilities in the workers’ compensation population. As discussed in the Schedule for Rating Permanent Disabilities (SRPD, 2005), psychiatric impairment is to be evaluated using the GAF, which is then converted to a whole person impairment (WPI). All of these issues will be discussed in great detail in this course.
The course will begin with an overview of the DSM-IV Multiaxial Diagnostic System. The GAF comprises Axis V of this system. For those in the mental health field, this will be a review. The history of the development of the GAF Scale in its current form will then be discussed. This review begins with a predecessor of the GAF, the original 100 point Health Sickness Rating Scale (HSRS) developed by Luborsky (1962). Gaining an understanding of the various revisions of the scale can help give the clinician insight into some of its current strengths and weaknesses. The scale began as the Health Sickness Rating Scale in 1962 which was revised to form the Global Assessment Scale (GAS; Endicott et al., 1976). Subsequently, the GAS was revised and developed as the GAF for inclusion in DSM III-R (1987). The GAF remained essentially the same for inclusion in DSM-IV and DSM IV-TR with the exception of a slight modification in the instructions.
As part of reviewing the diagnostic system of the DSM, the course will discuss the scoring method for the GAF. Although most clinicians who use the GAF frequently believe that they are scoring it in a valid fashion, a review of the research reveals otherwise. The course will discuss specifics of how to properly score the assessment along with common pitfalls. Later in the course, I will discuss alternative scoring conceptualizations that will help the clinician arrive at a more reliable and valid GAF result.
The course will review some of the most common uses of the GAF, especially those that have dramatically increased its popularity. These include its use in treatment decisions by many managed care organizations, its use being mandated by the VA Medical Center system, as well as its inclusion in the Workers’ Compensation Reform legislation in California in 2005.
The course will discuss, in detail, the problems associated with the GAF in its current form. Being aware of the problems and weaknesses of any assessment can help a clinician arrive at a more valid result and avoid any pitfalls. Problem areas that have been pointed out with the GAF in the research literature include: (1) the scale collapses three dimensions of function into one score; (2) the reliability of the scale is not good except for clinicians who have undergone extensive training as part of research projects; and (3) the GAF shows essentially no predictive validity. Awareness of these issues helps the clinician use the GAF properly.
The course will conclude with a detailed discussion about how to improve one’s use of the GAF including such things as carefully following the GAF instructions, utilizing the “split method” of scoring, reviewing a modified GAF that scores all three dimensions independently, being sure to obtain quality information for GAF determination, presenting training vignettes that have been used by the VA Medical Center system, and reviewing research utilizing clinicians trained in the use of the GAF to elucidate how their scores might correlate with those in the injured worker population.
LEARNING OBJECTIVES
· Explain the four step GAF scoring method
· Describe the history of the GAF beginning with the HSRS (1962)
· List the three major problem categories with the GAF
· List four methods of improving the quality of a GAF rating
DSM-IV-TR MULTIAXIAL DIAGNOSTIC SYSTEM
In the DSM-IV (1994, TR: 2000), a Multiaxial System is utilized which involves assessing various domains of information across several axes. This was done to assist the clinician in planning treatment and to predict outcome. The five Axes included in the DSM-IV Multiaxial Classification can be seen in Table 1.
Axis I: Clinical Disorders. The various disorders and conditions in the diagnostic classification system are recorded on Axis I except for personality disorders and mental retardation (which are reported on Axis II). The major groups of disorders that are reported on Axis I are listed in Table 2. According to the Manual, when an individual has more than one Axis I disorder, they should all be reported. The primary Axis I diagnosis is listed first.
Axis II: Personality Disorders and Mental Retardation. Personality disorders and mental retardation are reported on Axis II. The disorders to be reported on Axis II are listed in Table 3.
It is not uncommon for the patient to have more than one Axis II diagnosis, and all of these should be reported. Aside from making an actual personality diagnosis, Axis II may also be used to indicate maladaptive personality features that do not meet the threshold for a personality disorder.
Axis III: General Medical Conditions. General Medical Conditions that are possibly relevant to the management of the patient’s psychiatric disorder are recorded on Axis III. The broad categories of Axis III diagnoses can be seen in Table 4. General Medical conditions are recorded in an effort to encourage a thorough evaluation and to promote communication among healthcare providers.
Axis IV: Psychosocial and Environmental Problems. Axis IV is for recording psychosocial and environmental problems that may impact the diagnosis, treatment, and prognosis of the psychiatric diagnosis or mental disorders (Axis I and II). Psychosocial or environmental problems are generally a negative life event, although they might include a “positive stressor” if it leads to a problem. Psychosocial problems may play an etiologic role in the initiation or exacerbation of a mental disorder, or they may occur as a consequence of the patient’s psychopathology. Psychosocial and environmental problems are grouped into various categories and these are listed in Table 5.
Axis V: Global Assessment of Functioning (GAF).
Axis V is made up of the GAF Scale which is the clinician’s judgment of the patient’s overall level of functioning. The GAF Scale is a global rating of the patient’s psychological, social, and occupational functioning. The GAF instructions guide the clinician to rate the patient with respect only to psychological, social and occupational functioning. The instructions specify, “Do not include impairment in functioning due to physical (or environmental) limitation”. The GAF is scored from 1 (most severe) to 100 (highest level of function). A score of 0 is assigned if there is not enough information to make an assessment. A summary of the GAF can be seen in Table 6 and a complete print copy is available here.
The GAF is divided into 10 ranges of functioning and each decile has two components: general descriptions of severity of psychological symptoms and behavioral descriptors of social-occupational functioning. The clinician first rates the patient relative to the decile within which he or she falls. The GAF rating is within a particular decile if either the symptom severity OR the level of function falls within that range. The clinician then decides the exact GAF score from within the decile (tending more toward the adjacent decile above or below). The final score is the most severe condition of psychological symptoms or the social-occupational level of function.
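The decile logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the DSM-IV-TR or the SRPD: the function names `gaf_decile` and `final_gaf` and the patient ratings are hypothetical, and the sketch assumes the clinician has already formed separate judgments for symptom severity and for social-occupational functioning.

```python
def gaf_decile(score):
    """Return the (low, high) bounds of the decile containing a GAF score of 1-100."""
    if not 1 <= score <= 100:
        # A score of 0 is reserved for "inadequate information" and has no decile.
        raise ValueError("GAF scores run from 1 to 100")
    low = ((score - 1) // 10) * 10 + 1
    return low, low + 9


def final_gaf(symptom_rating, functioning_rating):
    """Return the more severe (numerically lower) of the two ratings."""
    return min(symptom_rating, functioning_rating)


# Hypothetical patient: moderate symptoms, more serious impairment at work.
symptoms, functioning = 58, 45
score = final_gaf(symptoms, functioning)
print(score, gaf_decile(score))
```

In this hypothetical case the rating falls in the 41-50 decile because functioning, the worse of the two components, falls there, even though the symptom rating alone would place the patient in the 51-60 decile.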
The GAF has psychological and behavioral descriptors for each decile to help the clinician with decision-making and to increase reliability. In these deciles the terms, “Mild”, “Moderate”, and “Severe” are used. The DSM-IV-TR (2000) provides definitions of these terms as can be seen in Table 7, but these have not been operationally defined.
General categories of function for GAF ratings have also been established both clinically and in the research. These general categorizations can be found in Table 8.
To assist with scoring, and to ensure that no elements of the GAF rating are overlooked, a four step scoring method has been included (See Table 9).
As can be seen, the GAF is based upon taking into account information about severity of psychological symptoms, social-interpersonal functioning, and occupational functioning. These three domains encompass a wide range of assessment data, examples of which can be seen in Table 10.
According to the DSM-IV, the GAF Scale ratings should, in most instances, be for the current period (e.g. one week). However, there is also the option of providing a GAF for the “current period” and highest level over the previous year.
THE GAF, AMA IMPAIRMENT GUIDE, AND THE SRPD
The California Workers’ Compensation Reform of 2005 mandated that the AMA Impairment Guides be used to evaluate injured workers. In the Guides, Chapter 14 (“Mental and Behavioral Disorders”) addresses psychiatric impairments. Unlike other chapters, “Numerical impairment rating are not included; however, instructions are given for how to assess an individual’s abilities to perform activities of daily living” (AMA, 2000, p. 357). However, the Guides provide a method for rating mental impairment (no impairment to extreme impairment) on each of four dimensions (Activities of Daily Living, Social functioning, Concentration, and Adaptation; See page 363). This can be seen in Table 11.
Due to the fact that the AMA Guides do not provide a numerical impairment rating, the GAF is used. This is included in the “Rating Psychiatric Impairment” section of the Schedule for Rating Permanent Disabilities (2005). This specifically includes the GAF Scale as well as the instructions from DSM-IV-TR. The instructions and guidelines for the GAF can be found in Tables 6 and 7 and these are also included in the SRPD. These additional instructions relative to the methodology of using the GAF were included in an attempt to increase its reliability and validity since these psychometric issues have plagued all versions of this instrument.
It is important to note that the terms such as “Mild and Moderate” as used in the AMA Guides are not, in any way, operationally equivalent to the same terms as used in the GAF. Also, I am not aware of any research demonstrating that the five classes of impairment as listed in the AMA Guides (No Impairment to Extreme Impairment) have any relationship to the five general categories of GAF rating (Superior Function or No Impairment to Pervasive Impairment). I have reviewed many QME and AME evaluation reports that attempt to equate these measures which, as far as I know, is not valid. Completing the AMA ratings for each of the four categories may help the practitioner conceptualize impairment for the three dimensions of the GAF, but that is all.
HISTORY OF THE GAF
Having an understanding of the development of the GAF in its current form is important to illustrate some of the problems that have plagued this assessment instrument. Many of the revisions done to the Scale over the years have attempted to address these various problems.
The Health-Sickness Rating Scale (HSRS) was first published by Dr. Luborsky in 1962 (Luborsky, 1962). The HSRS Scale was based on clinical research at the Menninger Foundation beginning in 1949 which sought to develop a standardized method for determining a patient’s overall mental health, including level of function. The HSRS is a scale which ranges from 0-100 with scores in the upper range representing minimal psychological symptoms and a high level of function, and scores in the lower range representing a more severe level of psychological symptoms and a diminished level of function. The HSRS produced one global score assessing these two dimensions.
Subsequently, Endicott et al. (1976) developed the Global Assessment Scale (GAS) which is a revision of the HSRS. The GAS has values that range from 1 (the sickest patient) to 100 (a person with no symptoms). The GAS is divided into ten equal intervals with ten values in each interval. Criteria that define each score in each interval are listed. Endicott et al. (1976) performed the first series of reliability studies on the GAS. For the non-statistician reader, reliability concepts are presented in Table 12. The researchers found intra-class correlation coefficients (ICC’s) ranging from .61 to .91 with one study showing ICC’s of .60, which is moderate at best. The higher values are certainly reasonable, but these were obtained by clinicians trained in the GAS.
Most of these ratings in the GAS research were completed by a small number of very well trained interviewers. As such, critics have raised questions about the degree to which these results were generalizable to typical clinical settings in which users are not specifically trained in the GAS. Subsequent studies found a high degree of variability in the reliability of the GAS. Dworkin et al. (1990) concluded that specific training of clinicians was necessary in order to maximize inter-rater reliability in situations in which an individual patient may be rated at various times by multiple interviewers. These researchers concluded that the training would help ensure that the differences in the GAS scores were actually due to a change in the patient’s overall level of global functioning (e.g. improvement in response to treatment) rather than simply problems with the reliability of the scale (measurement error).
The DSM-III (1980) did not include any type of GAS or GAF Scale on Axis V. Rather, a patient’s overall level of “adaptive functioning” was rated on a 1 (superior) to 7 (grossly impaired) scale. The rating was to include “highest level of adaptive functioning past year.”
The DSM-III-R (1987) used a modified version of the GAS, renamed the GAF, as Axis V. The GAF in DSM-III-R was scored on a scale from 1 to 90 and included behavioral descriptors. Data on the basic reliability and validity of the GAF were not provided in the DSM III-R.
In the DSM-IV (1994), the GAF was retained, but modified to a scale from 1 to 100 (a score of 0 indicates inadequate information for assessment). The GAF included behavioral descriptors for the decile ranges. In addition, the instructions for use are essentially the same as what is used currently.
The DSM-IV-TR (Text Revision) was released in 2000. The text revision was published to address some of the problems that were identified in the DSM-IV (First and Pincus, 2002; DSM-IV-TR, 2000). Based on the progress of research in the psychiatric literature, it was not time for a complete revision (e.g. DSM-V) which is reported to be due sometime after 2013. The DSM-IV-TR addressed some of the reported problems with the GAF Scale. According to First and Pincus (2002), there were two problems with the GAF that were identified. One source of confusion was how to operationalize the “current” time frame for the GAF. From the instructions in DSM-IV, it was unclear to the clinician as to the definition of “current” (e.g. specifically during the clinical interview, over the past week, over the past month, etc.). Therefore, in the DSM-IV-TR, a sentence was added specifying that the “current period” is sometimes operationalized as the lowest level of functioning for the past week. Clinicians are given the option of recording the time period for which the GAF is completed. None of the DSM editions includes reliability or validity data for the GAF. Another source of confusion with the GAF instructions in the DSM-IV was the problem of integrating “disparate contributions of a patient’s psychiatric symptoms and functioning to the final GAF score” (First and Pincus, 2002, page 291). The following example is given by the authors:
For example, what should the final GAF score be for a patient who is a significant danger to himself, which would justify a GAF rating below 20, but is otherwise functioning while at work and with his family, reflecting a GAF rating above 60? Some clinicians mistakenly use an average of the two, which in this case would result in a GAF score of around 40. In fact, the final GAF score should always reflect the lower of the two ratings. In this case, the GAF score should be below 20, despite the patient’s higher social and occupational functioning.
A paragraph relative to this issue was added in DSM IV-TR to clearly outline this convention.
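The arithmetic behind the First and Pincus example can be worked through as a brief Python sketch. The two specific ratings below are hypothetical values chosen only to fit the example (one below 20, one above 60); the authors do not give exact numbers.

```python
# Hypothetical ratings consistent with the example: danger to self rates
# below 20, while work and family functioning rates above 60.
symptom_rating = 15
functioning_rating = 65

# The mistaken approach: averaging the two ratings.
mistaken_average = (symptom_rating + functioning_rating) / 2

# The DSM-IV-TR convention: the final score reflects the lower rating.
correct_score = min(symptom_rating, functioning_rating)

print(f"Averaged (incorrect): {mistaken_average}")
print(f"Lower of the two (correct): {correct_score}")
```

With these assumed values, the average lands at 40, which overstates the patient's status by more than two deciles compared with the correct rating.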
CURRENT USES OF THE GAF
As discussed previously, some authors (Goldman, 1992) specifically stated that the GAF Scale was “not widely used” at that time. In contrast, more recent researchers have concluded that it is currently one of the most widely used psychiatric assessment scales. Clearly, its meteoric rise to being one of the most popular psychiatric rating scales is due in part to various systems that have mandated its use. For instance, many managed care provider networks and insurance carriers will utilize the GAF Scale to make treatment determinations including such things as response to psychological treatment interventions, the need for inpatient psychiatric hospitalization, and the need for continued psychiatric hospitalization. In addition, the VA Medical Center system mandated that clinicians use the GAF as part of the diagnostic assessment of all mental health patients (see Moos et al., 2002 for a discussion). Among other things, VA clinicians are required to obtain a GAF rating every 90 days for all of their mental health patients (Niv et al., 2007). Lastly, with the workers’ compensation reform in California (2005), the Schedule for Rating Permanent Disabilities (2005) mandated that “psychiatric impairment shall be evaluated by the physician using the Global Assessment of Function (GAF) Scale shown below. The resultant GAF score shall then be converted to a whole person impairment rating using the GAF conversion Table below” (page 1-12). Given all of these factors, it is no wonder that the GAF has become so popular. After the GAF for the injured worker is determined, it is then converted to a Whole Person Impairment (WPI) and some of these conversion values can be seen in Table 13. After the WPI is determined, the value is modified by diminished Future Earning Capacity (FEC), the Occupational Adjustment, and the Age Adjustment.
It is outside the focus of this course to discuss these issues and, typically, calculations beyond the GAF and the WPI are not included in the evaluation report. For detailed information about these other adjustments, please see the SRPD.
The GAF, and its predecessors, were actually designed to meet the needs of not only the treating clinician, but also these types of administrative systems. In its original form, Luborsky (1962) sought to develop a standardized assessment that would yield a single global score reflecting a patient’s overall level of mental health or psychiatric function. The score was to be a combination of psychological symptoms, as well as social and occupational functioning. The idea was that this would allow for rapid communication relative to a patient’s status. Aside from summarizing the various dimensions of function, the scale necessarily needed to be easily administered and not require any special training (ease of use). This would allow for its use across a number of evaluation and treatment environments (e.g. inpatient, outpatient), patient groups (e.g. ranging from adjustment disorders to schizophrenia), and types of clinicians (e.g. psychologists, psychiatrists, social worker, psychiatric nurse, physician, etc.). The idea was to develop a scale that did not require special training, but would have adequate reliability and validity. The GAF has many strengths and these are listed in Table 14.
PROBLEMS WITH THE GAF
According to the research, the GAF has three broad problem areas which can be summarized as follows:
(1) Three different dimensions of functioning are collapsed into one composite score
(2) The inter-rater reliability of the GAF
(3) Its validity (specifically predictive validity).
These broad areas will be discussed in detail in the sections that follow.
The Global Score Problem
A major problem with the GAF scale is that it integrates three different dimensions of functioning into one composite or total score. These dimensions include psychological symptoms, social and interpersonal functioning, and occupational functioning. A wide range of research suggests that these three dimensions do not necessarily co-vary with each other. In fact, this is reflected in the example in First and Pincus (2002). This is the example of an individual who is a significant danger to himself (GAF rating=20) but is otherwise functioning well at work and with his family (GAF=60). Other similar examples are provided in the GAF scoring instructions in the DSM-IV-TR (2000). This scoring methodology of the GAF does not allow for differentiation amongst these dimensions and, instead, forces the clinician to focus either on psychological symptoms or functioning, whichever is worse. Although many studies have addressed this problem, I will outline the following as examples.
Hilsenroth et al. (2000) investigated the reliability and convergent and discriminant validity of the GAF Scale. For those non-statisticians who are taking the course, these terms are defined in Table 15.
In addition to the GAF, the Hilsenroth study also utilized two other important measures, the Global Assessment of Relational Functioning Scale (GARF) and the Social and Occupational Functioning Assessment Scale (SOFA). The GARF is used to indicate an overall judgment of the functioning of a family or other ongoing relationship on a hypothetical continuum ranging from competent, optimal relational functioning, to a disrupted, dysfunctional relationship. The scale relates the degree of relational functioning from optimal to disruptive by using the three major content areas of problem solving, organization, and emotional climate. This assessment is the GAF-equivalent for evaluating the dimension of social functioning. A summary version of the GARF can be seen in Table 16.
The SOFA is designed to assess an individual’s level of social and occupational functioning not directly influenced by the overall severity of psychiatric symptoms. The SOFA also considers the effects of the individual’s general medical condition in the evaluation of social and occupational functioning. Analogous to the GAF, this scale is thought to more independently assess a patient’s level of occupational function and attempts to partial out the impact of psychiatric symptoms (See Table 17). As such, Hilsenroth et al. (2000) were using measurements that would ostensibly assess the three dimensions that are collapsed in the single GAF global score. They were also choosing the instruments based on the fact that the GAF rating has been consistently found to correlate most highly with a patient’s severity of psychiatric symptoms rather than the other two dimensions. The GARF and the SOFA are actually included in the DSM-IV as experimental measures.
Hilsenroth et al. (2000) evaluated 44 patients admitted to a university based outpatient community clinic utilizing the GAF, GARF, and SOFA. The study participants carried a variety of diagnoses in categories of mood, anxiety, substance-related, and adjustment. In the study, each patient completed a videotaped semi-structured clinical interview. After the interview was completed, the clinician rated the patient on the three scales. Prior to participating in the study, the clinicians participated in both individual and group training sessions on scoring the three scales (GAF, GARF, and SOFA). After the first clinician completed the videotaped interview and the scale ratings, a second clinician viewed the videotape and independently rated the patient on the three scales. The second, external rater was unaware of the patient’s diagnosis, self-report data, and first clinician’s ratings for the three scales. The study design allowed for assessing such things as inter-rater reliability for each scale as well as the correlation between one scale and another.
The mean scores for the three scales can be found in Table 18. As can be seen, the average GAF score is about what one would expect for this patient population. Again, for those non-statisticians, an example of two standard deviations around the mean for the GAF would be as follows: a mean of 64.5 plus or minus 14 (two SD’s) includes approximately 95% of all the patients’ scores, assuming the scores are roughly normally distributed.
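For readers who want to check the arithmetic, the range and the expected share of scores can be worked out in Python. This is a sketch under one stated assumption that is not taken from the study: that GAF scores in the sample follow an approximately normal distribution. The helper name `normal_coverage` is hypothetical.

```python
import math

mean_gaf = 64.5  # mean GAF score given in the text
two_sd = 14.0    # two standard deviations, as given in the text

low, high = mean_gaf - two_sd, mean_gaf + two_sd


def normal_coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(f"Range: {low} to {high}")
print(f"Share within two SDs: {normal_coverage(2):.3f}")
```

Under the normality assumption, the range of 50.5 to 78.5 would be expected to contain roughly 95% of scores. Capturing about 99.7% would take three standard deviations.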
Table 19 shows the correlation between the first clinician’s ratings and the ratings obtained by the “external rater” who subsequently viewed the videotape (second rating). These values constitute inter-rater reliabilities and are considered very reasonable for a psychometric instrument. Again, these results are consistent with other findings relative to the GAF that suggest that one can achieve high inter-rater reliabilities when clinicians are specifically trained to the instrument. This was also found for the GARF and the SOFA.
Convergent and discriminant validity (See Table 15 for definitions) were assessed using multiple statistical methods including factor analysis, correlations amongst the scales, and correlations with other measures including the SCL-90-R Global Severity Index (a measure of psychological distress), the Social Adjustment Scale (SAS) Global Score (a measure of social impairment), and the Inventory of Interpersonal Problems (IIP) Total Score (a measure of interpersonal functioning). The relationship of the GAF to the GARF was significant (r = 0.60, P<0.0001) as was its relationship to the SOFA (r = 0.60, P<0.0001). However, the SOFA and the GARF demonstrated a low correlation (r = 0.34, P=0.02). As concluded by the authors, the results of the study suggest that the GARF and the SOFA are each more related to the GAF individually than they are to each other. This suggests that these two scales are evaluating something different from one another and to a lesser extent from the GAF. Conceptually, this makes sense given the fact that the GAF is a composite of dimensions specifically measured by the GARF, SOFA, and severity of psychological symptoms.
To further assess convergent and discriminant validity, the three scales were also correlated with other measures as discussed previously. The results of some of these analyses are presented in Table 20. Significant correlations (p<.01) are marked (**). As can be seen, consistent with other studies, the GAF correlated with severity of psychological symptoms or distress. The SOFA was significantly correlated to the Social Adjustment Scale and Inventory of Interpersonal Problems (the negative correlation result is due to how the results are expressed numerically).
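One common way for the non-statistician to read these coefficients is to square them, which gives the proportion of variance two measures share. The short Python sketch below applies that standard interpretation to the three correlations reported above. The squaring step is an illustration added here and is not an analysis reported by Hilsenroth et al. (2000).

```python
# Correlations as reported in the text for the three pairs of scales.
correlations = {
    "GAF vs GARF": 0.60,
    "GAF vs SOFA": 0.60,
    "SOFA vs GARF": 0.34,
}

# Squaring r gives the proportion of shared variance between two measures.
shared_variance = {pair: round(r ** 2, 4) for pair, r in correlations.items()}

for pair, value in shared_variance.items():
    print(f"{pair}: r squared = {value}")
```

Read this way, the GAF shares about 36% of its variance with each of the other two scales, while the SOFA and the GARF share only about 12% with each other, which is in line with the authors' conclusion that the two scales measure different things.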
The data include the finding that the GAF Scale showed the largest significant relationship to a patient’s report of psychiatric symptoms (SCL-90-R Global Severity Index), but did not show a specific significant association with social impairment (SAS Global Score) or interpersonal impairment (IIP Total Score). This is consistent with previous research that has demonstrated that the GAF Global Score is most often highly correlated with the patient’s severity of psychiatric symptoms and not generally correlated with social and/or occupational functioning.
In a second study, Hay et al. (2003) evaluated the predictive validity of the GAF, the GARF, and the SOFA. In this study, a total of 97 psychiatric patients were followed for up to two years to evaluate outcome and contrast the validity of the GAF, SOFA, and GARF. Results demonstrated that the SOFA and the GAF scores on psychiatric admission were significantly negatively correlated with duration of hospital admission. These results make sense given the fact that a patient who is functioning at a higher level (SOFA) and showing less psychiatric severity (GAF) would show a shorter hospital stay (note: the reason for the negative correlation is that numerically higher SOFA and GAF scores represent better function and mental health). The researchers also found that the SOFA ratings at psychiatric hospital discharge were significantly and negatively correlated with overall psychiatric outcome at two year follow-up. They conclude that the SOFA (a measure of adaptive functioning) had better predictive and concurrent validity than the GAF or the GARF.
A large-scale study of the GAF was conducted by Moos et al. (2002). This study was done using the VA Medical Center network data set since routine GAF evaluations are required. The researchers obtained GAF results used to assess global functioning for 9854 patients with psychiatric or substance abuse disorders, or both. These assessments were done by clinicians across 148 VA facilities. The clinicians were experienced mental health professionals who routinely utilize the GAF within the context of routine clinical diagnostic interviews. In the study, patients were classified according to five categories of GAF scores, and these can be seen in Table 21. Scores from 91-100 are indicative of no symptoms at all. Statistical analysis was completed across the five groups of patients based on these GAF categorizations. In addition to the GAF, the researchers also collected demographic data as well as data related to receipt of services such as inpatient or residential care (number of days), outpatient care (number of days), as well as data on the patient’s symptoms and social and occupational functioning. Data analysis also included information about alcohol and drug use.
Multiple regression analysis was utilized to identify the best independent predictors of GAF ratings. As discussed by the researchers (page 733), “when entered first in the regressions, the social or occupational functioning indexes accounted for only 1% of the variance in GAF ratings. Patients’ psychiatric diagnoses, previous inpatient care, psychiatric symptoms, substance use, and substance related problems were each significantly associated with higher levels of global impairment.” The researchers go on to state that “after these variables were entered, employment status was the only social or occupational index that independently predicted global functioning and it accounted for less than 1% of the variance in clinicians’ GAF ratings.” The researchers found the same results whether they used the continuous GAF scores or the categorized GAF ratings. The authors conclude that, “however, in this study, clinicians’ ratings of global impairment were more closely associated with patients’ diagnoses, previous treatment, and severity of symptoms than with their social or occupational functioning” (page 735). They also conclude that “our findings and the results of these studies indicate that GAF ratings provide little or no information about social or occupational functioning that is independent of clinicians’ judgment about diagnoses and the severity of symptoms” (page 735). They also found that the GAF ratings were not predictive of treatment outcomes.
These studies underscore the conclusion that the global score generated by the GAF Scale is problematic in that it attempts to incorporate three different dimensions of function: severity of psychological symptoms, social and interpersonal functioning, and occupational functioning. The research demonstrates that the GAF is primarily tapping into the severity of psychological symptoms. In many ways, this is not surprising since the majority of the data available to the clinician for the GAF assessment is likely to be related to the severity of psychological symptoms. Severity of psychological symptoms is most often assessed through the clinical interview, mental status examination, and objective psychological testing. Gathering detailed and objective data on the other dimensions (social and occupational) is much more difficult. As such, most clinicians likely develop their GAF ratings based on the clinical data related to severity of psychological symptoms. Researchers are clearly aware of this problem and the DSM-IV has included the two experimental measures which attempt to assess both social and occupational function. As will be discussed in more detail subsequently, it is important for the clinician to be aware of these potential problems with the GAF and of the overreliance and focus on psychological symptoms.
Reliability of the GAF
Interviewer-rated scales such as the GAF and its predecessors are notoriously vulnerable to problems of low inter-rater reliability because they tend not to operationally define terms and are used by examiners with different levels of training and experience. This is a significant issue since scores for the scales need to be comparable and meaningful across situations (e.g. treatment programs, research projects, etc.) for the scale to have any value. Variability can occur for many reasons, including that some raters may have a propensity to make high GAF ratings, whereas others may have a tendency to make low ratings. As we discussed previously, reliability is critical for a scale or test to have any use or meaning (See Table 12). In fact, the validity of the measure is limited by its reliability both from a mathematical and conceptual standpoint. All of the various versions of the GAF (beginning with the HSRS, and including the GAS, the GAF in DSM-IV, and the current GAF in DSM-IV-TR with modified instructions) have been plagued by problems with inter-rater reliability. In fact, the GAF instructions have become more and more structured in an effort to address this problem. Adequate inter-rater reliability can be achieved for the GAF if the clinicians undergo structured training. This generally involves the presentation of clinical vignettes with a subsequent review of the rationale behind why a specific GAF was assigned. However, the GAF was designed to be quick, easy to use, and usable without specific training. Therefore, in clinical practice, very few practitioners have actually undergone the type of training that is common in the published research articles.
This is nicely exemplified in a study completed by Bates et al. (2002). The researchers examined the impact of a brief training program on clinicians using the GAF. In the study, 31 staff members within one VA Medical Center were presented two vignettes without any training in the GAF scale. The clinicians were asked to provide a GAF rating for each of the vignettes. Subsequently, they participated in a brief training session aimed at increasing the reliability of GAF assessments. After the training was completed, the clinicians again rated the two vignettes using the GAF. This allowed the researchers to compare the results before and after training. The investigators were interested in two issues: (1) What were the inter-rater correlations pre-training versus post-training; and (2) What “strategies” were clinicians using to arrive at the GAF rating prior to their training.
The nine strategies used by the clinicians prior to GAF training can be found in Table 22. As can be seen, the clinicians utilized a number of different strategies, only one of which is consistent with the scoring instructions for the GAF (**). As can be seen in Table 23, incorrect strategies were used 90% of the time and the most common strategy was to average the severity of symptoms and functioning level. The correct strategy was utilized less than 10% of the time by untrained clinicians. Inter-rater reliability was consistent with previous research in showing poor results prior to training and improved results after training.
Table 23 also demonstrates the effect of the training on clinicians’ GAF scores. Even after training, only 64% of clinicians used the correct strategy (of course, this means 36% continued to use incorrect strategies). After training, the obtained GAF scores were significantly different from what was obtained prior to the training. The post-training GAF scores were much closer to the “criteria” scores. The authors conclude that “the study highlights common errors and points to the need for formal training in the use of the scale”.
In a more recent study, Vatnaland et al. (2006) asked the question, “are GAF scores reliable in routine clinical use?” In their introduction and review of the literature, the authors make the point that the GAF scale has been considered as a reliable tool, but most of the studies of GAF reliability have been based on special conditions including prior training, test awareness, and under strictly controlled research conditions. The study sought to assess the reliability of the GAF as it might be commonly used in a clinical situation.
In the review of existing research on GAF inter-rater reliability, the authors determined that all but two of the studies concluded that the reliability is “excellent” as measured by intra-class correlations (ICC greater than 0.74). However, they point out two main problems with this body of research:
Raters often undergo prior calibration and training, and are generally selected from dedicated students or researchers. The authors conclude that the positive results may not be generalizable to other settings and may reflect what has been termed “within-center inter-rater reliability.”
The authors also point out that most studies report results from clinician-raters who are highly aware that their GAF scores are being monitored. Such test awareness interacts with the rating process.
The study sought to determine the inter-rater reliability of the GAF as used in routine clinical practice where none of the above conditions exist. In the study, 100 consecutive psychiatric admissions were assessed and assigned a GAF score by three different raters, both at admission and discharge. The same individual staff member did not necessarily obtain GAF scores at admission and discharge. The clinicians did not utilize any type of structured interview guide or other tools in the process of assessing the GAF scores. In addition, formal training in the use of the GAF Scale had not taken place. For each patient, a “criteria” GAF was also established by two psychiatrists trained in the use of the GAF. These criteria GAF scores were determined for both admission and discharge status.
Table 24 shows the inter-rater reliability between the non-trained clinicians and psychiatrist number one (admission and discharge), non-trained clinicians and psychiatrist number two (admission and discharge), and the relationship between the GAF scores determined by the two experienced psychiatrists. As can be seen, the correlations amongst the clinicians and the criteria scores were poor. The inter-rater reliabilities between the two experienced psychiatrists were quite high and consistent with previous research investigating clinicians who have been trained in the use of the GAF. The authors conclude:
“the results reported above suggested there are critical issues concerning the reliability of the GAF when applied in a realistic clinical context. ICC coefficients between scores obtained by standard department procedures and those by the two research raters at admission were 0.39 and clearly inadequate. In terms of standard practice, this level is less than what would be accepted as a fair agreement beyond chance. This indicates that only about 40% of rater differences reflect real differences in subject conditions. This study again underscores the likelihood that GAF ratings in real world settings tend to have a fairly low level of reliability.”
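To make the meaning of an intra-class correlation more concrete, the following is a minimal sketch of how an ICC can be computed from a table of ratings. It assumes the simplest one-way random-effects form of the ICC; the course material does not state which ICC variant Vatnaland et al. used, so this is an illustration and not a reproduction of their analysis. The function name icc_oneway and the sample ratings are hypothetical and invented purely for the example.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for a patients x raters table.

    `ratings` is a list of rows, one row per patient, each row holding
    the GAF scores given to that patient by k different raters.
    Assumes at least two patients, at least two raters per patient,
    and that the ratings are not all identical.
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters per patient
    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-patients mean square: how much patients differ from one another.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)

    # Within-patients mean square: how much raters disagree about the same patient.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: two raters who agree exactly on three patients.
perfect_agreement = [[30, 30], [50, 50], [70, 70]]

# Hypothetical data: two raters who frequently disagree by 10 to 20 points.
poor_agreement = [[35, 55], [50, 40], [60, 45], [45, 65], [70, 50]]

print(icc_oneway(perfect_agreement))
print(icc_oneway(poor_agreement))
```

The ICC approaches 1 when nearly all of the variation in scores comes from real differences between patients, and it falls toward zero (or below) when rater disagreement is as large as the differences between patients. This is the sense in which a value near 0.39 implies that only about 40% of the differences in scores reflect real differences between patients.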
These issues raise concern about the use of the GAF in the workers’ compensation system. As discussed previously, the GAF rating is converted to a whole person impairment and is used to determine permanent psychiatric disability. Therefore, inter-rater reliability as well as agreement with the criteria rating (“true” GAF value) is critical. It is important for clinicians completing GAF ratings within the workers’ compensation system to be aware of these reliability problems with the GAF especially when used by practitioners outside of a research setting and who have not undergone specialized training. Clearly, this represents the vast majority of those who are providing GAF assessments of injured workers. It also underscores the complexity of assigning a GAF score. It certainly goes beyond choosing a number from the scale based on clinical instincts that almost always reflect severity of psychological symptoms to the exclusion of the other dimensions. The last section of this course will review methods for improving the clinician’s use of the GAF both in terms of reliability and validity.
Validity of the GAF
As can be seen in Table 15, there are various types of validity when discussing psychometric testing. In general, validity refers to the extent to which a test or scale measures what it is supposed to measure. Conceptually, this is somewhat difficult to determine relative to the GAF since it is a composite of three dimensions: psychological symptom severity, social and interpersonal functioning, and occupational functioning. The research reviewed did not demonstrate that a construct of what the GAF is “supposed to measure” has been established. Therefore, most studies investigating the validity of the GAF look at such things as:
Does it correlate with what we would expect it to correlate with (convergent validity)?
Does it not correlate with measures we would expect it not to correlate with (divergent or discriminant validity)?
Does it predict what we would expect it to predict (predictive validity)?
Probably the largest study of predictive validity of the GAF has been carried out by Moos et al. (2000, 2002) using a VA Medical Center database. As discussed previously, all VA mental health clinicians are required to utilize the GAF when treating patients in this system. This provides a valuable set of data that is amenable to investigating the usefulness of the GAF. In one study (2002), the researchers analyzed the GAF and other data for 9854 patients with psychiatric or substance abuse disorders or both. Since the use of the GAF was mandated by the VA Medical System, scores were available at multiple time points including entry into treatment and follow-up assessments six to 12 months later. In addition, other measures of psychiatric symptoms, as well as social and occupational functioning were contained within the data set. In analyzing the GAF scores, Moos utilized a method common in previous research in which the ratings are combined into fewer categories. The researchers divided the GAF into five categories and these were presented previously in Table 21.
Consistent with previous research, the authors concluded “however, in this study, clinicians’ ratings of global impairment were more closely associated with patients’ diagnoses, previous treatment, and severity of symptoms than with their social or occupational functioning” (p. 735). Again, the research consistently demonstrates that the GAF is primarily a measure of severity of psychological symptoms to the exclusion of social and occupational functioning. They go on to state that “once these clinical and symptom-related factors are considered, indexes of social and occupational functioning made only negligible contributions to the GAF ratings” (p. 735). They add that “our findings and the results of these studies indicate that GAF ratings provide little or no information about social or occupational functioning that is independent of clinicians’ judgment about diagnosis and the severity of symptoms” (p. 735).
Relative to the GAF predicting treatment outcome, the authors conclude that, “moreover, we found little or no relationship between GAF ratings and either symptom outcomes or social or occupational outcomes. This result was the same when we used the continuous GAF scores or the five categories of GAF scores” (p. 735). The authors go on to state that this finding replicated a previous study, which found minimal associations between clinicians’ ratings of patients’ current level of function and patients’ self-rated symptoms and functioning at follow-up (Moos et al., 2000). They conclude that “in conjunction with the lack of previous positive findings that link GAF ratings to outcomes, these findings cast doubt on the value of including GAF ratings as predictors of treatment outcome in an outcomes monitoring system” (p. 735). They state that “although intuitively appealing, a brief uni-dimensional rating of global functioning cannot capture changes in psychological, social, and occupational functioning that are only moderately inter-related at best.”
These results are important to keep in mind when utilizing the GAF in the workers’ compensation system. The Moos et al. studies (2000, 2002) suggest that a global rating such as the GAF is not predictive of such things as social or occupational functioning. In fact, research has demonstrated that, in all except the most severe psychiatric cases, psychological measures are not predictive of future occupational functioning. Certainly, for someone who is rated at a very high level of psychiatric impairment based on psychological symptoms (e.g. a very low GAF in the 1-20 range), prediction of occupational and social functioning in the future is relatively straightforward. However, when one is faced with a heterogeneous patient population showing a variety of symptoms, the predictive power of the GAF and psychological tests seems to fall apart. This problem is echoed by MacDonald-Wilson et al. (2001), who state that “adding to the difficulties in assessing work function amongst individuals with a psychiatric disability is a question of whether and how well psychiatric diagnoses and symptoms (i.e. psychiatric impairment) can predict work capacity or functioning” (p. 221). The authors go on to state that “a variety of early studies reported little relationship between future work performance and various assessments of psychiatric symptoms. From these studies, there appear to be no symptoms or symptom patterns that were consistently related to work performance” (p. 222). In reviewing the research, they did state that several long term follow-up studies suggest that psychotic-like features and symptoms were associated with poor role functioning and less likelihood of being employed. This is consistent with what we discussed earlier in that the GAF is likely predictive of future psychiatric disability if a patient is scoring in the very low ranges (high psychiatric impairment). MacDonald-Wilson et al. (2001) conclude that “taken together, these studies do not support the conclusion that psychiatric diagnosis alone is a good predictor of vocational capacity. While there is sufficient evidence to suggest that a diagnosis of a psychotic disorder is associated with somewhat poor vocational outcomes, these relationships appear modest. Similar conclusions can be drawn about symptoms. Psychiatric symptoms, unless severe, bear a small relationship to vocational functioning.”
IMPROVING THE USE OF THE GAF
The use of the GAF in determining permanent psychiatric impairment is mandated in the California Workers’ Compensation System. As discussed previously, the GAF is converted to a whole person impairment, according to the Schedule for Rating Permanent Disabilities (2005). Therefore, reliable and valid use of the GAF is certainly important. As we have seen, there are many problems with the GAF Scale.
Use of the GAF in a Forensic Setting
If you function as a QME, you are performing evaluations and rendering opinions within the context of a forensic setting. By nature, this is often an adversarial environment (applicant and defense). Therefore, your conclusions must be defensible and this includes the GAF determination. Having an understanding of the problems inherent in the GAF can help the practitioner produce reliable, valid, and defensible GAF determinations.
As we have reviewed previously, there has been a myriad of research using the GAF in psychiatric inpatient settings, university based clinics, and the VA Medical Center system. I could not locate any articles that address its use within a workers’ compensation system. Whenever one is functioning within a workers’ compensation system, factors such as patient credibility and the validity of self-reported information must be taken into account and objectively assessed. The GAF contains no validity scales and is largely based on self-report data from the patient. Its use within a forensic setting is therefore problematic due to its very nature. This also makes it more difficult to defend one’s conclusion relative to this instrument. However, there are ways in which the use of the GAF can be improved by the individual practitioner as well as making the GAF conclusions more defensible. These will be reviewed in the following section.
Use the GAF According to the Instructions
This issue seems almost ridiculous to mention but it is important to remember to use the GAF according to the instructions. As we discovered in the review of research literature relative to the GAF, one of the most common problems is that clinicians simply do not follow the instructions. This was poignantly highlighted in the study of Bates et al. (2002). As you will recall, nine strategies were used by untrained clinicians to achieve their GAF ratings and of these, only one was consistent with following the directions for the GAF Scale. Even after training, only 64% of clinicians followed the directions. As the researchers pointed out, the most common strategy was to average the severity of symptoms and functioning level. Although I am speculating, this may be due to a clinician’s inclination to incorporate all of the available data into one global measure rather than choosing one dimension over another. Regardless of the reasons, this strategy is incorrect, according to the instructions. Interestingly, these researchers also found that 36% of the clinicians continued to use incorrect strategies even after structured training on the GAF.
If this study demonstrated that untrained clinicians were using incorrect strategies 90% of the time, the base rate is likely equal to that or higher in routine use (since the clinicians in the study knew they were being monitored). The frequency of incorrect use in a forensic setting, such as completing a QME evaluation, is simply unknown. However, I believe we can assume that it is likely quite high. Therefore, the first strategy in improving the quality of a GAF rating is to simply follow the directions as outlined in the DSM-IV-TR and the SRPD. The four step process as outlined in these publications (See Table 9), as well as the enhanced instructions in DSM-IV-TR, will certainly help with this process. As can be seen in the four step process, the clinician is to choose a GAF level that reflects either the individual’s symptom severity OR level of functioning, whichever is worse.
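The difference between the instructed rule and the common averaging error can be shown in a few lines. The sketch below covers only the final selection rule described above (take whichever of the two ratings is worse); it does not implement the full four step process in Table 9. The function names and the example ratings are hypothetical, and the 1-100 range follows the description of the scale given earlier in this course.

```python
def _check_rating(rating):
    """Reject values outside the 1-100 GAF range used in this course."""
    if not 1 <= rating <= 100:
        raise ValueError(f"GAF rating must be between 1 and 100, got {rating}")


def gaf_worse_of_two(symptom_rating, functioning_rating):
    """Instructed rule: the GAF is the WORSE (lower) of the two ratings."""
    _check_rating(symptom_rating)
    _check_rating(functioning_rating)
    return min(symptom_rating, functioning_rating)


def gaf_averaged(symptom_rating, functioning_rating):
    """Common INCORRECT strategy: averaging symptoms and functioning."""
    _check_rating(symptom_rating)
    _check_rating(functioning_rating)
    return (symptom_rating + functioning_rating) / 2


# Hypothetical patient: severe symptoms (rated 35) but fairly
# independent day-to-day functioning (rated 65).
symptoms, functioning = 35, 65

print("Per instructions:", gaf_worse_of_two(symptoms, functioning))
print("Averaging error: ", gaf_averaged(symptoms, functioning))
```

For this hypothetical patient the two approaches differ by 15 points, which is more than a full decile of the scale. When the symptom and functioning ratings are close together the two strategies give similar numbers, so the error matters most in exactly those cases where symptoms and functioning diverge.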
Another component of the GAF instructions is that the clinician is not to include impairment due to physical (or environmental) limitations. Many researchers have questioned this aspect of the GAF since they do not believe that physical impairment can accurately be separated from psychiatric impairment (it does appear to represent mind-body dualism). Even so, according to the GAF instructions, one must attempt to do so as diligently as possible. I could find no research or other literature that provides the clinician with any empirical guidance as to how this is to be accomplished aside from a completely subjective assessment (a clinical “guesstimate”).
The “Split Method”
To improve the quality of your GAF score you may want to consider using the “split” method as initially suggested by First (1995). Dr. First has recommended that the clinician should “treat the GAF as if it were two scales: one for symptom severity and another for level of function. Then… make one rating for severity and a second for level of functioning. The worst of the two can be used as the GAF” (p. 259). It has been suggested that this is an especially useful approach when there is some discrepancy between a patient’s symptomatology and level of functioning (e.g. a patient with psychotic symptoms who nevertheless functions fairly independently). The research has also suggested that this rating rule can help to counteract an apparent tendency to somehow either combine symptomatology and functioning or take an average of the two. Certainly, this approach is useful as a structured reminder to the clinician to assess both symptomatology and function, then choose the worse of the two. It does continue to combine both social and occupational functioning into one variable, and these certainly may co-vary differently. This is consistent with the directions of the GAF, however, and may be a problem inherent in the scale.
The MIRECC GAF Scale
The VA Mental Illness Research, Education, and Clinical Centers (MIRECC) has developed a modified GAF scale that independently assesses all three components of the traditional GAF (psychological, social, and occupational). As discussed by the authors, the DSM-IV-TR directs raters to base the GAF score on the worst functioning of these three domains (I believe it’s actually two domains, symptom severity versus function). As such, the GAF score typically represents one dimension, and clinicians do not know which dimension is represented or how the patient fares on other dimensions, rendering the GAF limited in its utility (Niv et al., 2007, p. 529). Niv et al. (2007) developed the MIRECC GAF which has behavioral descriptors for each of the three domains of function. An overview of the rating system can be found in Table 26 and a copy of the actual scale can be found here.
The MIRECC GAF has been found to have excellent inter-rater reliability for all three of the subscales (r = .98 to .99) when used by specifically trained clinicians. Analyses of convergent and discriminant validity were also completed. The authors concluded that “results demonstrated good convergent and discriminant validity, with 40% of the variance accounted for by work and school status”. In other words, the three dimensions correlated with external measures in the expected pattern (e.g. occupational GAF rating with occupational variables, but not with other psychological variables, etc.).
In addition, the researchers investigated predictive validity of the instrument. Outcome measures included work status, presence of family support, presence of a close friend, and psychological symptoms. The authors stated that “in terms of predicting work status at follow-up, the MIRECC GAF occupational ratings were significantly predictive, whereas MIRECC GAF social and GAF symptoms ratings were not” (Niv et al., 2007, p. 533). Similarly, the MIRECC GAF social scores were significant predictors of follow-up family and social support. The occupational and psychological symptom scales were not predictive of family support or a close friend at follow-up.
In even more detail, this study underscores the tri-dimensional nature of the GAF global score. The conceptualization of the MIRECC GAF goes beyond the instructions for the traditional GAF by splitting apart social and occupational functioning. Even so, being aware of these data can help the clinician think in terms of the various dimensions represented in the GAF global score and take care to assess these carefully and independently to arrive at a valid conclusion.
Quality of the Information upon which the GAF Rating is Based
One of the critical factors in completing psychological evaluations (including impairment) within a forensic setting is firmly establishing the credibility of the patient as well as relying as much as possible on objective data. As the old saying goes, “Garbage In-Garbage Out”. If the quality of the data upon which a GAF rating is based is unknown, there is the risk that it is unreliable, biased and invalid; as such, the quality of the GAF rating cannot be established. The GAF rating is only as good as the data upon which it is based.
It is beyond the scope of this course to review all of the research literature and approaches to establishing patient credibility relative to psychological assessment and the reader is referred elsewhere (Rogers, 2008). The importance of establishing the credibility of the patient is due to the fact that the GAF rating is often based primarily on self-report data or questionnaires that are highly face valid. If one can establish the credibility of the patient, then all of the other data can certainly be determined to be of higher value. Often, credibility is established through the use of standardized psychological testing that includes sophisticated validity scales. Of course, the gold standard in this arena is the MMPI-2. Often, clinicians will add other tests of credibility or symptom embellishment especially in the realm of cognitive functioning. Whatever method is chosen, the clinician should utilize a standardized approach in an attempt to establish the patient’s credibility of self-report and response to face valid instruments. If the patient shows “symptom embellishment” or amplification, then this must be taken into account when determining the GAF. This issue of patient motivation is also discussed in the AMA Impairment Guides (p. 358).
Once credibility is established, the clinician can improve his or her use of the GAF by purposely assessing all three of the dimensions that make up the global score. In routine QME clinical practice, it appears that this is rarely done. Certainly, data related to severity of psychological symptoms is readily available. Therefore, the issue is to increase the focus on assessment of social and occupational functioning. This might be done through the use of an instrument such as the MIRECC GAF, or occupational and social functioning might be assessed through the use of brief questionnaires that are readily available. Some examples of these questionnaires are outlined in Chapter 1 of the AMA Impairment Guides and other examples can be found throughout the research we have reviewed previously.
Review GAF Examples and Vignettes Presented in the Research Literature
Another method for improving one’s quality of GAF rating is to review the literature for example GAF ratings based on various clinical samples and vignettes. The vast majority of these articles have used clinicians that were specially trained in the use of the GAF. By having an understanding of what type of clinical presentation represents a criterion GAF score, the user can develop higher quality scores even without the benefit of structured training.
One method for accomplishing this is to look at some of the training vignettes that have been used by the VAMC along with the criteria GAF scores assigned to these examples. Some of these training vignettes are presented in Table 27 along with the GAF score assigned to each case. I could not locate example vignettes in the higher levels of functioning (e.g. greater than the decile of 50-60).
Another technique in this category includes looking at the average GAF ratings assigned by trained clinicians to various psychiatric diagnostic groups within the context of research studies. This can be helpful since, as we have discussed, psychiatric diagnosis is highly correlated with GAF. Some examples of these studies are presented in Table 28. This type of information can give the practitioner a general idea of GAF ranges for various diagnostic groups that are entering different types of treatments. Of course, this type of data is largely focused on severity of psychiatric symptoms since social and occupational functioning is rarely reported. However, that is generally the case with the GAF anyway.
SUMMARY AND CONCLUSIONS
The Global Assessment of Functioning Scale (GAF) is a standard method for a clinician to judge a patient’s overall level of psychosocial functioning. The GAF requires a clinician to develop an overall judgment about the patient’s current psychological, social, and occupational functioning. These dimensions are collapsed into a single global score. In 2005, the GAF was adopted by the State of California as the primary method for determining permanent psychiatric disability in the workers’ compensation population. As discussed in the SRPD (2005), psychiatric impairment is to be evaluated using the GAF, which is then converted to a whole person impairment (WPI). In this course, several problems with the GAF were discussed including attempting to include three areas of function in one score, the inter-rater reliability, and its validity. Being aware of these limitations can help the clinician use the GAF in a more accurate fashion. Suggestions for improving one’s GAF skills include carefully following the instructions, using the “split method” to help with scoring, relying on objective assessment of the three dimensions, and being familiar with research results for the GAF when used by trained clinicians.
REFERENCES
American Medical Association (AMA, 2000). The Guides to the Evaluation of Permanent Impairment, Fifth Edition. American Medical Association.
American Psychiatric Association. (1980, 1987, 1994, 2000). Diagnostic and Statistical Manual of Mental Disorders (III, III-R, IV, IV-TR Editions). Washington, DC: Author.
Bates et al. (2002). Effects of brief training on application of the Global Assessment of Functioning Scale. Psychological Reports, 91, 999-1006.
Dworkin et al. (1990). The longitudinal use of the Global Assessment Scale in multiple-rater situations. Community Mental Health Journal, 26, 335-341.
Endicott et al. (1976). The Global Assessment Scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry, 33, 766-771.
First, M.B. and Pincus, H.A. (2002). The DSM-IV Text Revision: Rationale and potential impact on clinical practice. Psychiatric Services, 53, 288-292.
Garcia-Cabeza, I., et al. (2001). Subjective response to antipsychotic treatment and compliance in schizophrenia. A naturalistic study comparing olanzapine, risperidone and haloperidol. BMC Psychiatry, 1:7. (www.biomedcentral.com/1471-244X/1/7, accessed 02-20-2010).
Goldman, Skodol and Lave (1992). Revising axis V for DSM-IV: a review of measures of social functioning. American Journal of Psychiatry, 149, 1148-1156.
Hall, RCW. (1995). Global Assessment of Functioning: A modified scale. Psychosomatics, 36, 267-275.
Harel, TZ et al. (2002). A comparison of psychiatrists’ clinical-impression-based and social workers’ computer-generated GAF scores. Psychiatric Services, 53, 340-342.
Hay et al. (2003). A two-year follow-up study and prospective evaluation of the DSM-IV Axis V. Psychiatric Services, 54, 1028-1030.
Hilsenroth, MJ et al. (2000). Reliability and validity of DSM-IV Axis V. American Journal of Psychiatry, 157, 1858-1863.
Howes, JL et al. (1997). Outcome evaluation of a short term mental health day treatment program. Canadian Journal of Psychiatry, 42, 502-508.
Kessler et al. (2003). Screening for serious mental illness in the general population. Archives of General Psychiatry, 60, 184-189.
Luborsky, L. (1962). Clinicians’ judgments of mental health. A proposed scale. Archives of General Psychiatry, 7, 407-417.
MacDonald-Wilson et al. (2001). Unique issues in assessing work function among individuals with psychiatric disabilities. Journal of Occupational Rehabilitation, 11, 217-232.
Moos, RH, McCoy, L, Moos, BS (2000). Global assessment of functioning (GAF) ratings: Determinants and role as predictors of one-year treatment outcomes. Journal of Clinical Psychology, 56, 449-461.
Moos, RH, Nichol, AC, Moos, BS (2002). Global assessment of functioning ratings and the allocation and outcomes of mental health services. Psychiatric Services, 53, 730-737.
Narud, K et al. (2005). Quality of life in patients with personality disorders seen at an ordinary psychiatric outpatient clinic. BMC Psychiatry, 5:10. (www.biomedcentral.com/1471-244X/5/10, accessed 02-20-2010).
Niv et al. (2007). The MIRECC Version of the Global Assessment of Functioning Scale: Reliability and Validity. Psychiatric Services, 58, 529-535.
Piersma and Boes (1997). The GAF and psychiatric outcome: A descriptive report. Community Mental Health, 46, 117-121.
Rogers, R. (2008). Clinical Assessment of Malingering and Deception, Third Edition. New York: Guilford.
Roy-Byrne et al. (1996). Evidence for limited validity of the revised global assessment of functioning scale. Psychiatric Services, 47, 864-866.
Soderberg et al. (2005). Reliability of Global Assessment of Functioning Ratings made by clinical psychiatric staff. Psychiatric Services, 56, 434-438.
Thienhaus, OJ et al. (1990). A study of the clinical efficacy of maintenance ECT. Journal of Clinical Psychiatry, 51, 141-144.
Uehara, T (1997). Correlations among depression rating scales and a self-rating anxiety scale in depressive outpatients. The International Forum for Psychiatry. (accessed 2-20-2010).
Vatnaland et al. (2007). Are GAF scores reliable in routine clinical use? Acta Psychiatrica Scandinavica, 115, 326-330.