Melvin L. Kohn and Kazimierz M. Slomczynski
Social Structure and Self-Direction:
A Comparative Analysis of the United States and Poland
(Cambridge, MA: Blackwell, 1990)
CHAPTER 2
THE METHODOLOGY OF THE RESEARCH
This chapter lays the groundwork for the analyses to come. We begin with a general discussion of a methodological issue that will pervade the entire book: assuring comparability of meaning in cross-national research. This issue is no less important at the stage of collecting data than at the stage of analyzing them. Our overriding concern with comparability of meaning provides the context for our description of the methods employed in collecting and coding the data of both the U.S. and Polish surveys. We particularly emphasize the methods employed in the Polish surveys, because they were designed to be replications of a survey that had already been carried out in the United States; all efforts to assure comparability in methods of data-collection therefore had to be made in the Polish, not the U.S., surveys. Our efforts to achieve comparability at the data-collection and data-processing stages of the inquiry are of crucial importance for our analyses, for no matter how relentlessly we pursue comparability of meaning and measurement in our analyses of the data, that pursuit is of no avail unless equal attention has been paid to such issues in the collection and coding of those data.
THE CRITICAL IMPORTANCE OF COMPARABILITY OF MEANING IN CROSS-NATIONAL RESEARCH
The most fundamental methodological issue in cross-national research is whether our comparisons are meaningful. Stefan Nowak (1976, p. 105) posed the issue with characteristic clarity:
How do we know we are studying 'the same phenomena' in different contexts; how do we know that our observations and conclusions do not actually refer to 'quite different things', which we unjustifiably include into the same conceptual categories? Or if they seem to be different, are they really different with respect to the same (qualitatively or quantitatively understood) variable, or is our conclusion about the difference between them scientifically meaningless?
The issue is so complex that a thorough treatment would require quite another book. In this book, instead, we deal with issues of equivalence of concepts and indices in a more practical manner, trying only to assure that the particular concepts we use are appropriate for both the United States and Poland, and that the methods we employ and the indices we develop are appropriate to each country and comparable for the two countries. In principle, methodological differences between studies could produce either cross-nationally consistent or inconsistent findings (Finifter 1977). Still, when one finds cross-national similarities despite differences in research design, even despite defects in some of the studies, it is unlikely that the similar findings were actually produced by the methodological differences. Substantive similarity in the face of methodological dissimilarity might even argue for the robustness of the findings. But when one finds cross-national differences, then dissimilarities and defects in research design make for an interpretive quagmire--there is no way to be certain whether the apparent differences in findings are real or artifactual. The best that one can hope to do is to marshal available evidence that methodological incomparability is insufficient to explain the differences in findings.
To obviate the possibility that any differences we may find between the United States and Poland are merely artifacts of differences in method, we have tried to design the studies to be as comparable as possible, to establish both linguistic and conceptual equivalence in questions and in coding answers, and to establish truly equivalent indices of the underlying concepts. With that intent, we shall in this chapter describe our efforts to make the design of the two studies as comparable as possible; to ascertain, insofar as possible, linguistic equivalence in the questions asked; and to code the data in ways that minimize the possibilities of artifactual differences. In later chapters, we shall be concerned with similar questions about the analysis of the data, questions pertaining to the meaningfulness and cross-national comparability of indices and of causal models.
SAMPLES, METHODS OF DATA-COLLECTION, AND PROCESSING OF THE DATA
The U.S. Surveys
The two U.S. surveys were designed by Kohn and his close collaborator, Carmi Schooler, with the fieldwork carried out by the National Opinion Research Center (NORC) of the University of Chicago. The first survey, conducted in 1964, was based on a representative sample of 3101 men employed in civilian occupations throughout the country. The second survey, conducted ten years later, was a follow-up of a representative subsample of those men--687 men still under 66 years of age. Every man in the original sample who was the father of one or more children between the ages of three and fifteen and living at home was asked about his values vis-à-vis one of those children, randomly selected. When, ten years later, NORC re-interviewed the subsample of men selected for the follow-up survey, they also interviewed the wives of married respondents and the "child" about whom the values questions had been asked ten years before--these "children" now being 13 to 25 years old. We shall discuss these data further in Chapter 7, when we deal with the transmission of values in the family.
Sample selection.
The sample employed in the 1964 survey is an area probability sample of 3101 men. These men are representative of all men in the United States, 16 years of age or older, who were at that time employed in civilian occupations at least 25 hours a week. Because a major focus of the inquiry is job conditions, and because the experiences of unemployment might overshadow the experiences of past employment (see Bakke, 1940), Kohn and Schooler excluded men not currently employed. They also excluded men in the military, since the problems both of sampling and of inquiry seemed too formidable to make their inclusion worthwhile. They excluded women, because they thought that their inclusion would have required a much larger sample. They did not exclude any men on grounds of race, language (a few men were interviewed in languages other than English), or any other basis.
In interviewing fathers, they wanted to direct the questions about values and parent-child relationships to one specific child, so chosen as to insure an unbiased selection and an even distribution of ages. To make that selection, the interviewer asked for the names and ages of all children aged three through fifteen who were living at home and then applied a random-sampling procedure (described in Kohn 1969, p. 238) to select a particular child.
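Purely to illustrate the logic of that step, the sketch below (in Python, with hypothetical field names) selects one eligible child at random. It is not the printed selection procedure actually used in the survey, which is described in Kohn (1969, p. 238).

```python
import random

def select_focal_child(children, rng=random):
    """Illustrative only: choose one focal child, with equal probability, from the
    respondent's children aged three through fifteen who live at home. The survey
    itself used a printed selection procedure (Kohn 1969, p. 238), not this code."""
    eligible = [c for c in children if 3 <= c["age"] <= 15 and c["lives_at_home"]]
    if not eligible:
        return None  # the values questions were not asked of such respondents
    return rng.choice(eligible)

# Hypothetical family: three children, two of them eligible
children = [
    {"name": "A", "age": 2, "lives_at_home": True},
    {"name": "B", "age": 7, "lives_at_home": True},
    {"name": "C", "age": 14, "lives_at_home": True},
]
print(select_focal_child(children, rng=random.Random(1964)))
```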
Pretesting the interview schedule.
Kohn, Schooler, and their associates carried out the early rounds of unstructured and, later, structured pretest interviews in the Washington-Baltimore area. The final pretests were conducted by NORC, which carried out 100 interviews--six in each of ten widely separated places, with an emphasis on small towns and rural areas, plus twenty in Chicago and twenty in New York City, with an emphasis in these cities on less educated respondents. (The intent in concentrating on small town, rural, and less educated urban respondents was to subject the interview schedule to the most demanding tests.) Kohn and Schooler assessed the detailed reports of the interviews and met with the interviewers from Chicago and New York to discuss each question in the interview schedule. The batteries of items designed to measure the several aspects of orientation to self and society were statistically assessed for their scale characteristics, other items for their clarity of meaning and distributional characteristics. The schedule was revised and shortened once again, and then it was ready for use in the survey proper.
Administration of the 1964 survey.
The survey was carried out by the field staff of NORC during the spring and summer of 1964. Overall, 76 percent of the men selected for the sample gave reasonably complete interviews. Considering that it is more difficult to get employed men than most other people to participate in a survey, and that the interviewers were asking the respondents to undertake an especially long interview (the median interview took two and a half hours), we think this rate is acceptable. From the point of view of generalizing to the population at large, though, it is necessary to examine the characteristics of the nonrespondents. Loss of respondents assumes particular importance if it occurs disproportionately in delimited segments of the population. For our research, it is especially important that nonrespondents not be concentrated in a particular social stratum. A complete analysis of this issue is not possible, for we lack data on the socioeconomic status of many of the nonrespondents. One important source of information, however, is available: Most medium-sized cities have had city directories that contain tolerably accurate occupational data. For those cities, it was possible to determine whether or not the nonrespondents differ in occupational level from the men who granted interviews. Kohn and Schooler found little difference overall. The one exception is that the nonresponse rate for small business owners is somewhat higher than that for other men, but this difference is probably an artifact of city directories having more complete coverage of small business owners than of other segments of the population. We conclude that, for cities where data are available, nonresponse rates do not seem to vary appreciably by occupational level.
Furthermore, for those cases where data about occupational level are available, there is no relationship between the occupational levels of nonrespondents and the reasons they gave for not granting an interview. Nor is there any relationship between nonrespondents' occupational levels and the interviewers' characterizations of their apparent attitudes. Rates and types of nonresponse do not seem to be appreciably related to social-stratification position.
There is, however, one social characteristic that made a notable difference in the rate of nonresponse--the size of the community in which men live. Nonresponse rates are directly proportional to size of community. Nothing in the data explains this phenomenon. The simple fact is that the larger the community, the more difficult it was to get employed men to grant long interviews. These analyses suggest that the final sample is reasonably representative of the population to which we generalize, except insofar as it underrepresents larger cities and overrepresents smaller communities.
Further comparisons of the characteristics of the 3101 men actually interviewed in the 1964 survey with the characteristics of employed males enumerated in the 1960 decennial Census indicate that the sample is closely comparable to the population of employed men. There are only two types of discrepancy between the characteristics of the sample and those of employed males generally: (1) As was shown by the analysis of nonrespondents, the sample underrepresents men in the largest metropolitan areas. (2) The selection criteria imposed certain limitations, i.e., that the respondents be at least sixteen years old, be currently employed, and be employed at least twenty-five hours per week. Differences between these criteria and the Census definition of "employed males" are reflected in the findings that the sample is somewhat older and better educated than are employed males generally, with a larger proportion married and a smaller proportion at the very bottom of the income range. Aside from differences in criteria and the sample bias against metropolitan areas, though, the sample matches the 1960 Census data quite closely. The sample can thus be taken to be adequately representative of the population to which we generalize.
Coding and quality control.
The completed interview schedules were turned over to Kohn and Schooler for coding and analysis. The coding operation was a closely coordinated effort by a small group of people. We think that the gain in accuracy repaid the more than two years of painstaking work that went into coding and quality control.
Kohn and Schooler employed three tests of coding reliability. First, the coding of 400 randomly selected interviews was appraised by the coding supervisor. She agreed with the original coder's classification of each question at least 95 percent of the time. Second was a blind experiment based on a random sample of 50 interviews. Five people independently coded all items on which there was any possibility that knowledge of how another person had coded the material might affect one's judgment. A majority of the five agreed, absolutely, at least 90 percent of the time, on all but two of the classifications made. Since this level of agreement was reached under the special circumstances of everyone knowing that this was a reliability study, they employed a third test. This test compared the consensus of the five coders with the original classification that had been made under altogether routine conditions. Except for two classifications, which are of only minor import for the analyses of this book, there was at least 80 percent absolute agreement on each.
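The agreement rates reported here are, in essence, proportions of exact matches across interviews. The sketch below, using invented toy codes rather than the actual data, shows one way such figures can be computed; the function names and the majority criterion as written are assumptions for illustration.

```python
from collections import Counter

def percent_agreement(original, check):
    """Share of interviews on which a check coder agreed exactly with the original code."""
    matches = sum(o == c for o, c in zip(original, check))
    return matches / len(original)

def majority_agreement(codes_by_coder):
    """Share of interviews on which a majority of independent coders
    assigned exactly the same code (as in the blind five-coder test)."""
    n_coders = len(codes_by_coder)
    n_items = len(codes_by_coder[0])
    agreed = 0
    for i in range(n_items):
        top_count = Counter(coder[i] for coder in codes_by_coder).most_common(1)[0][1]
        if top_count > n_coders / 2:
            agreed += 1
    return agreed / n_items

# Toy data: five coders, six interviews, codes for one hypothetical item
codes = [
    ["a", "b", "a", "c", "a", "b"],
    ["a", "b", "a", "c", "a", "b"],
    ["a", "b", "b", "c", "a", "b"],
    ["a", "a", "a", "c", "a", "b"],
    ["a", "b", "a", "c", "b", "b"],
]
print(majority_agreement(codes))  # 1.0 in this toy example: a majority always agrees
```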
Subsequently, a detailed check was done of all possible inconsistencies in respondents' answers--for example, self-employed men telling us about their supervisors. These were individually checked against the original interviews and corrected. There followed a search for "implausibilities"--things that might be true, but were unlikely to happen often, for example, men being married to considerably older women, or men whose first full-time job started at an improbably early age. These too were checked against the original interview and corrected when necessary. Only then was data-analysis begun.
The 1974 follow-up survey.
The men chosen for the follow-up study were a representative subsample of the 2553 men in the original sample who were less than 56 years old at the time of the initial study and thus would be less than 66 years old when reinterviewed. The primary reason for the age restriction was that a large proportion of the older men would be retired, making them inappropriate for the main thrust of the inquiry--assessing the reciprocal effects of ongoing occupational experience and psychological functioning. Excluding the older men also had the desirable consequence of increasing the proportion of men who had children of the appropriate age-range for the inquiries about parental values.
Wherever a selected man was married at the time of the follow-up interview, NORC attempted to interview his wife. The interviewers also attempted to interview the child about whom the parental-values and child-rearing questions had been asked in the 1964 survey. The intent was to try to interview these "children"--now aged thirteen through twenty-five--wherever they lived, wherever they were in their educational, occupational, and family careers.
Pre-test procedures were much the same as for the original study, as were the procedures followed in fieldwork. The only major difference in procedures between the original and follow-up studies was that, for the follow-up study, NORC coded the data. Kohn and Schooler, however, used much the same procedures for assuring quality control as they had used in processing the baseline data.
Generalizability of the subsample.
NORC succeeded in interviewing 78 percent of the men who had been randomly selected for the follow-up study, 687 men in all. Kohn and Schooler assessed the generalizability of this subsample by two types of analysis.
The first involved systematic comparison of the social and psychological characteristics of the men who were re-interviewed to the characteristics of the men who were randomly excluded from the follow-up study, who constitute a representative subsample of the overall sample and thus are an appropriate comparison group. The differences between the two subsamples are few and small in magnitude: Judging from the original interviews, the men who were re-interviewed were a little more intellectually flexible, somewhat more trustful, slightly lower in self-confidence, and they had been reared in somewhat more "liberal" religious denominations than the men in the comparison group. But the two groups do not differ significantly in the social characteristics most important to our analyses--education, social-stratification position, major occupational characteristics, and age--nor even in urbanness.
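In effect, this is a comparison of two randomly formed subsamples on measured characteristics. The sketch below, using invented data and a conventional two-sample t-test from scipy, illustrates the form of such a check; it is not a reconstruction of Kohn and Schooler's actual analysis, and the numbers are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1974)

# Invented illustration: years of education for the re-interviewed men and for
# the randomly excluded comparison group (the real analysis compared many more
# characteristics, from education to intellectual flexibility).
reinterviewed = rng.normal(loc=11.8, scale=3.0, size=687)
comparison = rng.normal(loc=11.7, scale=3.0, size=600)

t, p = stats.ttest_ind(reinterviewed, comparison, equal_var=False)
print(f"difference in means = {reinterviewed.mean() - comparison.mean():.2f}, p = {p:.3f}")
# A non-significant difference on such characteristics is what supports treating
# the follow-up subsample as representative of the original sample.
```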
The second method of assessing the representativeness of the follow-up sample was to repeat the major substantive analyses that had previously been done with the cross-sectional 1964 data, this time limiting the analyses to those men who had been reinterviewed in 1974. Kohn and Schooler repeated all their principal analyses of the relationships among social stratification, occupational conditions, and psychological functioning. The smaller size of the subsample meant that several secondary avenues could not be explored and that some of the earlier findings were no longer statistically significant. But the main findings held up uniformly well. Thus, we can proceed to analyze the longitudinal data with confidence that whatever we find can be generalized to the larger population of employed men in the United States at that time.
The Polish Surveys
The Polish surveys were conducted under the auspices and with the financial support of the Polish Academy of Sciences. The initial Polish survey was planned by Slomczynski, Krystyna Janicka, and Jadwiga Koralewicz-Zebik; the survey of mothers and children was planned by Slomczynski and Anna Zawadzka.
Sample selection.
The initial Polish survey, conducted in 1978, was designed to represent men, aged 19-65, living in urban areas and employed full-time in civilian occupations. The size of this population was approximately 6.2 million in 1978, out of a total Polish population of approximately 35 million. Although the rural peasantry (for whom there is no counterpart in the United States) is not represented, farmers living in proximity to urban centers are included, making the Polish sample more comparable to the U.S. sample than a sample fully representative of Poland would be.
A three-stage probability sampling scheme was devised. In the first stage, all urban centers were listed and given weights proportional to their population size. Twenty-six urban centers, representing 30% of the urban male population of Poland, were selected. In the second stage, all electoral districts of the selected urban centers were pooled and a sample of 48 districts was randomly selected. In the third stage, the official register of voters was screened for sex and age to provide a final list of potential respondents. From each district, two samples of males, aged 19-65, were randomly selected: a basic sample and an auxiliary sample. The basic sample was the target sample of the approximate size intended for the final sample. It was known, however, that the basic sample would contain cases that would not satisfy the criterion of full-time employment in a civilian occupation. Replacements for such persons, as well as for those who could not be interviewed for other reasons, were obtained from the auxiliary sample.
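The first stage amounts to sampling urban centers with probability proportional to population size. The sketch below illustrates that idea with invented centers and populations; for simplicity it samples with replacement, which the actual 1978 design need not have done, and the later stages are only summarized in comments.

```python
import random

def pps_sample(units, sizes, k, rng=random):
    """Illustrative first-stage draw: select k units with probability proportional
    to population size (with replacement, for simplicity; the actual design is
    described only in outline in the text)."""
    return rng.choices(units, weights=sizes, k=k)

# Hypothetical urban centers and their populations (in thousands)
centers = ["Center A", "Center B", "Center C", "Center D", "Center E"]
populations = [1600, 850, 460, 210, 95]

first_stage = pps_sample(centers, populations, k=2, rng=random.Random(1978))
print(first_stage)

# The second and third stages would then pool the electoral districts of the
# selected centers, draw districts at random, and screen official voter
# registers by sex and age to list potential respondents.
```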
Interviewers from the survey research staff of the Polish Academy of Sciences secured interviews with 875 of the men in the basic sample. Another 442 of the men in that sample did not satisfy the sampling criteria; 114 could not be reached; and 28 refused to be interviewed. The auxiliary sample provided an additional 682 interviews, for a total of 1557. Two random subsamples were selected from among those interviewed. The first (N=400) was for purposes of psychological testing, to be explained below. The second (N=752) was selected for a verificational study and reliability assessment of some of the measures of social stratification and of psychological functioning, which will also be explained below.
As in the U.S. baseline survey, each father who had at least one child aged 3-15 living at home was asked about his values vis-à-vis a randomly selected child. In 1979-1980, interviews were conducted with those children who were then 13-17 years old (N=177) and with their mothers. We shall discuss these data further in Chapter 7.
Design of interview schedule.
The interview schedule used in the study of Polish men was designed to be an exact replication of the main parts of the U.S. interview schedule, including all pertinent questions about class, stratification, occupational self-direction, and psychological functioning. Unfortunately, the Polish interview schedule does not contain information about job conditions other than those directly involved in occupational self-direction--an omission that will haunt us throughout the analyses that follow. The interview schedule does, however, include additional questions specifically pertinent to Poland. One that will be of particular interest for our analyses is whether or not the respondent was a member of the Polish United Workers' (Communist) Party.
The interview schedule used for wives was essentially an abridgement of that used for the men, including basic data about social stratification, but little about actual job conditions. The lack of detailed information about the wives' job conditions precludes our doing analyses about the role of occupational self-direction in explaining the psychological effects of class and stratification parallel to those we shall do for men. The data are appropriate, though, for analyses of the transmission of values in the family. The interview schedule for the offspring focuses on values, thus providing the remaining information crucial for our analyses of the transmission of values in the family.
Questions pertaining to values, cognitive functioning, and orientations to self and society, as well as questions about occupational self-direction, were directly adopted from the U.S. interview schedules. The measures of primary dimensions of social stratification--formal education, job income, and occupational status (prestige)--came from previous Polish studies (see Danilowicz and Sztabinski 1977; Slomczynski and Kacprowicz 1979), where they had been intensively tested.
Assuring cross-national comparability of meaning.
The initial translation of each of the questions adopted from the U.S. surveys was prepared by Slomczynski during a six-week stay in 1976 at the Laboratory of Socio-Environmental Studies of the National Institute of Mental Health. In preparing that translation, Slomczynski discussed each of the questions thoroughly with Kohn and his colleagues, to be certain that the translation captured the intended meaning of the questions. Those discussions were especially valuable for translating questions that use colloquial expressions, such as "going to pieces," "end up causing trouble," and "feel upset," which have several equivalents in Polish.
Another difficulty was finding equivalents for situations that do not arise in Poland. For example, American respondents were asked: "Suppose you wanted to open a hamburger stand and there were two locations available. What questions would you consider, in deciding which of the two locations offers a better business opportunity?" In Poland there are no hamburger stands, but certain types of newsstands (kiosks) are widespread. Slomczynski changed the question to: "Suppose for a moment that you have to decide where a newsstand (kiosk) should be located in a new apartment development. There are two possible locations and you can choose one of them. What questions would you consider in deciding which of the two locations is better?" As we shall see, the Polish version of the question served quite well as an indicator of intellectual flexibility.
In further work, several versions of each translated question were prepared by graduate students of the English Philology Department at the University of Warsaw. The method used for constructing those alternate versions was to substitute single words or entire phrases in appropriate places in the question. The alternative versions of the question were then judged collectively by a group of linguistic experts, who evaluated the semantic and syntactic equivalence of the Polish and English versions.
The resulting interview schedule was subjected to a pilot study, based on interviews with fifty persons selected from the upper and lower ends of the educational and occupational distributions. The interviewers were experienced research workers with graduate training in sociology. They were further trained to conduct the interviews and, simultaneously, to take notes on the respondent's reactions to each question.
The questions were then assessed by two criteria: what proportion of the respondents needed to have a question repeated or explained; and what proportion of the respondents gave answers that did not fit the coding scheme. For example, the agree-disagree question, "Young people should not be allowed to read books that are likely to confuse them," is based on the assumption that such books exist. A respondent may not share this assumption. Or a respondent might answer within a different conceptual framework from the one intended by the question. For example, the question "How often do you feel depressed?" requires one of the following responses: always, frequently, sometimes, rarely, or never. Instead of using these categories, a respondent might answer, "I feel depressed whenever something tragic happens in my life." Formally incorrect answers might also result from a lack of knowledge on the part of the respondent, as evidenced by such responses as "I do not know how to answer," "It is difficult for me to decide," "I have no opinion," and the like. Any question that 20% of the respondents seemed to find unclear, or to which 5% of the respondents did not give a formally correct answer, was modified. About 10% of the questions that had been translated from English were modified at this stage.
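These two criteria amount to simple threshold rules. The sketch below applies them to hypothetical pilot tallies; whether the cutoffs were treated as "at least" or "more than" is not stated in the text, so the comparison used here is an assumption.

```python
def flag_for_modification(question_stats, n_respondents=50,
                          unclear_threshold=0.20, incorrect_threshold=0.05):
    """Flag pilot questions for revision by the two criteria described above:
    too many respondents needed the question repeated or explained, or too many
    gave answers that did not fit the coding scheme. All data are hypothetical."""
    flagged = []
    for question, tallies in question_stats.items():
        unclear_rate = tallies["needed_repeat_or_explanation"] / n_respondents
        incorrect_rate = tallies["formally_incorrect_answer"] / n_respondents
        if unclear_rate >= unclear_threshold or incorrect_rate >= incorrect_threshold:
            flagged.append(question)
    return flagged

pilot = {
    "Q12": {"needed_repeat_or_explanation": 11, "formally_incorrect_answer": 1},  # 22% unclear
    "Q13": {"needed_repeat_or_explanation": 2, "formally_incorrect_answer": 4},   # 8% incorrect
    "Q14": {"needed_repeat_or_explanation": 3, "formally_incorrect_answer": 1},
}
print(flag_for_modification(pilot))  # ['Q12', 'Q13']
```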
The version of the interview schedule containing these modifications was twice tested on university students, discussed by experienced interviewers, and modified.
Training of interviewers and administration of the survey.
The Institute of Philosophy and Sociology of the Polish Academy of Sciences had a pool of 300 trained interviewers distributed around the country in proportion to the population size of each region and district. From this pool, 159 interviewers were selected, primarily from locations included in the sample. The interviewers were divided into 15 groups of 10 or 11 persons. An interviewing instructor conducted an intensive training session with each group. After the session, interviewers were examined to assess their familiarity with the questionnaire. The instructor also evaluated trial interviews conducted by each interviewer with two persons who were not members of the actual sample.
The actual interviews were conducted at the respondents' homes, with interviewers conducting no more than two interviews a day. Completed interview schedules were submitted to the supervisor, who checked them for completeness and for formal correctness of answers to the questions. Where there were omissions or evident errors, the interviewer was asked to make another visit. This day-by-day quality control resulted in there being only a small amount of missing data.
The Academy's regular interviewers asked the respondents all the questions except those employing semi-projective tests, two of which--the Embedded Figures Test and the Draw-a-Person Test--are used in our index of intellectual flexibility. The Polish Psychological Association advised that administration of these tests be done by specially trained interviewers. Since there were only forty such interviewers available, the two tests were given to only a subsample of the respondents, a representative 400 of the 1557 respondents. Although we feared that having this information for only a subsample of the respondents might seriously hamper the analyses, it turned out to pose only minor difficulties.
The interview record includes information about the interviewer's sex, age, education, and field experience. These data were used to investigate "interviewer effects," which proved to be insubstantial (see, in Polish, Slomczynski and Kohn, pp. 220-222).
Coding of the data and quality control.
All interview responses were transcribed onto coding sheets. Ten percent of the coding of "closed" questions was checked by independent coding. The "open" questions were coded independently by two experienced groups. The results were checked by a third group for accuracy of coding and intercoder consistency. Generally, the inter-coder consistency was relatively high; in cases of discrepancy, the final decision was made by the coding supervisor, who was in frequent contact with the principal investigators.
The coding of occupational position was done by a highly trained group of coders. They used the "Social Classification of Occupations" (Pohoski and Slomczynski 1978), which consists of 240 occupational categories. This classification system requires detailed information about the respondent's occupation, position in the employing organization, and type of enterprise in which he is employed. Where interview schedules did not provide sufficiently detailed information, the coders had to make inferences about respondents' occupational categories on the basis of additional material such as dictionaries of occupations. Coders recorded whether the coding of occupation was made solely on the basis of information provided by the respondent or required additional inference. Analysis of these data shows that, with respondent's age, education, and occupational position statistically controlled, the average socio-economic status of those respondents for whom such inferences had to be made does not differ from that of respondents for whom it was not necessary to make such inferences.
Coded data were transcribed onto computer tape and tested for "logical consistency". Among these tests were: making certain that respondents were not classified as having answered questions that it would have been inappropriate for them to have been asked; evaluating pairs of responses that are unlikely to occur together--for example, having a child and never having been married; and examining the sequence of dates of various life events, such as leaving school, starting one's first job, joining certain organizations, and the birth of one's first child. Suspect data were checked against the original interview schedules and corrected when found actually to be in error. Where there was obvious and non-arguable error (whether by respondent or interviewer), the data were recoded as "missing". Less than three-tenths of one percent of all information had to be corrected.
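Checks of this kind can be expressed as simple rules applied to each coded record. The sketch below uses hypothetical field names and rules chosen only to mirror the examples in the text; in the actual study, suspect records were checked by hand against the original interview schedules rather than corrected automatically.

```python
def consistency_problems(record):
    """Return a list of suspect conditions of the kind described above.
    Field names and rules are hypothetical illustrations."""
    problems = []
    # Respondents should not have answers to questions they were never asked
    if not record.get("is_father") and record.get("parental_values_answered"):
        problems.append("parental-values items answered by a non-father")
    # Pairs of responses unlikely to occur together
    if record.get("has_child") and not record.get("ever_married"):
        problems.append("child reported but respondent never married")
    # Life-event dates should follow a plausible sequence
    if record.get("first_job_year", 9999) < record.get("birth_year", 0) + 10:
        problems.append("first full-time job begun at an improbably early age")
    return problems

record = {"is_father": True, "parental_values_answered": True,
          "has_child": True, "ever_married": False,
          "birth_year": 1930, "first_job_year": 1936}
print(consistency_problems(record))
```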
Verification study.
Three months after the completion of data-gathering, a verification study was conducted by mail. A sample of 752 respondents, randomly selected from among those who had at least some high school education, was sent a letter and a questionnaire by the Institute of Philosophy and Sociology of the Polish Academy of Sciences. The letter explained the purpose of the verification study and asked the respondent to fill out and return the brief questionnaire. The questionnaire contained four questions about the interviewer's performance and eleven substantive questions repeated from the interview schedule.
The results confirm the high quality of the data. All respondents indicated that the interviewer consistently took notes during the interview; no information was obtained that suggested that the behavior of any of the interviewers was in any way inappropriate or improper. The questionnaire contained questions about the respondent's education, occupation, and income--the components for our indices of social-stratification position. It also included four of the questions used in indexing one of our principal dimensions of psychological functioning, authoritarian-conservatism. We shall use this information in subsequent chapters as part of our assessment of the reliability of measurement.
CONCLUSION
We began this chapter with a discussion of the pivotal importance of comparability of meaning in cross-national research. This issue will be with us throughout this book, particularly as we develop indices of social structure and of psychological functioning. In this chapter, our concern has been the quality of the data. For any analysis of social structure and personality, it is essential that the quality of the data be high. For valid cross-national analysis, the requirements are even more stringent: It is not only essential that the data for both countries be of high quality; it is also essential that the procedures used for collecting and processing those data be similar; and, perhaps most important of all, it is essential that considerable care be given to all the nuances of language and culture that can make similar-seeming questions actually have quite different meaning.
We believe that the data of the U.S. and Polish studies provide materials of unusually high quality for our intended analyses. Moreover, we enter into the cross-national analyses of these data with confidence that the extraordinary efforts taken by the Polish Academy of Sciences have done as much to assure true comparability of meaning as is possible with survey methods. In succeeding chapters, we shall endeavor to do as well in our development of indices and of causal models.