The Data-PASS Project: Preserving the Past of Survey Research
By David L. Weakliem
The rapid pace of technological change means that much information being produced today can become unusable in a short period of time. Reading a twenty-year-old book is just as easy as reading a new book, but reading a twenty-year-old computer tape can be a major challenge. As time goes on, more and more information is "born digital" in forms such as web pages, data files, and email.
The Data-PASS project is one of seven projects funded by the Library of Congress with the objective of preserving different kinds of digital information. Some of the projects focus on information in such forms as web pages, geospatial data, and public television programs. Data-PASS is devoted mainly to data gathered through survey research.
The Data-PASS (Data Preservation Alliance for the Social Sciences) project is a partnership of the Library of Congress and six archives. Its goal is to locate and archive all digital social science data on American society ever produced. We define "social science data" broadly—not only data gathered by social scientists, but data on any topic that might be of interest to social scientists. While the major focus will be on surveys, we expect to acquire other kinds of data as well. We cannot hope to achieve complete success; much of what has been produced over the years is already lost. But if we act now, a great deal of important data can be preserved for the lasting benefit of social scientists, policymakers, and the general public.
The seven institutions involved in the Data-PASS project are the Library of Congress; the Inter-university Consortium for Political and Social Research at the University of Michigan; the Roper Center for Public Opinion Research at the University of Connecticut; the Odum Institute for Research in the Social Sciences at the University of North Carolina; the Henry A. Murray Research Archive, a member of the Institute for Quantitative Social Science at Harvard University; the Harvard-MIT Data Center; and the Electronic and Special Media Records Service Division of the National Archives and Records Administration. Each of these institutions has a different focus, and any data recovered will be directed to the most suitable one (though all data will also be shared among the partners).
One question people often ask is whether we really want all data. Isn't there a lot that just isn't worth preserving? Clearly, we have to set some priorities, and we have developed criteria that will enable us to rank datasets as more or less important. Data obtained from random samples, or at least reasonably representative samples, will rank higher than those obtained from convenience samples. "Classic" datasets that have served as the basis of important publications will have a high priority, as will data on relatively neglected topics. Finally, while it will be necessary to make judgments about the general importance of various topics, we do not know what topics future researchers will regard as important. Hence, we will be inclusive, and, when in doubt, obtain and preserve the data and documentation, if only with minimal processing.