Guest post by Velichka Dimitrova, project coordinator of Open Economics at the Open Knowledge Foundation, LSE Impact of Social Sciences blog, originally posted here.
This post is part of a wider collection on Open Access Perspectives in the Humanities and Social Sciences (#HSSOA) and is cross-posted on the Impact of Social Sciences blog of the London School of Economics and Political Science. We will be featuring new posts from the collection each day leading up to the Open Access Futures in the Humanities and Social Sciences conference on the 24th October, with a full electronic version to be made openly available then.
Note: This article gives the views of the author, and not the position of SAGE or SAGE Connection.
Delving deeper into discipline-specific perspectives on openness in the social sciences, Velichka Dimitrova looks at the present and future of open data for economics research. By sharing data, economists stand to enhance the visibility and the impact of their research by allowing for greater scrutiny of their research findings and promoting new uses of the diverse datasets.
Closely related to open access to academic publications is open access to research data. Just as researchers sign over their copyright to journal publishers, authors often sign non-disclosure agreements with firms about the use of their data, even where no privacy concerns exist. Having to secure the privilege of working with original datasets often means that the material underpinning research remains invisible and unverifiable.
Earlier this year, Harvard economists Kenneth Rogoff and Carmen Reinhart came under fire when students tried to replicate Growth in a Time of Debt. To the embarrassment of the authors, Herndon et al. (2013) found Excel coding errors and methodological flaws that called into question some of the key conclusions of the paper that drove austerity measures around the world in the aftermath of the worst financial and economic crisis since the Great Depression. “In this age of information math errors can lead to disaster”, pointed out Paul Krugman, as it finally became clear why fellow economists had not been able to replicate the results of Reinhart and Rogoff. In a rather controversial research field where the direction of causality has not been identified (whether high debt causes low growth or the other way around), making the underlying data and code available is crucially important.
Encouraging better reproducibility of economics research is one of the reasons for the recent release of a Statement on the Openness of Data and Code – the Open Economics Principles, brought out by the Open Economics Working Group of the Open Knowledge Foundation. The purpose of the Principles is to provide some basic guidelines on why, how and when data in economic research should be open.
For economic research to be reliable and trusted, it should be possible to scrutinise and reproduce research findings. This is difficult, or impossible, if data and analysis are not made available. Making material openly available minimises the barriers to doing reproducible research.
The Open Economics project began last year with the support of the Alfred P. Sloan Foundation, aiming to establish what open data means for economics: to show examples of projects and to map the remaining barriers to opening up data and regression code. With the input of an Advisory Panel of twenty senior economists, the project convened different stakeholders (researchers, funders, journal editors, data professionals and students) in two international workshops, building an understanding of the value of open data for economics and quantitative social science.
Whilst many initiatives exist in the field of the natural sciences, social scientists and economists have been more reluctant to open up data and code. The data economists work with is often very diverse. More recent empirical work depends on having unique datasets with individuals, households or firms as observation units, as the cross-country regressions using widely available data are now a thing of the past. Access to quality and high-frequency data is often not free and requires significant investment on the researchers’ side. Such data may contain sensitive information or may be subject to confidentiality agreements.
Yet for much of the data underlying empirical research no such issues exist, and the data should be open by default and released under an open license. Recognising that data and code should be made available upon publication, some economics journals have put in place data availability policies. In fact, the availability of raw data related to a paper is not a new issue. In what came to be regarded as the first referee report on an article submitted to Econometrica, Ragnar Frisch commented on the work of Henry Schultz in October 1932:
I would also suggest that you include a table giving the raw data you have used. … I think the publishing of the raw data is very important in order to stimulate criticism and control.
Today, some economics journals have data availability policies in place. The American Economic Review, which sets the tone for the policy of other journals, requires the authors of accepted empirical papers to provide, prior to publication, all data and computation necessary for replication, and promises to make them available on the AER website. In fact, the majority of the more recent AER articles have their datasets available online.
The role of funders should also be given some attention. Many funders now ask for data management and sharing plans, in which researchers are required to outline their approach to gathering, storing and disseminating their research data. However, many funders face a trade-off between giving more research funding and setting aside a pot for supporting the documentation of research. In line with these developments, the U.S. government released a policy memorandum promising specific funding for making federally-funded research freely available to the public, giving specific attention to digital data.
Various tools and platforms exist for economic researchers to share their research data. Projects like DataVerse at Harvard offer online repositories for research data, and services like DataCite allow for tracking the use of datasets and giving credit to data producers. There are many potential benefits to sharing data: it enhances the visibility and the impact of one’s research, allows for the scrutiny of research findings, promotes new uses of the data and avoids the unnecessary cost of duplicated research.
Velichka Dimitrova is project coordinator of Open Economics at the Open Knowledge Foundation. She is based in London, a graduate of economics (Humboldt Universität zu Berlin) and environmental policy (University of Cambridge) and a fellow of the Heinrich Böll Foundation. She can be found on Twitter at @vndimitrova