The Publishing Research Consortium has just published an interesting survey of publishers’ responses to the issue of data mining. The report, “A research study into Practices, Policies, Plans…..and Promises” by Eefke Smit and Maurits van der Graaf, presents the results of their survey of 29 experts in publishing and data mining.
They conclude that more standardization is needed to help make data mining more scalable. There need to be common platforms and agreed rules of use for the data. Here is an extract from the Executive Summary:
Overall, experts expect a further acceleration of text and data mining into scholarly content, sparked by a greater availability of digital content corpuses, the ever increasing computer capabilities, improved user‐friendliness of software tools and easier access to content. Semantic annotation of content is expected by some to develop into a new standard for STM content, facilitating better and deeper search and browse facilities into related articles ‐‐ even if use cases and business propositions are at present in infancy stage only and not yet fully developed.
This optimism on Journal Article Mining is generally shared by publishers across the board who expect an increase in publishers mining their own content. Half of the publishers surveyed also already see an increase in mining requests from third parties. The mining requests that publishers receive are not very frequent (mostly less than 10 per year, a good share even less than 5 per year) and come mostly from Abstracting and Indexing services and from corporate customers. Respondents also note a fair amount of illegal crawling and downloading that suggest unreported mining activities.
Publishers tend to treat mining requests from third parties in a liberal way, certainly so for mining requests with a research purpose. Publishers are less permissive if the mining results can replace or compete with the original content. Few publishers have a publicly available mining policy, the large majority handles mining requests on a case‐by‐case basis. Approximately 30 % of publisher respondents allow any kind of mining of their content without restrictions, in most cases as part of their Open Access policies. For the other publishers, nearly all require information about the intent and purpose of the mining request.
Data curation on journal sites on behalf of authors is more common in STM fields, but demand may grow in the social sciences as funding bodies begin to require data management plans as part of their grants.