Although many national and institutional policies are pushing for research data and other material to be made open and shared, this is still very far from common practice in many domains. Consequently, the Research Data Alliance ‘SHAring Rewards and Credit Interest Group’ (SHARC) was formed, under the leadership of Anne Cambon-Thomsen (who led the ESOF Toulouse initiative), to explore and articulate this issue with the aim of raising awareness amongst key stakeholders in the research community. As well as drafting a White Paper and some key recommendations, SHARC convened a group of experts at the recent ESOF Toulouse event for further discussion and development of our understanding.
Steven Hill is Director of Research at Research England, part of UKRI (the recently re-formed UK national research funding body). From his perspective as a national funder and policy maker – with specific responsibility for coordinating the national research assessment exercise – he observed that, although there is widespread agreement that sharing data and resources is important and should be encouraged, this message isn’t percolating out to the vast majority of researchers. Steven suggested that policy-makers need to take a combination of two approaches:
- Providing the structures that allow researchers to share data and receive credit for doing so, in the expectation that this will be sufficient (the ‘if you build it, they will come’ approach), and
- Linking the sharing activity to funding requirements.
To illustrate the scale of the issue, research datasets are considered valid research outputs in the UK national assessment framework, and researchers are free to submit datasets as key research outputs. Nonetheless, only 0.01% of submissions to the last REF exercise (held in 2014) were datasets. A major challenge has been to develop a common framework for all disciplines, as they are currently at different stages of maturity for sharing, and the very definition of ‘data’ itself also varies across communities. The first step in this process has been the Concordat on Open Research Data, which expresses a multi-stakeholder group’s agreement on nine common principles of data policy. Briefly, the principles touch on data being: as open as possible but as closed as necessary; budgeted for in research planning; embargoed for use by the data collector for a finite period; handled according to legal and ethical codes; valued in its management; well curated; made available alongside associated publications; and the focus of researcher training. Having established these principles, the next step will be to devise policies to implement them and to fully envisage an infrastructure fit for enabling and sustaining the required actions and outputs.
Michaela Th Mayrhofer is a political scientist and historian by training, who now works for the research infrastructure BBMRI as Chief Policy Officer for the Common Service ELSI and leads the GDPR Code of Conduct initiative. She outlined “The Hugging Problem”. Biobanks rely on public support – which incorporates both public funding and public trust. Yet funders are frustrated by scientists’ current behaviours: they share data far too seldom. This represents poor value for the huge public investment, as science is slowed down and made less effective by such practices. Michaela speculated that the problem partly stems from research careers and success parameters being derived through competitive rather than collaborative processes – which leads to a built-in reluctance to share that is strongest with one’s closest competitors (and potential partners).
Carthage Smith is Director of the OECD Global Science Forum (which brings countries together to consider shared science policy issues). He considered a number of the barriers, and some potential solutions, to sharing research data. Four key issues have arisen:
- Building trust. As Michaela outlined, the competition model of science versus collaboration does not support trust, so there needs to be a shift towards the paradigm whereby data are viewed as a global public good (and not a commodity). Carthage suggested certified international networks – which would ensure trustworthy infrastructures – could help here.
- Relieving the burden (the technological challenge). It currently takes far too much time and effort to deposit data (for no reward). There are also skills deficits – a lack of data literacy and professional data management skills, and of valuing the data managers in the system. To address this, standards and protocols need to be agreed and universally adopted, program code needs to be included in the sharing activity, and funders and policy makers need to get behind their policies and statements by providing long-term funding commitments.
- Motivation/credit and reward. There is a need to re-think how real impact is measured and rewarded. Data management plans and data use agreements could be developed in this context, as could funding calls for data-reuse projects.
- Governance and Brokering. Informed consent – or the lack thereof – can be an excuse but it is also a real issue. So, agreed-upon (and harmonised) solutions need to be adopted. Lack of understanding and guidance on data privacy regulations can inhibit data sharing by scientists; ethics committees often have insufficient competence in new data issues. There is a need to involve patients and lay persons and to open up to citizen science. There is also a critical role for neutral ‘brokering’ organisations and safe havens for sensitive data that facilitate access.
Professor Mark Ferguson is the Director General of Science Foundation Ireland and the Chief Scientific Adviser to the Government of Ireland. He touched on how the challenges of open data vary by discipline (sequence data, for example, are easier to share than experimental descriptions), on infrastructure, curation and cost issues, and on the upside for reproducibility and cost efficiency – even for IP! He highlighted how potential recognition and credit systems could benefit from new technologies, such as blockchain, that may allow the ascription of value as well as enabling the tracking of who is using what data, where and how.
Mark also touched on the challenges of evaluating open data. Twenty to thirty per cent of publications are never downloaded (so could be argued to have no impact). Meanwhile, the most innovative researchers in the world tend to be affiliated to the most conservative institutions, which typically consider that a US or EU grant should produce four publications in Nature. This implies a very narrow definition of success that excludes policy, adoption, teaching and many aspects of innovation. To illustrate the idea of explicitly translating achievements and goals, he related a case where the research staff of a particular institution were informed that “one patent equals four Nature publications” in terms of achievements. Not surprisingly, this resulted in a spike of patent applications.
Take home messages
We asked each of our speakers for their take home messages. What is the next issue on the horizon to be addressed? Their responses covered a range of high level and direct actions:
Steven: there is a wide diversity of technical literacy that needs to be addressed so that innovations such as Jupyter notebooks, data and code can be incorporated into the scientific knowledge canon.
Michaela: there is a need to prioritise accurate records of where data are located, and to link the concept of open data with accountability.
Carthage: drivers to action need to be incorporated into what comes next. These include incentives to publish, an increase in the perceived value of reproducibility and data citation, and the reform of academic assessment and reward systems.
Mark: in order for new behaviours to be acculturised, new metrics and, critically, explicit translation of value to the institution, need to be part of the updated system.
The authors are grateful to our panellists, Steven Hill, Michaela Th Mayrhofer, Carthage Smith and Mark Ferguson, for their contributions to the session and their permission to write this article.
Fiona Murphy and Laurence Mabile