Journal:Judgements of research co-created by generative AI: Experimental evidence

Full article title: Judgements of research co-created by generative AI: Experimental evidence
Journal: Economics and Business Review
Author(s): Niszczota, Paweł; Conway, Paul
Author affiliation(s): Poznań University of Economics and Business, University of Southampton
Primary contact (email): pawel dot niszczota at ue dot poznan dot pl
Year published: 2023
Volume and issue: 9(2)
Page(s): 101–114
DOI: 10.18559/ebr.2023.2.744
ISSN: 2450-0097
Distribution license: Creative Commons Attribution 4.0 International
Website: https://journals.ue.poznan.pl/ebr/article/view/744
Download: https://journals.ue.poznan.pl/ebr/article/view/744/569 (PDF)

Abstract

The introduction of ChatGPT has fuelled a public debate on the appropriateness of using generative artificial intelligence (AI), such as large language models (LLMs), in work, including a debate on how such tools might be used (and abused) by researchers. In the current work, we test whether delegating parts of the research process to LLMs leads people to distrust researchers and devalue their scientific work. Participants (N = 402) considered a researcher who delegates elements of the research process to a PhD student or an LLM and rated three aspects of such delegation. Firstly, they rated whether it is morally appropriate to do so. Secondly, they judged whether they would trust the scientist who decided to delegate to oversee future projects. Thirdly, they rated the expected accuracy and quality of the output from the delegated research process. Our results show that people judged delegating to an LLM as less morally acceptable than delegating to a human (d = –0.78). Delegation to an LLM also decreased trust in the scientist to oversee future research projects (d = –0.80), and people thought the results would be less accurate and of lower quality (d = –0.85). We discuss how this devaluation might translate into the underreporting of generative AI use.

Keywords: trust in science, metascience, ChatGPT, GPT, large language models, generative AI, experiment

Introduction

Acknowledgements

Funding

This research was supported by grant 2021/42/E/HS4/00289 from the National Science Centre, Poland.

Conflict of interest

None stated.

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases, important information was missing from the references, and that information was added. The original lists references in alphabetical order; this version lists them in order of appearance, by design.