Difference between revisions of "Journal:Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system"

From LIMSWiki
Revision as of 19:02, 18 June 2018

Full article title: Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system
Journal: The Quantitative Methods for Psychology
Author(s): Sprengholz, Phillipp
Author affiliation(s): Friedrich-Schiller-Universität Jena
Editors: Cousineau, Denis
Year published: 2018
Volume and issue: 14(2)
Page(s): 137–46
DOI: 10.20982/tqmp.14.2.p137
ISSN: 2292-1354
Distribution license: Creative Commons Attribution 4.0 International
Website: http://www.tqmp.org/RegularArticles/vol14-2/p137/
Download: https://www.tqmp.org/RegularArticles/vol14-2/p137/p137.pdf (PDF)

Abstract

The reproduction of findings from psychological research has proven difficult. Abstract descriptions of the data analysis steps performed by researchers are one of the main reasons why reproducing or even understanding published findings is so difficult. With the introduction of Jupyter Notebook, a new tool for organizing both static and dynamic information became available. The software allows explanatory content such as written text or images to be blended with code for preprocessing and analyzing scientific data. Thus, Jupyter helps document the whole research process, from ideation through data analysis to the interpretation of results. This fosters both collaboration and scientific quality by helping researchers organize their work. This tutorial is an introduction to Jupyter. It explains how to set up and use the notebook system. As its key features are introduced, the advantages of using Jupyter Notebook for psychological research become apparent.

Keywords: Reproducible research, interactive scientific computing, collaboration, notebook systems, data management

Introduction

The replicability of psychological research has increasingly been questioned.[1][2][3] Reproducing or even understanding research findings requires extensive knowledge about the experimental manipulations and methods used.[4] Unfortunately, many research publications fail to describe the research process in detail, are difficult to understand without background information, or invite misinterpretation.[5] Most articles include only very abstract descriptions of data preparation and analysis steps, which makes these steps hard for the reader to follow. Consequently, reproducing results from psychological journals is practically impossible.[6] The scientific community has tried to solve these problems by publishing supplemental information online. This includes raw data as well as detailed descriptions of data preprocessing and analysis steps. Unfortunately, this information is often organized in a confusing way.

That’s why a group of scientists developed Jupyter, a web application based on IPython.[7] Jupyter enables users to create and share notebooks containing text, visualizations, equations, raw data, and code for analyzing and transforming this data. By blending static content like explanatory text and images with dynamic output of calculations and data analysis procedures, the notebooks emphasize the prose-first approach originally introduced by Mathematica Notebooks more than 20 years ago. The entire research process—including ideation, data acquisition, analysis, and interpretation of results—can be documented in a linear, story-like way. Publishing these notebooks alongside or instead of read-only journal articles may enhance both replication of results and collaboration between researchers.

This tutorial is written for readers with no previous experience using Jupyter. It explains how to set up and use Jupyter's notebooks for organizing, performing, and documenting data analysis tasks common in psychological research. Jupyter supports more than 90 programming languages, so you can analyze data using scripts written in Python, R, or virtually any other non-proprietary scripting language. However, this article focuses strictly on R. After the system has been set up, an example notebook is created step by step.

Setting up Jupyter

Setting up Jupyter on your local computer involves three steps. First, Python needs to be installed, as it is required to run the notebook system. Next, Jupyter is installed. Finally, R is installed and configured to work with Jupyter. All three steps are detailed below. Since most readers are assumed to work on Microsoft Windows, the explanations are tailored to this operating system. However, Jupyter can also be set up on both macOS and Linux, and the installation steps are nearly identical.

Step 1: Installing Python

Download the latest Python 3 installer from python.org (version 3.6.4 at the time of writing). When running the installer, use the default settings, but make sure Python is added to your system's PATH variable (see Figure 1).



Figure 1. Python installer
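
Whether the installer actually added Python to the PATH can be checked before moving on. The following is only a minimal sketch, not part of the original tutorial: the function names python_on_path and is_python3 are hypothetical, and the launcher names it searches for ("python", "python3", "py") are assumptions about a typical installation.

```python
import shutil
import sys


def python_on_path(names=("python", "python3", "py")):
    """Return the location of the first launcher found on the PATH, or None.

    The default launcher names are assumptions about a typical setup.
    """
    for name in names:
        location = shutil.which(name)
        if location is not None:
            return location
    return None


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    print("Interpreter found on PATH:", python_on_path())
    print("Running under Python 3:", is_python3())
```

If the first line prints None, the interpreter is probably not on the PATH, and re-running the installer with the corresponding option enabled is one way to fix this.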

Step 2: Installing Jupyter

After Python has been installed, open a command window: press the Win + R keys on your keyboard, type cmd, and press Enter. Then enter the following line into the command window and press Enter again: pip install jupyter
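
Once pip has finished, a quick check can confirm that the installation is visible to Python. This is a hedged sketch rather than a step from the original article: the helper is_installed is hypothetical, and the module names "notebook" and "jupyter_core" are assumptions about what the jupyter package pulls in on a typical installation.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names are assumptions; adjust them if your setup differs.
    for name in ("notebook", "jupyter_core"):
        status = "found" if is_installed(name) else "missing"
        print(name, "->", status)
```

A "missing" result usually suggests that pip installed into a different Python environment than the one running the check.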


References

  1. Klein, R.A.; Ratliff, K.A.; Vianello, M. et al. (2014). "Investigating Variation in Replicability: A “Many Labs” Replication Project". Social Psychology 45 (3): 142–52. doi:10.1027/1864-9335/a000178. 
  2. Pashler, H.; Wagenmakers, E.J. (2012). "Editors' Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence?". Perspectives on Psychological Science 7 (6): 528–30. doi:10.1177/1745691612465253. PMID 26168108.
  3. Yong, E. (2012). "Replication studies: Bad copy". Nature 485 (7398): 298–300. doi:10.1038/485298a. PMID 22596136.
  4. Nosek, B.A.; Spies, J.R.; Motyl, M. (2012). "Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability". Perspectives on Psychological Science 7 (6): 615–31. doi:10.1177/1745691612459058. PMID 26168121.
  5. Donoho, D.L.; Maleki, A.; Rahman, I.U. et al. (2009). "Reproducible Research in Computational Harmonic Analysis". Computing in Science & Engineering 11 (1): 8–18. doi:10.1109/MCSE.2009.15.
  6. Shen, H. (2014). "Interactive notebooks: Sharing the code". Nature 515 (7525): 151–2. doi:10.1038/515151a. PMID 25373681. 
  7. Perez, F.; Granger, B.E. (2007). "IPython: A System for Interactive Scientific Computing". Computing in Science & Engineering 9 (3): 21–9. doi:10.1109/MCSE.2007.53. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically; this version lists them in order of appearance, by design.