Guest Post — Git, GitHub, and You: How Collaborative Writing Tools Propel Open Science

One voice can make a stunning melody, but a chorus of voices weaves soul-stirring harmonies otherwise impossible to achieve. In the same way, collaboration transforms innovative scientific ideas into meaningful, reproducible research.

NASA and the White House declared 2023 the Year of Open Science, launching Transform to Open Science (TOPS) to make scientific research more accessible, transparent, and reproducible. As part of this initiative, NASA has awarded my organization, Don’t Use This Code (DUTC), a grant in a collective effort to train 20,000 scholars through free virtual cohorts. (Register for the training here!)

So, who am I? Where do I fit into this conversation? I’m a technical writer: I craft and refine documentation alongside the tech wizards on my team, passing pieces back and forth until the final product is just right. I’ve ghostwritten user guides for a university, produced manuals for nuclear power plant equipment, and edited more Python training materials than I can count. Even so, when DUTC received this grant, I knew I had a lot to learn about open science. I soon realized, however, that collaborating on research isn’t all that different from collaborating on words.

Why Collaborate?

There’s a lab-coated elephant in the room: why bother collaborating? 

> I want to be the one to receive credit for my hard work. I put in the time, effort, and brainpower, so I should walk away with the recognition that I deserve.

This is absolutely true! Research is a rewarding but arduous process, and you deserve appreciation for your accomplishments. Simultaneously, open science is not the opposite of individual achievement; transparency is necessary to further humanity’s understanding of the world, and there are tools out there that enable you to get credit while you collaborate. We’ll get into those tools soon.

First, we’re going to look at two examples: one where open science collaboration would have benefited research and public policy, and one where it did.

Climate in the 90s

Let’s take a walk through the end of the 20th century. To set the scene, it’s 1990. The Hubble telescope launched, and the Human Genome Project kicked off. In the same year, R.W. Spencer and J.R. Christy published a study titled, “Precise monitoring of global temperature trends from satellites.” This study claimed there was no evidence of “tropospheric warming, despite apparently rapid surface warming.” Moreover, while available, the data they collected were difficult to access and the software they used to analyze that data was not released at all. The lack of transparent sharing made the results incredibly difficult to reproduce, and therefore their findings could not be comprehensively challenged.

Fast forward to 1998: one month after Google was founded, T.C. Peterson et al. joined the many researchers disputing the 1990 study with a publication titled, “First difference method: Maximizing station density for the calculation of long-term global temperature change.” In this study, they found that the previous approach had a major flaw: its data analysis didn’t account for effects like orbital decay. This kicked off a frenzy to redo the analysis, a tedious and expensive process.

Finally, we arrive in 2003, around the time the Human Genome Project reached completion. J.R. Lanzante et al. published “Temporal Homogenization of Monthly Radiosonde Temperature Data” (parts 1 and 2). In this study, they reexamined the temperature data using updated methods, ultimately concluding that the 1990 study was incorrect.

We began this journey with some unreproducible results in 1990 and finally isolated an oversight and correction in 2003. That’s thirteen years. Thirteen years. 

The original findings affected public policy and understanding for over a decade, doing potentially irreparable damage to societal perception while they went unchecked.

Compare this example to the second:

Modern Health Science Research

Medical research is one of many fields that are turning to collaboration tools like GitHub to share their findings. In 2019, Zhengru Shen and Marco Spruit released “A Systematic Review of Open Source Clinical Software on GitHub for Improving Software Reuse in Smart Healthcare,” which dove into the research hosted on GitHub using open source clinical software from 2008–2018. In this study, they found 14,971 clinical-related GitHub repositories. 

Since then, the use of GitHub and other collaboration tools for clinical research has only grown, sparked in part by global COVID-19 research efforts. In a paper giving insights on this topic, Lonni Besançon et al. described the challenges faced during the pandemic. They stated, “As COVID-19 was a new disease, there was no standardized diagnostic criteria or clinical outcomes. This led to a multiplication of different outcomes studies in the articles participating in the difficulty to replicate and compare results.” 

Everyone in the world wanted a solution to the crisis, and fast—faster than a typical research cycle would allow. Researchers, scientists, and medical personnel were trying to crack the secrets of the virus, and many individuals’ work bumped into the same findings and data as others. The result, Besançon and colleagues described, was a surplus of “research waste”: rather than collaborating with one another and communicating on active work, duplicates of the same studies would emerge, wasting critical time and resources.

To launch over the hurdle of simultaneous efforts, researchers adopted Open Methodology, a facet of open science that focuses on standardizing reporting, definitions, and documentation, bridging the gap between different disciplines and countries. Through transparency and collaboration, great leaps were made toward resolving the crisis.

How Can I Collaborate?

There are ample collaboration tools available, but I’m going to focus on Git and GitHub, as they integrate with OSF.

Git is a version control system that allows you to track changes to code and files, while GitHub is a cloud-based platform that uses Git. You can use GitHub to collaborate, host repositories, and share and manage your work out in the open.

Like a Google Drive folder, a GitHub repository is a collection of files and folders hosted online. It allows you to track historical changes to all files within your repository, enabling you to see revision history and even “time travel” back to previous versions of the files. Best of all, repositories can be made publicly accessible and searchable, so we can readily collaborate with our colleagues and receive feedback from anyone else following along with our work.

Let’s look at the basics of managing a GitHub repository:

  1. Create the repository: You can add files containing code, text, and/or data to a repository. In our TOPS training sessions, we always recommend including a README page to explain everything a future reader, user, or fellow researcher would need to know to make use of your repository.
  2. Fork: Once your repository is out in the open, you or other developers and researchers can create a “fork,” which is your own personal copy of the repository that can be poked and prodded without altering the original project as it stands.
  3. Commit: A commit is a snapshot or checkpoint of your code that you can refer back to as you work. This is especially useful if something goes wrong along the way and you need to rewind to a known working state!
  4. Pull Request: A pull request is a proposal to the owner of the original repository asking to combine changes from your fork back into that repository.
  5. Merge: This is the last step of our collaborative process, in which the owner of the original repository accepts the pull request. The proposed changes are then combined into the original repository.
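
If it helps to see those five steps in motion, here is a minimal sketch in Python. To be clear, this is a toy model I made up for illustration, not how Git or GitHub work under the hood: `ToyRepo`, `open_pull_request`, and `merge` are hypothetical names, and a real merge does far more than copy snapshots across. It only shows how the ideas relate to one another.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy stand-in for a repository: working files plus a list of snapshots."""
    owner: str
    files: dict = field(default_factory=dict)    # filename -> contents
    history: list = field(default_factory=list)  # (message, snapshot) pairs

    def commit(self, message):
        # A commit records a snapshot of every file as it is right now.
        self.history.append((message, deepcopy(self.files)))

    def restore(self, index):
        # "Time travel": bring the working files back to an earlier snapshot.
        self.files = deepcopy(self.history[index][1])

    def fork(self, new_owner):
        # A fork is an independent copy; editing it leaves the original alone.
        return ToyRepo(new_owner, deepcopy(self.files), deepcopy(self.history))


def open_pull_request(fork, original):
    # A pull request proposes the fork's commits that the original lacks.
    return fork.history[len(original.history):]


def merge(original, pull_request):
    # Merging accepts the proposal: the original gains the proposed commits.
    for message, snapshot in pull_request:
        original.files = deepcopy(snapshot)
        original.history.append((message, deepcopy(snapshot)))


# 1. Create the repository, with a README.
lab = ToyRepo("lab")
lab.files["README.md"] = "How to use this dataset"
lab.commit("Add README")

# 2. Fork it. 3. Commit a change on the fork.
mine = lab.fork("colleague")
mine.files["analysis.py"] = "print('hello, open science')"
mine.commit("Add analysis script")

# The original is unchanged until a merge happens.
before_merge = sorted(lab.files)

# 4. Open a pull request. 5. The owner merges it.
pr = open_pull_request(mine, lab)
merge(lab, pr)
after_merge = sorted(lab.files)

print(before_merge)                              # ['README.md']
print(after_merge)                               # ['README.md', 'analysis.py']
print([message for message, _ in lab.history])   # ['Add README', 'Add analysis script']

# Rewind the fork's working files to its first snapshot.
mine.restore(0)
print(sorted(mine.files))                        # ['README.md']
```

Notice that the original repository only gains `analysis.py` after the merge, and that rewinding the fork does not touch the original. That separation is what lets collaborators experiment freely.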

While there’s a lot more to learn about GitHub, this simple walkthrough is a broad overview of the important concepts at play when collaborating on GitHub.

How to Use the GitHub and GitLab Add-ons for the OSF

The OSF offers add-ons for two Git-based services: GitHub and GitLab. If you would like to connect a GitHub or GitLab repository to your project, first navigate to your project and select the Add-ons tab in the navigation bar. From there, you can enable GitHub and/or GitLab and configure them. For more detailed information on how to connect your accounts, check out our help guides for GitHub and GitLab.

If there is another Git-based service you think would be useful to have connected to the OSF, please send your suggestion to support@osf.io.

Wrap-Up: Better Together

Scientific innovation is not a solo performance, but a song full of complementary voices. Through collaboration, we crescendo the music of humanity. Using tools like GitHub to conduct the moving pieces of your research will empower you to improve the quality of your—and your colleagues’—work, while getting the recognition you deserve for your contributions.

If you want to learn more about open science and the tools available to you, register for the free NASA TOPS training! You’ll get to learn from and network with experts, researchers, and scientists (and me!) as we do what we can to make science more open and accessible to all. 

For those who are interested in providing training to their institutions, the Center for Open Science also provides training on a variety of topics, including a primer on open scholarship, data management, preregistration and Registered Reports, and more.


About the author

Brooklyn Olson is a Technical Writer with a B.A. in English (specializing in Professional Writing) and a certificate in Computer Information Technology. She currently works for Don't Use This Code, supporting NASA’s Transform to Open Science initiative by developing educational and procedural materials for corporate trainees. Previously, she served as a technical writer for Curtiss-Wright’s Nuclear Division and as a ghostwriter for Brigham Young University–Idaho.
