This repository contains the replication package for the research paper 'TGIF: The Evolution of Developer Work Times'

vtalos/commit-patterns-replication-package

Replication Package for "TGIF: The Evolution of Developer Commits Over Time"

This repository contains the replication package for the research paper "TGIF: The Evolution of Developer Commits Over Time". Follow the steps outlined below to reproduce the results presented in the study.

Table of Contents

  1. Collect Initial Projects
  2. Sampling
  3. Data Cleaning & Writing Data to CSV Files
  4. Statistical Analysis & Plots
  5. Discussion
  6. Distribution of Programming Languages

Collect Initial Projects

  1. Visit the SEART GitHub Search.
  2. Apply the following filters:
    • Number of Commits: Minimum = 12,730
    • Number of Stars: Minimum = 10
    • Number of Forks: Minimum = 10
    • Number of Contributors: Minimum = 10
  3. Download the search results as a CSV file.
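
As a sanity check on the downloaded file, the four minimums above can be re-applied locally. The following is a minimal sketch, not part of the replication package: the column names (`name`, `commits`, `stargazers`, `forks`, `contributors`) are assumptions about the export and should be adjusted to match the actual CSV header, and the rows are made-up illustration data.

```python
import csv
import io

# Thresholds copied from the filter list above.
MIN_COMMITS = 12_730
MIN_STARS = 10
MIN_FORKS = 10
MIN_CONTRIBUTORS = 10


def filter_projects(csv_text):
    """Return the names of projects that satisfy all four minimums."""
    accepted = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumed, not taken from the real export.
        if (int(row["commits"]) >= MIN_COMMITS
                and int(row["stargazers"]) >= MIN_STARS
                and int(row["forks"]) >= MIN_FORKS
                and int(row["contributors"]) >= MIN_CONTRIBUTORS):
            accepted.append(row["name"])
    return accepted


# Made-up rows for illustration only.
sample = """name,commits,stargazers,forks,contributors
org/big-project,20000,150,40,25
org/small-project,900,500,80,30
org/unpopular,13000,3,12,11
"""
print(filter_projects(sample))
```

Only the first row passes: the second has too few commits and the third too few stars.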

Sampling

  1. Navigate to the sampling directory.
  2. Run repo_sampler.py to sample the repositories.
  3. Fetch the projects by running fetch-projects.sh.
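
The idea behind a reproducible sampling step can be sketched as follows. This is an illustration under stated assumptions, not the logic of repo_sampler.py: the function name, the seed, and the sample size are all hypothetical.

```python
import random


def sample_repositories(names, k, seed=0):
    """Draw k distinct repository names uniformly at random."""
    rng = random.Random(seed)  # fixed seed so that reruns give the same sample
    # Sorting first makes the result independent of the input order.
    return rng.sample(sorted(names), k)


# Hypothetical repository names.
population = [f"owner/project-{i}" for i in range(100)]
chosen = sample_repositories(population, k=5)
print(chosen)
```

Fixing the seed matters for replication: two runs over the same search results then select the same projects.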

Data Cleaning & Writing Data to CSV Files

Data Cleaning

  1. Return to the base directory and then navigate to the data-cleaning/find-duplicates directory.
  2. Run find_duplicates.py to identify duplicate projects.
  3. Manually inspect the identified duplicates and remove one project from each duplicate pair in the accepted projects text file.
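
One simple heuristic for flagging candidate duplicates is to group repositories that share a name under different owners. The sketch below illustrates that heuristic only; it is an assumption and may differ from what find_duplicates.py actually does. The flagged groups still need the manual inspection described in step 3.

```python
from collections import defaultdict


def candidate_duplicates(full_names):
    """Group 'owner/repo' names by repo name and keep groups with more than one entry."""
    groups = defaultdict(list)
    for full_name in full_names:
        _owner, _, repo = full_name.partition("/")
        groups[repo.lower()].append(full_name)
    return {repo: names for repo, names in groups.items() if len(names) > 1}


# Hypothetical input list.
repos = ["torvalds/linux", "raspberrypi/linux", "git/git"]
print(candidate_duplicates(repos))
```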

Assess Timezone Reliability

  1. Return to the base directory and then navigate to the data-cleaning/timezone-reliability-assessment directory.
  2. Run count_timezone_commits.bash for every desired year to calculate the number of commits per timezone.
  3. Run early_year_variations.py to calculate variation metrics.
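
Counting commits per timezone for a given year can be sketched as below. This is an illustrative sketch, not the package's script: it assumes the commit dates are already available as ISO 8601 strings with an explicit numeric UTC offset, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def commits_per_offset(iso_dates, year):
    """Count the commits of one year by the UTC offset recorded in their timestamp."""
    counts = Counter()
    for text in iso_dates:
        stamp = datetime.fromisoformat(text)
        if stamp.year == year:
            counts[stamp.strftime("%z")] += 1  # e.g. "+0200"
    return counts


# Made-up timestamps for illustration.
dates = [
    "2013-05-01T12:00:00+02:00",
    "2013-06-11T08:30:00+02:00",
    "2013-07-20T23:15:00+00:00",
    "2014-01-02T10:00:00-05:00",
]
print(commits_per_offset(dates, 2013))
```

A year in which nearly all commits carry the same offset (typically +0000) suggests that the recorded timezones are not trustworthy for that period.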

Write Data to CSV

  1. Return to the base directory and then navigate to the write-data-in-csv directory.
  2. Generate commit counts and proportions per day by running commit_count_per_day.py.
  3. Generate commit counts and proportions per hour by running commit_count_per_hour.py.
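
The per-day and per-hour figures can be thought of as counts over the local wall-clock time of each commit, divided by the total. The sketch below shows that computation under assumptions: timestamps are ISO 8601 strings with an offset, and the function names are hypothetical rather than those used in the two scripts.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def commits_per_day_and_hour(iso_dates):
    """Count commits by weekday and by hour, using the local time in each timestamp."""
    per_day, per_hour = Counter(), Counter()
    for text in iso_dates:
        stamp = datetime.fromisoformat(text)  # keeps the local wall-clock time
        per_day[DAYS[stamp.weekday()]] += 1
        per_hour[stamp.hour] += 1
    return per_day, per_hour


def proportions(counter):
    """Turn absolute counts into shares of the total."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


# Made-up timestamps for illustration.
dates = [
    "2013-05-03T23:30:00+02:00",
    "2013-05-04T09:00:00-05:00",
    "2013-05-10T23:05:00+00:00",
]
per_day, per_hour = commits_per_day_and_hour(dates)
print(proportions(per_day))
print(proportions(per_hour))
```

Note that the timestamps are deliberately not converted to UTC, since the question is when developers work in their own local time.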

Data Cleaning After Generating the CSVs

  1. Return to the base directory and then navigate to the data-cleaning/2013-spike-analysis directory.
  2. Run rejected-mariadb-commits.bash to find the commits that must be removed.
  3. Manually remove those commits from the four CSV files in the write-data-in-csv/csv-files directory.

Statistical Analysis & Plots

  1. Return to the base directory and then navigate to the statistical-analysis directory.
  2. Run the desired scripts to perform the statistical analysis.
  3. Return to the base directory and then navigate to the plots directory.
  4. Run the desired scripts to generate the plots.

Discussion

Work-Life Balance Analysis

  1. Visit Scopus.

  2. Perform the first search:

    • Search within: language - Search Documents: English
    • Click 'Analyze results'
    • Select year range to analyze: 2000 to 2023
    • Click 'Export' -> 'Export the data to a CSV file' -> 'Export'
  3. Perform the second search:

    • Search within: Article title, Abstract, Keywords - Search Documents: "wellness" OR "well-being" OR "work-life balance" and within: language - Search Documents: English
    • Click 'Analyze results'
    • Select year range to analyze: 2000 to 2023
    • Click 'Export' -> 'Export the data to a CSV file' -> 'Export'
  4. In the discussion/work-life-balance-approach directory, run publication-analysis.py.
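
One plausible use of the two exports is to normalise the well-being publication counts by the total number of English-language publications per year. The sketch below shows that normalisation only; whether publication-analysis.py computes exactly this is an assumption, and the numbers are invented for illustration, not Scopus data.

```python
def yearly_share(topic_counts, all_counts):
    """Share of topic publications among all publications, per year."""
    return {
        year: topic_counts.get(year, 0) / all_counts[year]
        for year in sorted(all_counts)
        if all_counts[year] > 0  # skip years without any publications
    }


# Invented counts for illustration only.
wellbeing = {2000: 5, 2001: 10}
everything = {2000: 100, 2001: 100, 2002: 50}
print(yearly_share(wellbeing, everything))
```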

Number of Contributors Analysis

  1. Navigate to the discussion/contributors-number directory.
  2. Run all_contributors_per_year.bash.
  3. Run contributors_plot.py.
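
Counting contributors per year amounts to counting distinct author identities among each year's commits. A minimal sketch, assuming that authors are identified by e-mail address and that (year, e-mail) pairs have already been extracted; the scripts above may identify contributors differently.

```python
from collections import defaultdict


def contributors_per_year(commits):
    """Count distinct author e-mails per year from (year, email) pairs."""
    authors = defaultdict(set)
    for year, email in commits:
        authors[year].add(email.lower())  # treat addresses case-insensitively
    return {year: len(emails) for year, emails in sorted(authors.items())}


# Made-up commits for illustration.
commits = [
    (2010, "alice@example.org"),
    (2010, "Alice@example.org"),
    (2010, "bob@example.org"),
    (2011, "bob@example.org"),
]
print(contributors_per_year(commits))
```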

Average Commit Lines per Year Analysis

  1. Navigate to the discussion/avg-commit-lines-per-year-analysis directory.
  2. Run lines_per_commit.bash.
  3. Run lines_per_commit_plot.py.
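
The average number of changed lines per commit for each year can be sketched as follows. The input format, (year, changed lines) pairs, and the function name are assumptions made for illustration; they are not taken from lines_per_commit.bash.

```python
from collections import defaultdict


def average_lines_per_year(commits):
    """Average changed lines per commit, per year, from (year, lines) pairs."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of lines, commit count]
    for year, lines in commits:
        totals[year][0] += lines
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


# Made-up commits for illustration.
commits = [(2010, 10), (2010, 30), (2011, 5)]
print(average_lines_per_year(commits))
```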

FreeBSD Demographics Analysis

  1. Navigate to the discussion/FreeBSD-demographics-analysis directory and run the scripts.

Distribution of Programming Languages

To analyze the distribution of programming languages in the sampled projects:

  1. Navigate to the distribution-of-languages directory.
  2. Run find_distribution.py to find the distribution.
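
A language distribution of this kind is a frequency count over each project's main language. The sketch below illustrates the idea with made-up input; it assumes one language label per project and is not the code of find_distribution.py.

```python
from collections import Counter


def language_distribution(languages):
    """Return (language, count, share) tuples, most frequent first."""
    counts = Counter(languages)
    total = sum(counts.values())
    return [(lang, n, n / total) for lang, n in counts.most_common()]


# Made-up main languages, one per sampled project.
main_languages = ["C", "Python", "C", "Java"]
for lang, n, share in language_distribution(main_languages):
    print(f"{lang}: {n} ({share:.0%})")
```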
