This repository contains the replication package for the research paper "TGIF: The Evolution of Developer Commits Over Time". Follow the steps outlined below to reproduce the results presented in the study.
- Collect Initial Projects
- Sampling
- Data Cleaning & Writing Data to CSV Files
- Statistical Analysis & Plots
- Discussion
- Distribution of Programming Languages
## Collect Initial Projects

- Visit the SEART GitHub Search.
- Apply the following filters:
  - Number of Commits: Minimum = 12,730
  - Number of Stars: Minimum = 10
  - Number of Forks: Minimum = 10
  - Number of Contributors: Minimum = 10
- Download the search results as a CSV file (a sanity-check sketch follows the list).
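To verify that the downloaded export actually satisfies the filters, a minimal pandas check like the one below can help. This is a sketch: the file name `seart_results.csv` and the column names (`commits`, `stargazers`, `forks`, `contributors`) are assumptions about the export format, not guarantees.

```python
# Sanity-check a SEART export against the four filters used in the study.
# NOTE: file and column names below are assumptions about the export format.
import pandas as pd

df = pd.read_csv("seart_results.csv")

mask = (
    (df["commits"] >= 12730)
    & (df["stargazers"] >= 10)
    & (df["forks"] >= 10)
    & (df["contributors"] >= 10)
)
print(f"{mask.sum()} of {len(df)} repositories satisfy all four filters")
```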
## Sampling

- Navigate to the `sampling` directory.
- Run `repo_sampler.py` to sample the repositories (see the sketch below).
- Fetch the projects by running `fetch-projects.sh`.
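For orientation, here is a minimal sketch of what random repository sampling can look like. `repo_sampler.py` is the authoritative script; the sample size, seed, and column name used here are illustrative assumptions.

```python
# Illustrative random sampling from the SEART export -- NOT the actual
# repo_sampler.py logic; sample size, seed, and column name are assumptions.
import pandas as pd

SAMPLE_SIZE = 100  # hypothetical; see repo_sampler.py for the real value
SEED = 42          # fixing the seed keeps the sample reproducible

df = pd.read_csv("seart_results.csv")
sample = df.sample(n=SAMPLE_SIZE, random_state=SEED)
sample["name"].to_csv("sampled_repos.txt", index=False, header=False)
```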
## Data Cleaning & Writing Data to CSV Files

- Return to the base directory and then navigate to the `data-cleaning/find-duplicates` directory.
- Run `find_duplicates.py` to identify duplicate projects (a sketch of the idea follows below).
- Manually inspect the identified duplicates and remove one copy of each duplicate pair from the accepted-projects text file.
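As a rough illustration of duplicate detection, the sketch below flags repositories that share a project name under different owners. `find_duplicates.py` may use a different heuristic; the input file name and the `owner/name` line format are assumptions.

```python
# Flag repositories that share the same project name (possible duplicates).
# Assumes one "owner/name" entry per line in the accepted-projects file.
from collections import defaultdict

groups = defaultdict(list)
with open("accepted_projects.txt") as fh:  # assumed file name
    for line in fh:
        full = line.strip()
        if full:
            groups[full.split("/")[-1].lower()].append(full)

for name, repos in sorted(groups.items()):
    if len(repos) > 1:
        print(f"possible duplicates for '{name}': {', '.join(repos)}")
```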
- Return to the base directory and then navigate to the `data-cleaning/timezone-reliability-assessment` directory.
- Run `count_timezone_commits.bash` for every desired year to calculate the number of commits per timezone (see the sketch after this list).
- Run `early_year_variations.py` to calculate the variation metrics.
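The bash script does the real work; for readers who prefer Python, the sketch below counts commits per UTC offset for one year inside a cloned repository. The example year and the use of author dates are assumptions.

```python
# Count commits per timezone (UTC offset) for one year in the current repo.
# A sketch only; count_timezone_commits.bash is the script actually used.
import subprocess
from collections import Counter

YEAR = 2013  # example year

log = subprocess.run(
    ["git", "log",
     f"--since={YEAR}-01-01", f"--until={YEAR + 1}-01-01",
     "--pretty=format:%ad", "--date=format:%z"],
    capture_output=True, text=True, check=True,
).stdout

for offset, n in Counter(log.splitlines()).most_common():
    print(f"{offset}: {n} commits")
```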
- Return to the base directory and then navigate to the `write-data-in-csv` directory.
- Generate commit counts and proportions per day by running `commit_count_per_day.py` (a sketch of the per-day counting step follows below).
- Generate commit counts and proportions per hour by running `commit_count_per_hour.py`.
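To make the per-day computation concrete, here is a single-repository sketch; `commit_count_per_day.py` aggregates across all sampled projects, so treat this only as an illustration of the counting step.

```python
# Commit counts and proportions per weekday for the current repository.
# Illustration only; the real script aggregates over all sampled projects.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--pretty=format:%ad", "--date=format:%A"],
    capture_output=True, text=True, check=True,
).stdout

counts = Counter(log.splitlines())
total = sum(counts.values())
for day in ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]:
    n = counts.get(day, 0)
    print(f"{day}: {n} commits ({n / total:.2%})")
```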
- Return to the base directory and then navigate to the `data-cleaning/2013-spike-analysis` directory.
- Run `rejected-mariadb-commits.bash` to find the commits that must be removed.
- Manually remove those commits from the four CSV files in the `write-data-in-csv/csv-files` directory.
## Statistical Analysis & Plots

- Return to the base directory and then navigate to the `statistical-analysis` directory.
- Run the desired scripts to perform the statistical analysis.
- Return to the base directory and then navigate to the `plots` directory.
- Run the desired scripts to generate the plots.
## Discussion

- Visit Scopus.
- Perform the first search:
  - Search within: Language
  - Search documents: English
  - Click 'Analyze results'.
  - Select the year range to analyze: 2000 to 2023.
  - Click 'Export' -> 'Export the data to a CSV file' -> 'Export'.
- Perform the second search:
  - Search within: Article title, Abstract, Keywords
  - Search documents: "wellness" OR "well-being" OR "work-life balance"
  - and within: Language
  - Search documents: English
  - Click 'Analyze results'.
  - Select the year range to analyze: 2000 to 2023.
  - Click 'Export' -> 'Export the data to a CSV file' -> 'Export'.
- In the `discussion/work-life-balance-approach` directory, run `publication-analysis.py` (a sketch of the idea follows below).
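A plausible reading of what `publication-analysis.py` computes is the yearly share of wellness-related publications among all English publications; the sketch below shows that normalization. The file names and the `Year`/`Count` column names in the Scopus exports are assumptions.

```python
# Yearly proportion of wellness-related publications among English
# publications. A sketch: file names and "Year"/"Count" columns are assumed.
import pandas as pd

total = pd.read_csv("scopus_english.csv")      # export of the first search
wellness = pd.read_csv("scopus_wellness.csv")  # export of the second search

merged = total.merge(wellness, on="Year", suffixes=("_total", "_wellness"))
merged["proportion"] = merged["Count_wellness"] / merged["Count_total"]
print(merged[["Year", "proportion"]].to_string(index=False))
```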
- Navigate to the `discussion/contributors-number` directory.
- Run `all_contributors_per_year.bash`.
- Run `contributors_plot.py` (see the plotting sketch below).
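For reference, a minimal plotting sketch in the spirit of `contributors_plot.py`; the input file name and its two-column `year,contributors` layout are assumptions.

```python
# Plot contributor counts per year. A sketch; input name/format are assumed.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("contributors_per_year.csv", names=["year", "contributors"])
plt.plot(df["year"], df["contributors"], marker="o")
plt.xlabel("Year")
plt.ylabel("Number of contributors")
plt.savefig("contributors_per_year.png", dpi=150)
```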
- Navigate to the `discussion/avg-commit-lines-per-year-analysis` directory.
- Run `lines_per_commit.bash`.
- Run `lines_per_commit_plot.py` (a sketch of the underlying computation follows below).
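The sketch below shows one way to derive the average number of changed lines per commit per year from a single repository's `git log --numstat` output. `lines_per_commit.bash` is the script actually used; the details here (author dates, counting added plus deleted lines, skipping binary files) are assumptions.

```python
# Average changed lines (added + deleted) per commit per year for one repo.
# A sketch only; lines_per_commit.bash is the script actually used.
import subprocess
from collections import defaultdict

log = subprocess.run(
    ["git", "log", "--numstat", "--pretty=format:YEAR:%ad",
     "--date=format:%Y"],
    capture_output=True, text=True, check=True,
).stdout

lines = defaultdict(int)    # changed lines per year
commits = defaultdict(int)  # commits per year
year = None
for row in log.splitlines():
    if row.startswith("YEAR:"):
        year = row[5:]
        commits[year] += 1
    elif row and year is not None:
        added, deleted, _path = row.split("\t", 2)
        if added != "-":  # "-" marks binary files in --numstat output
            lines[year] += int(added) + int(deleted)

for y in sorted(commits):
    print(f"{y}: {lines[y] / commits[y]:.1f} changed lines per commit")
```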
- Navigate to the `discussion/FreeBSD-demographics-analysis` directory and run the scripts.
## Distribution of Programming Languages

To analyze the distribution of programming languages in the sampled projects:

- Navigate to the `distribution-of-languages` directory.
- Run `find_distribution.py` to find the distribution (a minimal sketch follows below).
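As a minimal illustration, assuming the sampled-projects CSV keeps the SEART `language` column, the distribution can be computed like this (`find_distribution.py` is the authoritative script):

```python
# Language distribution over the sampled projects. A sketch; the input file
# name and the "language" column are assumptions about the CSV layout.
import pandas as pd

df = pd.read_csv("sampled_repos.csv")
for language, share in df["language"].value_counts(normalize=True).items():
    print(f"{language}: {share:.1%}")
```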