This repository contains the replication package for the research paper "TGIF: The Evolution of Developer Commits Over Time". Follow the steps outlined below to reproduce the results presented in the study.
- Collect Initial Projects
- Sampling
- Data Cleaning & Writing Data to CSV Files
- Statistical Analysis & Plots
- Discussion
- Distribution of Programming Languages
## Collect Initial Projects

- Visit the SEART GitHub Search.
- Apply the following filters:
  - Number of Commits: Minimum = 12,730
  - Number of Stars: Minimum = 10
  - Number of Forks: Minimum = 10
  - Number of Contributors: Minimum = 10
- Download the search results as a CSV file (a sanity-check sketch follows the list).
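To verify that the downloaded export actually satisfies the filters, a minimal pandas check like the one below can help. This is a sketch: the file name `seart_results.csv` and the column names (`commits`, `stargazers`, `forks`, `contributors`) are assumptions about the export format, not guarantees.

```python
# Sanity-check a SEART export against the four filters used in the study.
# NOTE: file and column names below are assumptions about the export format.
import pandas as pd

df = pd.read_csv("seart_results.csv")

mask = (
    (df["commits"] >= 12730)
    & (df["stargazers"] >= 10)
    & (df["forks"] >= 10)
    & (df["contributors"] >= 10)
)
print(f"{mask.sum()} of {len(df)} repositories satisfy all four filters")
```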
## Sampling

- Navigate to the `sampling` directory.
- Run `repo_sampler.py` to sample the repositories (see the sketch below).
- Fetch the projects by running `fetch-projects.sh`.
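For orientation, here is a minimal sketch of what random repository sampling can look like. `repo_sampler.py` is the authoritative script; the sample size, seed, and column name used here are illustrative assumptions.

```python
# Illustrative random sampling from the SEART export -- NOT the actual
# repo_sampler.py logic; sample size, seed, and column name are assumptions.
import pandas as pd

SAMPLE_SIZE = 100  # hypothetical; see repo_sampler.py for the real value
SEED = 42          # fixing the seed keeps the sample reproducible

df = pd.read_csv("seart_results.csv")
sample = df.sample(n=SAMPLE_SIZE, random_state=SEED)
sample["name"].to_csv("sampled_repos.txt", index=False, header=False)
```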
## Data Cleaning & Writing Data to CSV Files

- Return to the base directory and then navigate to the `data-cleaning/find-duplicates` directory.
- Run `find_duplicates.py` to identify duplicate projects (a sketch of the idea follows below).
- Manually inspect the identified duplicates and remove one copy of each duplicate pair from the accepted-projects text file.
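As a rough illustration of duplicate detection, the sketch below flags repositories that share a project name under different owners. `find_duplicates.py` may use a different heuristic; the input file name and the `owner/name` line format are assumptions.

```python
# Flag repositories that share the same project name (possible duplicates).
# Assumes one "owner/name" entry per line in the accepted-projects file.
from collections import defaultdict

groups = defaultdict(list)
with open("accepted_projects.txt") as fh:  # assumed file name
    for line in fh:
        full = line.strip()
        if full:
            groups[full.split("/")[-1].lower()].append(full)

for name, repos in sorted(groups.items()):
    if len(repos) > 1:
        print(f"possible duplicates for '{name}': {', '.join(repos)}")
```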
- Return to the base directory and then navigate to the `data-cleaning/timezone-reliability-assessment` directory.
- Run `count_timezone_commits.bash` for every desired year to calculate the number of commits per timezone (see the sketch after this list).
- Run `early_year_variations.py` to calculate the variation metrics.
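The bash script does the real work; for readers who prefer Python, the sketch below counts commits per UTC offset for one year inside a cloned repository. The example year and the use of author dates are assumptions.

```python
# Count commits per timezone (UTC offset) for one year in the current repo.
# A sketch only; count_timezone_commits.bash is the script actually used.
import subprocess
from collections import Counter

YEAR = 2013  # example year

log = subprocess.run(
    ["git", "log",
     f"--since={YEAR}-01-01", f"--until={YEAR + 1}-01-01",
     "--pretty=format:%ad", "--date=format:%z"],
    capture_output=True, text=True, check=True,
).stdout

for offset, n in Counter(log.splitlines()).most_common():
    print(f"{offset}: {n} commits")
```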
- Return to the base directory and then navigate to the `write-data-in-csv` directory.
- Generate commit counts and proportions per day by running `commit_count_per_day.py` (a sketch of the per-day counting step follows below).
- Generate commit counts and proportions per hour by running `commit_count_per_hour.py`.
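To make the per-day computation concrete, here is a single-repository sketch; `commit_count_per_day.py` aggregates across all sampled projects, so treat this only as an illustration of the counting step.

```python
# Commit counts and proportions per weekday for the current repository.
# Illustration only; the real script aggregates over all sampled projects.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--pretty=format:%ad", "--date=format:%A"],
    capture_output=True, text=True, check=True,
).stdout

counts = Counter(log.splitlines())
total = sum(counts.values())
for day in ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]:
    n = counts.get(day, 0)
    print(f"{day}: {n} commits ({n / total:.2%})")
```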
- Return to the base directory and then navigate to the `data-cleaning/2013-spike-analysis` directory.
- Run `rejected-mariadb-commits.bash` to find the commits that must be removed.
- Manually remove those commits from the four CSV files in the `write-data-in-csv/csv-files` directory.
## Statistical Analysis & Plots

- Return to the base directory and then navigate to the `statistical-analysis` directory.
- Run the desired scripts to perform the statistical analysis.
- Return to the base directory and then navigate to the `plots` directory.
- Run the desired scripts to generate the plots.
## Discussion

- Visit Scopus.
- Perform the first search:
  - Search within: Language
  - Search documents: English
  - Click 'Analyze results'.
  - Select the year range to analyze: 2000 to 2023.
  - Click 'Export' -> 'Export the data to a CSV file' -> 'Export'.
- Perform the second search:
  - Search within: Article title, Abstract, Keywords
  - Search documents: "wellness" OR "well-being" OR "work-life balance"
  - and within: Language
  - Search documents: English
  - Click 'Analyze results'.
  - Select the year range to analyze: 2000 to 2023.
  - Click 'Export' -> 'Export the data to a CSV file' -> 'Export'.
- In the `discussion/work-life-balance-approach` directory, run `publication-analysis.py` (a sketch of the idea follows below).
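A plausible reading of what `publication-analysis.py` computes is the yearly share of wellness-related publications among all English publications; the sketch below shows that normalization. The file names and the `Year`/`Count` column names in the Scopus exports are assumptions.

```python
# Yearly proportion of wellness-related publications among English
# publications. A sketch: file names and "Year"/"Count" columns are assumed.
import pandas as pd

total = pd.read_csv("scopus_english.csv")      # export of the first search
wellness = pd.read_csv("scopus_wellness.csv")  # export of the second search

merged = total.merge(wellness, on="Year", suffixes=("_total", "_wellness"))
merged["proportion"] = merged["Count_wellness"] / merged["Count_total"]
print(merged[["Year", "proportion"]].to_string(index=False))
```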
- Navigate to the `discussion/contributors-number` directory.
- Run `all_contributors_per_year.bash`.
- Run `contributors_plot.py` (see the plotting sketch below).
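For reference, a minimal plotting sketch in the spirit of `contributors_plot.py`; the input file name and its two-column `year,contributors` layout are assumptions.

```python
# Plot contributor counts per year. A sketch; input name/format are assumed.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("contributors_per_year.csv", names=["year", "contributors"])
plt.plot(df["year"], df["contributors"], marker="o")
plt.xlabel("Year")
plt.ylabel("Number of contributors")
plt.savefig("contributors_per_year.png", dpi=150)
```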
- Navigate to the `discussion/avg-commit-lines-per-year-analysis` directory.
- Run `lines_per_commit.bash`.
- Run `lines_per_commit_plot.py` (a sketch of the underlying computation follows below).
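The sketch below shows one way to derive the average number of changed lines per commit per year from a single repository's `git log --numstat` output. `lines_per_commit.bash` is the script actually used; the details here (author dates, counting added plus deleted lines, skipping binary files) are assumptions.

```python
# Average changed lines (added + deleted) per commit per year for one repo.
# A sketch only; lines_per_commit.bash is the script actually used.
import subprocess
from collections import defaultdict

log = subprocess.run(
    ["git", "log", "--numstat", "--pretty=format:YEAR:%ad",
     "--date=format:%Y"],
    capture_output=True, text=True, check=True,
).stdout

lines = defaultdict(int)    # changed lines per year
commits = defaultdict(int)  # commits per year
year = None
for row in log.splitlines():
    if row.startswith("YEAR:"):
        year = row[5:]
        commits[year] += 1
    elif row and year is not None:
        added, deleted, _path = row.split("\t", 2)
        if added != "-":  # "-" marks binary files in --numstat output
            lines[year] += int(added) + int(deleted)

for y in sorted(commits):
    print(f"{y}: {lines[y] / commits[y]:.1f} changed lines per commit")
```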
- Navigate to the `discussion/FreeBSD-demographics-analysis` directory and run the scripts.
## Distribution of Programming Languages

To analyze the distribution of programming languages in the sampled projects:

- Navigate to the `distribution-of-languages` directory.
- Run `find_distribution.py` to find the distribution (a minimal sketch follows below).
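As a minimal illustration, assuming the sampled-projects CSV keeps the SEART `language` column, the distribution can be computed like this (`find_distribution.py` is the authoritative script):

```python
# Language distribution over the sampled projects. A sketch; the input file
# name and the "language" column are assumptions about the CSV layout.
import pandas as pd

df = pd.read_csv("sampled_repos.csv")
for language, share in df["language"].value_counts(normalize=True).items():
    print(f"{language}: {share:.1%}")
```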