Challenges of Git #5

Open
kieranjmartin opened this issue Apr 3, 2025 · 6 comments
Comments

@kieranjmartin
Collaborator

Use this issue to capture any challenges you have with git

@SShankar-ssh

Hi Kieran,
It's easier when you work on projects with one developer and one QC programmer, because a branch can then be created for each ADaM program or TLG program. If teams get bigger, for example 3-6 programmers, different practices are introduced. For example, every programmer has their own branch and from time to time they merge their programs into the milestone branch. Once, somebody merged their branch into the milestone branch and, as it was way behind, a selection of programs got overwritten and there was a gap in the commit history. This was not reversible.

KR

Sonakshi
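
One possible safeguard for the stale-branch situation described above is to check how far a branch has fallen behind the milestone branch before merging. The sketch below is only an illustration: the branch name `milestone`, the helper names, and the `max_behind` threshold are assumptions, not anything this team uses. It relies on `git rev-list --left-right --count`, which prints two counts for the two sides of a symmetric difference. It does not diagnose what actually went wrong in the incident described.

```python
from __future__ import annotations

import subprocess


def parse_ahead_behind(output: str) -> tuple[int, int]:
    """Parse the two counts printed by `git rev-list --left-right --count A...B`.

    The first number counts commits only on A, the second commits only on B.
    """
    left, right = output.split()
    return int(left), int(right)


def commits_behind(branch: str, target: str = "milestone") -> int:
    """Return how many commits on `target` are missing from `branch`.

    `milestone` is a hypothetical branch name used for illustration.
    Must be run inside a git repository.
    """
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{target}...{branch}"],
        capture_output=True,
        text=True,
        check=True,
    )
    behind, _ahead = parse_ahead_behind(result.stdout)
    return behind


def safe_to_merge(behind: int, max_behind: int = 0) -> bool:
    """Hypothetical policy: refuse to merge if the branch is too far behind."""
    return behind <= max_behind
```

A team could call `safe_to_merge(commits_behind("my-branch"))` before merging and ask the programmer to update their branch from the milestone branch first if it returns `False`.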

@tomratford

For adoption, I think git is difficult to both teach and learn. Most advice boils down to "learn these 4 commands, seek help for anything else". In my experience, most developers use git through the command line interface, which is normally another area where new users have little or no experience; this makes git both harder to teach and less appealing to use. After grasping the basics, the main area where I have typically seen new users struggle is avoiding merge conflicts, usually because their inexperience means they do not follow best practices.

For business processes, I think the key challenges for git typically lie in two areas. The first is the branching strategy, which involves many decisions. For example, do you create branches liberally for each dry run/delivery, or have a single main branch that is merged into and tagged at appropriate points (normally not recommended if you have multiple deliveries that may want to share code)? And do you trust users to choose appropriate names for their branches, or do you enforce a common naming structure linked back to issue/deliverable identifiers?
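
As an illustration of the "enforce a common naming structure" option, a naming rule can be checked mechanically. This is a minimal sketch under an assumed convention of `<type>/<issue-number>-<short-description>` (for example `adam/42-adsl`); the pattern, the allowed types, and the function name are all hypothetical and would need to be replaced by whatever a team actually agrees on.

```python
import re

# Hypothetical convention: <type>/<issue-number>-<lowercase-hyphenated-description>
BRANCH_PATTERN = re.compile(r"(adam|tlg|sdtm)/(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)")


def parse_branch_name(name):
    """Return the parts of a conforming branch name, or None if it does not conform."""
    match = BRANCH_PATTERN.fullmatch(name)
    if match is None:
        return None
    kind, issue, slug = match.groups()
    return {"kind": kind, "issue": int(issue), "slug": slug}
```

A check like this could run in a hook or a CI job, so that a branch called `my-branch` is rejected while `tlg/7-demog-table` is accepted and linked back to issue 7.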

A common issue I have encountered revolves around asynchronous code writing to a synchronised data folder. That is, any user may be on any commit and run a program (say ADSL) on that commit. This will update the synchronised data folder so that all other users are now reading from that version of ADSL. I have typically seen this resolved by having separate folders to store 'protected' versions of the datasets generated from the main delivery branch so that important deliveries are not compromised. Given the capability to prod git and detect the branch a user is on I would be interested in exploring potential automations/protections/best practices when writing datasets to avoid this issue.
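
One way the "detect the branch a user is on" idea could look in practice is sketched below. It assumes a hypothetical layout in which only a protected branch (here `main`) may write to a `protected` folder, while every other branch writes to its own folder under `dev`. The folder names, the `PROTECTED_BRANCHES` set, and the helper names are illustrative assumptions; `git rev-parse --abbrev-ref HEAD` is the real git command used to read the current branch.

```python
import subprocess
from pathlib import Path

# Hypothetical: branches whose outputs count as the protected delivery version.
PROTECTED_BRANCHES = {"main"}


def current_branch() -> str:
    """Return the checked-out branch name (git prints 'HEAD' when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def output_dir_for(branch: str, data_root: Path) -> Path:
    """Choose where datasets produced on `branch` should be written.

    Protected branches write to <data_root>/protected; everything else,
    including a detached HEAD, writes to its own folder under <data_root>/dev.
    """
    if branch in PROTECTED_BRANCHES:
        return data_root / "protected"
    # Slashes in branch names would otherwise create nested folders.
    safe_name = branch.replace("/", "__")
    return data_root / "dev" / safe_name
```

A program such as ADSL could then write to `output_dir_for(current_branch(), data_root)` instead of a fixed shared path, so that a run on a development branch cannot overwrite the protected copy.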

@kieranjmartin
Collaborator Author

> A common issue I have encountered revolves around asynchronous code writing to a synchronised data folder. That is, any user may be on any commit and run a program (say ADSL) on that commit. This will update the synchronised data folder so that all other users are now reading from that version of ADSL. I have typically seen this resolved by having separate folders to store 'protected' versions of the datasets generated from the main delivery branch so that important deliveries are not compromised. Given the capability to prod git and detect the branch a user is on I would be interested in exploring potential automations/protections/best practices when writing datasets to avoid this issue.

I like this example, as I think it highlights one friction in the use of git in clinical programming. Unlike software development, where you are developing towards a unified product which you will release, in clinical programming you are creating lots of interconnected products (datasets and outputs) which can be out of sync with one another.

I think there are solutions for this, but just using the tool naively will definitely cause issues.

@corinabioinformatic
Contributor

corinabioinformatic commented Apr 7, 2025

Challenges (broadly)

  • Fear (as commented today by one of the members) of breaking something when pulling or pushing, because of a lack of knowledge.
  • No leadership in this area, which makes using git a non-priority goal. Leadership is key to gaining confidence in using it.
  • A bit of a learning curve for advanced features. (Who knows what to do when git merge does not complete because of conflicts?)
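
On the last point, when `git merge` stops because of conflicts, the affected files appear in `git status --porcelain` with a two-letter "unmerged" status code, and `git merge --abort` returns the branch to its state before the merge. The sketch below, offered only as an illustration, parses that porcelain output to list the conflicted files; the function name is hypothetical, and it assumes simple paths (git quotes paths containing special characters, which this sketch does not unquote).

```python
# Status codes that `git status --porcelain` uses for unmerged (conflicted) paths.
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def conflicted_files(porcelain_output: str) -> list:
    """Return the paths marked as unmerged in `git status --porcelain` output."""
    files = []
    for line in porcelain_output.splitlines():
        # Each line is two status characters, a space, then the path.
        if line[:2] in UNMERGED_CODES:
            files.append(line[3:])
    return files
```

Showing a new user exactly which files need attention, together with the option to abort, may reduce some of the fear mentioned in the first point.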

@LiamHobby
Collaborator

One challenge I seem to be facing a lot is getting people to understand the benefits of Git and GitHub. I'd like to get to a point where people see GitHub as more than just the place where the code "lives", but I don't think we will get there until more of our processes (i.e., tracking of development progress, QC) are moved into the Git workflow (possibly aided by other management tools such as Jira). There is little desire to engage with or use the tool beyond its most basic features, as a lot of the more collaborative features are already covered by legacy systems. This means that, although alternative systems may be more appropriate long term, until there is a validated method of capturing this information for the audit trail, people will not engage with GitHub beyond the necessities, as it is seen as "rework".

@langkabh

I agree with what @tomratford wrote above - especially about the branching strategy - and with @kieranjmartin's reaction. I also think that the main challenge is that git works really well in a classical CI/CD setting, where you have a single product (e.g. a webpage) with a continuous production version (i.e. the main branch) while you work on new features in the background. This simply isn't how we deliver statistical programming products in pharma, and hence classical git workflows cannot be easily used by us.

In addition to the comments about the branching strategy, if you use git-related tools (e.g. GitLab), then these tools come with inherent relationships between branches, issues and merge requests. These entities then get different meanings assigned in our business workflows, which do not always align with their technical requirements. For example, you could define a branching strategy with one branch per TLG and say that the merge request captures the QC discussion and that approval of the merge request indicates that the TLG has passed QC. If your QC process now requires a QC activity on an already QCed TLG that doesn't involve changes to the code, you will not be able to create a merge request because there are no changes to be merged.


6 participants