Challenges of Git #5
Hi Kieran, KR Sonakshi
For adoption, I think git is difficult both to teach and to learn. Most advice boils down to "learn these 4 commands and seek help for anything else". In my experience, most developers use git through the command line interface, which is another area where users usually have little or no experience; this makes git both harder to teach and less appealing to use. Once new users have grasped the basics, the main area where I have seen them struggle is avoiding merge conflicts, typically because their inexperience means they do not yet follow best practices.

For business processes, I think the key challenges for git typically lie in two areas. The first is the branching strategy, which involves many decisions. For example, do you create branches liberally for each dry run/delivery, or keep a single main branch that work is merged into and tagged at appropriate points (normally not recommended if you have multiple deliveries that may want to share code)? Do you trust users to choose appropriate names for their branches, or enforce a common naming structure linked back to issue/deliverable identifiers?

A common issue I have encountered involves asynchronous code writing to a synchronised data folder. Any user may be on any commit and run a program (say, ADSL) from that commit. This updates the synchronised data folder, so all other users are now reading that version of ADSL. I have typically seen this resolved by keeping separate folders that store 'protected' versions of the datasets generated from the main delivery branch, so that important deliveries are not compromised. Given that git can be queried to detect which branch a user is on, I would be interested in exploring potential automations, protections and best practices for writing datasets that avoid this issue.
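A minimal sketch of the kind of branch-aware protection described above, assuming the delivery branch is called `main` and that datasets live under a shared root with hypothetical `protected/` and `dev/` subfolders. It asks git for the current branch with `git rev-parse --abbrev-ref HEAD` and routes output so that only runs from the delivery branch write to the protected folder. The folder layout and the helper names `current_branch` and `output_dir` are illustrative assumptions, not an established convention.

```python
import subprocess
from pathlib import Path

# Assumption: the delivery branch is named "main"; adjust to your branching strategy.
PROTECTED_BRANCH = "main"


def current_branch(repo: Path = Path(".")) -> str:
    """Return the branch checked out in `repo`, or "HEAD" if the HEAD is detached."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def output_dir(branch: str, data_root: Path) -> Path:
    """Choose where a program run from `branch` may write its datasets.

    Hypothetical layout: runs from the delivery branch write to data_root/protected;
    every other branch (or a detached HEAD) writes to its own sandbox under data_root/dev.
    """
    if branch == PROTECTED_BRANCH:
        return data_root / "protected"
    # Branch names may contain "/", e.g. "feature/adsl"; flatten them into one folder name.
    return data_root / "dev" / branch.replace("/", "__")
```

A program would call something like `output_dir(current_branch(), data_root)` before writing, so a run from an arbitrary commit lands in its own sandbox instead of overwriting what everyone else reads. This only checks the branch name; a user on `main` with uncommitted edits would still write to the protected folder, so a stricter version might also refuse to write when `git status --porcelain` reports changes.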
I like this example, as I think it highlights one friction in using git for clinical programming. Unlike software development, where you develop towards a unified product that you will release, in clinical programming you create many interconnected products (datasets and outputs) that can fall out of sync with one another. I think there are solutions for this, but using the tool naively will definitely cause issues.
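One possible mitigation for interconnected products drifting out of sync, sketched under assumptions: each program records the commit SHA it ran from, plus the commit SHA of every input it read, in a hypothetical manifest. A check then flags any product whose recorded input version no longer matches the current version of that input (for example, ADSL was regenerated after a table was built from it). The `Product` record, the manifest shape and the dataset names are illustrative, not an existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Product:
    """A dataset or output in a hypothetical manifest."""
    name: str
    commit: str  # SHA of the code commit that produced this product
    # Input name -> the commit SHA that input carried when this product was built.
    inputs: dict[str, str] = field(default_factory=dict)


def out_of_sync(manifest: dict[str, Product]) -> list[tuple[str, str]]:
    """Return (product, input) pairs where the input has changed or is missing.

    A pair is reported if the input is absent from the manifest, or if its current
    commit differs from the commit recorded when the product was built.
    """
    problems = []
    for product in manifest.values():
        for input_name, recorded_sha in product.inputs.items():
            upstream = manifest.get(input_name)
            if upstream is None or upstream.commit != recorded_sha:
                problems.append((product.name, input_name))
    return problems
```

Recording the input's version at build time, rather than requiring every product to come from the same commit, avoids false alarms when an output's code changes but its inputs do not.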
Challenges (broadly)
One challenge I keep facing is getting people to understand the benefits of Git and GitHub. I'd like to reach a point where people see GitHub as more than just the place where the code "lives", but I don't think we will get there until more of our processes (e.g., tracking of development progress, QC) move into the Git workflow (possibly aided by other management tools such as Jira). There is little appetite to engage with the tool beyond its most basic features, because many of the more collaborative features are already covered by legacy systems. So although alternative systems may be more appropriate in the long term, until there is a validated method of capturing this information for the audit trail, people will not engage with GitHub beyond the necessities, because doing so is seen as "rework".
I agree with what @tomratford wrote above, especially about the branching strategy, and with @kieranjmartin's reaction. I also think the main challenge is that git works really well in a classical CI/CD setting, where you have a single product (e.g. a webpage) with a continuous production version (i.e. the main branch) while you work on new features in the background. This simply isn't how we deliver statistical programming products in pharma, so classical git workflows cannot easily be applied to our work. Beyond the branching strategy, git-related tools (e.g. GitLab) come with inherent relationships between branches, issues and merge requests. Our business workflows then assign these entities meanings that do not always align with their technical requirements. For example, you could define a branching strategy with one branch per TLG, where the merge request captures the QC discussion and its approval indicates that the TLG has passed QC. If your QC process then requires a further QC activity on an already QCed TLG that doesn't involve any code changes, you cannot create a merge request, because there are no changes to be merged.
Use this issue to capture any challenges you have with git