From 4b407143fa517ab2e09deeefa5dd87ee7d2b44b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?=
Date: Mon, 16 Jun 2025 02:53:00 +0200
Subject: [PATCH 1/7] feat: added post about ddario's and gonza's latest work

---
 blog/2025-06-01-latest-work.md | 99 ++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 blog/2025-06-01-latest-work.md

diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md
new file mode 100644
index 0000000..ff07f76
--- /dev/null
+++ b/blog/2025-06-01-latest-work.md
@@ -0,0 +1,99 @@
+---
+slug: latest-work
+title: Our latest work with a client
+authors: Dario, Gonza
+tags: [Kubernetes, Infrastructure, MySQL, MongoDB, AWS, CICD, GitOps]
+image:
+---
+
+## How it started
+*Briefly explain Soty's original infrastructure (resource deployment with ploi, Mongo on ScaleGrid, ...)*
+At first, our client had its software deployed mostly in Digital Ocean with some resources in ploi.io and ScaleGrid.
+
+
+## What was the goal
+*Briefly explain what goals we had for the project and why*
+ - *migrate to Kubernetes because it provides automatic horizontal scaling and optimizes the databases, reducing cloud costs and CO2...*
+ - *teach better software development practices*
+ - *...*
+
+From the start, our goal was clear: we had to improve their platform, offering them a better software development experience while reducing the associated costs.
+
+To do so, their infrastructure environments were standardized on EKS, since it provides automatic horizontal scaling, which lowers costs and CO2 consumption by provisioning only the resources that are needed.
+In each environment, Helm and ArgoCD are used to optimize the deployment of different applications, such as the web application and a MongoDB database.
+In addition, GitOps was adopted to automate software development, guaranteeing fast and reliable CI/CD deployments of applications and services.
+
+
+## How was it achieved?
+*A subsection for each of the points above, technically explaining what was done to contribute to that point, and then describing it*
+- *optimize the databases: deployment of Mongo and MySQL on k8s, explaining the process and the architectural decisions taken*
+
+### Kubernetes clusters
+They are deployed in the cloud, thanks to the AWS EKS service, and orchestrate a large part of our client's services.
+They also host other services that are part of our platform:
+- **Observability and alerting stack** to give the client visibility into the deployed applications and their corresponding resources.
+- **CI/CD tools** to optimize application deployment.
+- **Load balancers** to distribute incoming requests to a service evenly across its replicas.
+- **External secrets manager** to extract sensitive information from the code as secrets that can be managed from the service offered by AWS.
+
+### Custom image for their web application
+The client's application is developed in PHP, so a custom Helm chart was created to handle deployments automatically with ArgoCD.
+Later, the operating system image in use was optimized by building our own image with the packages the application needs already installed.
+With this, the image build time was reduced from 8 minutes to 2.
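+To make this concrete, the sketch below shows the kind of CI job that builds and pushes such an image for ARM64 with registry layer caching. It is illustrative only: the registry, image name and workflow path are placeholders, not the client's actual setup.
+
+```yaml
+# .github/workflows/build-image.yml : illustrative sketch, not the real pipeline
+name: build-web-image
+on:
+  push:
+    branches: [main]
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: docker/setup-qemu-action@v3        # emulation so ARM images can be built
+      - uses: docker/setup-buildx-action@v3
+      - uses: docker/login-action@v3
+        with:
+          registry: ghcr.io                      # placeholder registry
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - uses: docker/build-push-action@v6
+        with:
+          context: .
+          platforms: linux/arm64                 # match the ARM node groups
+          push: true
+          tags: ghcr.io/example/web-app:${{ github.sha }}
+          cache-from: type=registry,ref=ghcr.io/example/web-app:buildcache
+          cache-to: type=registry,ref=ghcr.io/example/web-app:buildcache,mode=max
+```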
+
+In addition, the application was set to run on ARM nodes, the architecture the client was already using.
+
+
+### Databases migration and optimization
+The client's application uses a MongoDB database and a MySQL database, together with a Redis instance.
+All of these resources were migrated to the new environments:
+#### MongoDB
+It is deployed in the cluster through Helm in a 3-replica architecture to ensure its availability.
+
+Initially, we tried to use an ARM build of MongoDB to reuse the existing nodes and reduce infrastructure costs.
+However, due to the limited support for MongoDB images compatible with ARM architectures, we opted for an AMD architecture instead.
+
+The source database was on version 5.x, so in addition to the migration we decided to upgrade to a supported version that can be kept patched against vulnerabilities.
+We originally considered moving to the latest version (8.x), but migrating to it proved impossible because the source database uses *timeseries* collections that do not keep a consistent schema for their data.
+We tried to work around this problem in several ways:
+- Using MongoTools in a version above v100.4 (v100.12.0), since according to [a post by a MongoDB employee](https://www.mongodb.com/community/forums/t/database-tools-100-4-0-released/115727) it allows migrating *timeseries* collections.
+- Creating the collections first, enabling the ``timeseriesBucketsMayHaveMixedSchemaData`` flag and then importing the data.
+- Modifying the collection's creation metadata so that it included the flag already enabled.
+
+In the end, we decided to migrate to version 7.x, which is still supported, and to upgrade to 8.x in the future, enabling that flag manually.
+
+#### MySQL
+We use the AWS Aurora and RDS services, since they allow a fast implementation as code with Terraform.
+Originally, the migration was going to be done with AWS DMS, but it has a series of drawbacks that are hard to work around:
+- **DMS cannot migrate secondary objects (FKs and cascade constraints):** DMS works by replicating the changes recorded in logs during replication, but database engines do not log secondary objects. Until 2020 there was a flag (``HandleCascadeConstraints``) that, although undocumented, made it possible to avoid this restriction. As it is deprecated and no longer available, the only solution is to modify the database DDL in the more than 400 occurrences of FKs and cascade constraints.
+- **Need to disable FKs during the migration:** Which forces the application to be stopped.
+- **Timeout problems in the source database:** Solved by importing the necessary resources into Terraform as code and modifying their values, since the database is *self-managed* and there is no other way to edit them.
+
+
+#### Redis
+Similarly to MySQL, we chose to use Terraform and the ElastiCache service.
+
+### Self-hosted action runners
+Creation of our own runners to run GitHub Actions, more efficient and cheaper than those offered by GitHub.
+
+
+## Other improvements:
+*Briefly explain other improvements that were carried out (building custom images of their application for more efficient deployments, FinOps, ...)*
+
+A set of tasks was also carried out that improved the system in one way or another:
+- **FinOps:** Study and analysis of the infrastructure at several points in time in order to adjust resources and reduce costs.
+- **Secrets:** Sensitive data present in the client's code was moved into secrets in AWS, providing greater security for their application.
+- **Queue system:** Migration of a queue system from DigitalOcean to AWS, managed with Terraform. New queues, DLQs, were also added alongside the original ones, to which messages that are not consumed within a given time are redirected. This helps the client debug and have greater control over any problems that may arise.
+- **S3 bucket migration:** Configuration and migration of one bucket per environment, from DO to AWS, with restricted-access paths and paths publicly accessible from the Internet.
+- **QA:** Testing of the application at different points in time.
+

From 55afd0d7cc39ea80a5596de4ba2753721cdc639d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?=
Date: Mon, 16 Jun 2025 10:34:08 +0200
Subject: [PATCH 2/7] chore: translated to English

---
 blog/2025-06-01-latest-work.md | 122 ++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 54 deletions(-)

diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md
index ff07f76..33057da 100644
--- a/blog/2025-06-01-latest-work.md
+++ b/blog/2025-06-01-latest-work.md
@@ -7,86 +7,100 @@ image:
 ---
 
 ## How it started
-*Briefly explain Soty's original infrastructure (resource deployment with ploi, Mongo on ScaleGrid, ...)*
 At first, our client had its software deployed mostly in Digital Ocean with some resources in ploi.io and ScaleGrid.
 
 
 ## What was the goal
-*Briefly explain what goals we had for the project and why*
- - *migrate to Kubernetes because it provides automatic horizontal scaling and optimizes the databases, reducing cloud costs and CO2...*
- - *teach better software development practices*
- - *...*
-
-From the start, our goal was clear: we had to improve their platform, offering them a better software development experience while reducing the associated costs.
+At the beginning, our goal was clear: we had to improve their platform, thereby offering them a better software development experience while reducing associated costs.
 
-To do so, their infrastructure environments were standardized on EKS, since it provides automatic horizontal scaling, which lowers costs and CO2 consumption by provisioning only the resources that are needed.
-In each environment, Helm and ArgoCD are used to optimize the deployment of different applications, such as the web application and a MongoDB database.
-In addition, GitOps was adopted to automate software development, guaranteeing fast and reliable CI/CD deployments of applications and services.
+To achieve this, their infrastructure environments were standardized with EKS, as it provides automatic horizontal scaling, which helps lower expenses and CO2 consumption by provisioning only the necessary resources.
+In each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database.
+Additionally, GitOps was adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD.
 
 
 ## How was it achieved?
-*A subsection for each of the points above, technically explaining what was done to contribute to that point, and then describing it*
-- *optimize the databases: deployment of Mongo and MySQL on k8s, explaining the process and the architectural decisions taken*
-
 ### Kubernetes clusters
-They are deployed in the cloud, thanks to the AWS EKS service, and orchestrate a large part of our client's services.
-They also host other services that are part of our platform:
-- **Observability and alerting stack** to give the client visibility into the deployed applications and their corresponding resources.
-- **CI/CD tools** to optimize application deployment.
-- **Load balancers** to distribute incoming requests to a service evenly across its replicas.
-- **External secrets manager** to extract sensitive information from the code as secrets that can be managed from the service offered by AWS.
-
-### Custom image for their web application
-The client's application is developed in PHP, so a custom Helm chart was created to handle deployments automatically with ArgoCD.
-Later, the operating system image in use was optimized by building our own image with the packages the application needs already installed.
-With this, the image build time was reduced from 8 minutes to 2.
+They are deployed in the cloud, thanks to the AWS EKS service, which orchestrates a large portion of our client's services.
 
-In addition, the application was set to run on ARM nodes, the architecture the client was already using.
+Additionally, it also hosts other services that are part of our platform:
+- **Observability and alerting stack** to provide the client with visibility into deployed applications and their corresponding resources.
+- **CI/CD tools** to optimize application deployment.
+- **Load balancers** to distribute incoming requests evenly across different replicas.
+- **External Secrets Manager** to extract sensitive information from the code in the form of secrets, which can be managed via AWS's service.
 
+### Custom image for their web application
+The client's application is developed in PHP, so a custom Helm chart was created to automate deployments using ArgoCD.
+Later, the operating system image was optimized by creating a custom image that already included all the necessary pre-downloaded packages for the application to run.
+This reduced the image build time from 8 minutes down to just 2.
 
+Additionally, the application was configured to run on ARM-based nodes, matching the client's existing architecture.
 
 ### Databases migration and optimization
-The client's application uses a MongoDB database and a MySQL database, together with a Redis instance.
-All of these resources were migrated to the new environments:
+The client's application uses a MongoDB database, a MySQL database, and a Redis instance.
+All of these resources were migrated to the new environments:
 #### MongoDB
-It is deployed in the cluster through Helm in a 3-replica architecture to ensure its availability.
+The database is deployed in the cluster using Helm with a 3-replica architecture to ensure high availability.
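+As an illustration of what this looks like in practice, such a replica-set deployment can be driven from a small Helm values file. The sketch below assumes the Bitnami ``mongodb`` chart and uses placeholder sizing and storage settings rather than the client's actual configuration.
+
+```yaml
+# values-mongodb.yaml : illustrative values for the Bitnami mongodb chart
+architecture: replicaset          # replica set instead of a standalone server
+replicaCount: 3                   # three data-bearing members for availability
+arbiter:
+  enabled: false                  # full members only, no arbiter
+auth:
+  enabled: true
+  existingSecret: mongodb-credentials   # credentials injected from an external secret
+persistence:
+  enabled: true
+  storageClass: gp3               # placeholder storage class
+  size: 50Gi
+resources:
+  requests:
+    cpu: 500m
+    memory: 1Gi
+```
+
+With values like these, the same chart can be synced by ArgoCD in every environment, so the replica set is recreated identically wherever it is needed.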
-Initially, we tried to use an ARM build of MongoDB to reuse the existing nodes and reduce infrastructure costs.
-However, due to the limited support for MongoDB images compatible with ARM architectures, we opted for an AMD architecture instead.
+Initially, we attempted to use an ARM-compatible version of MongoDB to leverage existing nodes and reduce infrastructure costs.
+However, due to limited support for ARM-optimized MongoDB images, we ultimately switched to an AMD-based architecture instead.
+This decision balanced compatibility with stability while still maintaining efficient resource utilization in the cluster.
 
-The source database was on version 5.x, so in addition to the migration we decided to upgrade to a supported version that can be kept patched against vulnerabilities.
-We originally considered moving to the latest version (8.x), but migrating to it proved impossible because the source database uses *timeseries* collections that do not keep a consistent schema for their data.
-We tried to work around this problem in several ways:
-- Using MongoTools in a version above v100.4 (v100.12.0), since according to [a post by a MongoDB employee](https://www.mongodb.com/community/forums/t/database-tools-100-4-0-released/115727) it allows migrating *timeseries* collections.
-- Creating the collections first, enabling the ``timeseriesBucketsMayHaveMixedSchemaData`` flag and then importing the data.
-- Modifying the collection's creation metadata so that it included the flag already enabled.
+The original database was running on version 5.x, so in addition to migration, we decided to upgrade to a supported version that could be kept updated against vulnerabilities.
+Initially, we considered moving to the latest version (8.x), but this proved impossible because the source database uses timeseries collections, which do not maintain strict schema consistency for their data.
+We attempted several workarounds to resolve this issue:
+- Using MongoTools v100.12.0+ – Based on a [MongoDB employee’s post](https://www.mongodb.com/community/forums/t/database-tools-100-4-0-released/115727), this version supposedly supports timeseries migration.
+- Pre-creating collections with the ``timeseriesBucketsMayHaveMixedSchemaData`` flag – We tried enabling this flag before importing data to bypass schema conflicts.
+- Modifying collection metadata – We attempted to manually adjust creation metadata to include the flag.
 
-In the end, we decided to migrate to version 7.x, which is still supported, and to upgrade to 8.x in the future, enabling that flag manually.
+Unfortunately, none of these approaches fully resolved the compatibility issues, forcing us to settle on an intermediate supported version (7.x) and schedule an upgrade in the future.
 
 #### MySQL
-We use the AWS Aurora and RDS services, since they allow a fast implementation as code with Terraform.
-Originally, the migration was going to be done with AWS DMS, but it has a series of drawbacks that are hard to work around:
-- **DMS cannot migrate secondary objects (FKs and cascade constraints):** DMS works by replicating the changes recorded in logs during replication, but database engines do not log secondary objects. Until 2020 there was a flag (``HandleCascadeConstraints``) that, although undocumented, made it possible to avoid this restriction.
As it is deprecated and no longer available, the only solution is to modify the database DDL in the more than 400 occurrences of FKs and cascade constraints.
-- **Need to disable FKs during the migration:** Which forces the application to be stopped.
-- **Timeout problems in the source database:** Solved by importing the necessary resources into Terraform as code and modifying their values, since the database is *self-managed* and there is no other way to edit them.
+We use AWS Aurora and RDS because they enable rapid infrastructure-as-code deployment via Terraform.
+Originally, the migration was planned with AWS DMS, but several critical limitations made it impractical:
+
+- **DMS cannot migrate secondary objects (FKs and cascade constraints):**
+  - DMS replicates changes via database logs, but database engines do not log secondary object dependencies.
+  - Until 2020, an undocumented flag (HandleCascadeConstraints) could bypass this, but it’s now deprecated.
+  - The only workaround is manually modifying the DDL in 400+ instances of FKs and cascade constraints.
 
-
-#### Redis
-Similarly to MySQL, we chose to use Terraform and the ElastiCache service.
+- **Requirement to disable FKs during migration:**
+  - Forces application downtime.
 
-### Self-hosted action runners
-Creation of our own runners to run GitHub Actions, more efficient and cheaper than those offered by GitHub.
+- **Source database timeout issues:**
+  - Resolved by importing self-managed resources into Terraform and adjusting their parameters directly (no alternative for non-managed databases).
+
+#### Redis
+Similarly to MySQL, we opted to use Terraform and ElastiCache for deployment and management.
+
+### Self-Hosted GitHub Action Runners
+We implemented custom self-hosted runners to execute GitHub Actions workflows. This approach proved:
+- More efficient – Reduced latency and improved performance compared to GitHub's hosted runners.
+- Cost-effective – Lowered operational expenses by avoiding GitHub's per-minute billing.
 
+
 ## Other improvements:
-*Briefly explain other improvements that were carried out (building custom images of their application for more efficient deployments, FinOps, ...)*
+A series of improvements were implemented to enhance the system’s efficiency, security, and reliability:
+- **FinOps:**
+  - Conducted infrastructure cost analysis across multiple usage cycles to right-size resources and reduce expenses.
+
+- **Secrets Management:**
+  - Migrated hardcoded sensitive data to AWS Secrets Manager, eliminating exposure risks in the client’s codebase.
 
-A set of tasks was also carried out that improved the system in one way or another:
-- **FinOps:** Study and analysis of the infrastructure at several points in time in order to adjust resources and reduce costs.
-- **Secrets:** Sensitive data present in the client's code was moved into secrets in AWS, providing greater security for their application.
-- **Queue system:** Migration of a queue system from DigitalOcean to AWS, managed with Terraform. New queues, DLQs, were also added alongside the original ones, to which messages that are not consumed within a given time are redirected. This helps the client debug and have greater control over any problems that may arise.
-- **S3 bucket migration:** Configuration and migration of one bucket per environment, from DO to AWS, with restricted-access paths and paths publicly accessible from the Internet.
-- **QA:** Testing of the application at different points in time.
+- **Queue System Upgrade:**
+  - Migrated from DigitalOcean to AWS-managed queues (via Terraform).
+  - Added Dead-Letter Queues (DLQs) to capture/troubleshoot failed messages, improving debuggability and error control.
+
+- **S3 Bucket Migration:**
+  - Migrated and configured per-environment S3 buckets (DigitalOcean → AWS).
+  - Implemented restricted-access paths (private) and public-facing endpoints (Internet-accessible) as needed.
 
+- **QA & Testing:**
+  - Performed continuous application testing at critical stages to ensure stability.
 
+- **Why This Matters:**
+  - Security: Secrets abstraction + restricted S3 paths minimize attack surfaces.
+  - Cost Control: FinOps ensures no overprovisioning.
+  - Resilience: DLQs prevent message loss and simplify failure analysis.

From e344ac02a3696f895b7f32b2c846ebe5a02a6287 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?=
Date: Tue, 17 Jun 2025 12:00:39 +0200
Subject: [PATCH 3/7] docs: improved Lucia's runner documentation for post

---
 blog/2025-06-01-latest-work.md | 24 +++++++++--------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md
index 33057da..cd56a74 100644
--- a/blog/2025-06-01-latest-work.md
+++ b/blog/2025-06-01-latest-work.md
@@ -17,6 +17,7 @@ To achieve this, their infrastructure environments were standardized with EKS, a
 In each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database.
 Additionally, GitOps was adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD.
 
+
 ## How was it achieved?
 ### Kubernetes clusters
 They are deployed in the cloud, thanks to the AWS EKS service, which orchestrates a large portion of our client's services.
@@ -67,17 +68,20 @@ Originally, the migration was planned with AWS DMS, but several critical limitat
 - **Source database timeout issues:**
   - Resolved by importing self-managed resources into Terraform and adjusting their parameters directly (no alternative for non-managed databases).
-
-
-
 #### Redis
 Similarly to MySQL, we opted to use Terraform and ElastiCache for deployment and management.
 
-### Self-Hosted GitHub Action Runners
+### Self-Hosted Runners
+A self-hosted runner executes CI/CD workflows on our own infrastructure, decoupling the pipeline from GitHub-hosted runners.
+They allow deep customization of the execution environment while giving us full control over everything happening underneath.
+We use Kubernetes as the infrastructure on which to deploy the runners, due to its adaptability and scalability, together with Actions Runner Controller, which manages several runners concurrently.
+As with any other application deployed in the cluster, Helm charts and ArgoCD are used for continuous delivery.
+We also use AWS Secrets Manager and custom Docker images to further improve their performance and capabilities; a minimal sketch of such a runner deployment is shown after the list below.
 We implemented custom self-hosted runners to execute GitHub Actions workflows. This approach proved:
 - More efficient – Reduced latency and improved performance compared to GitHub's hosted runners.
 - Cost-effective – Lowered operational expenses by avoiding GitHub's per-minute billing.
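+To make the runner setup more tangible, here is an illustrative sketch using the classic Actions Runner Controller CRDs. The organization name, runner image and sizing are placeholders, not the client's actual configuration.
+
+```yaml
+# runners.yaml : illustrative sketch of an ARC runner deployment with autoscaling
+apiVersion: actions.summerwind.dev/v1alpha1
+kind: RunnerDeployment
+metadata:
+  name: org-runners
+spec:
+  template:
+    spec:
+      organization: example-org                      # placeholder GitHub organization
+      labels: [self-hosted, linux, arm64]
+      image: ghcr.io/example/custom-runner:latest    # custom runner image
+      resources:
+        requests:
+          cpu: "1"
+          memory: 2Gi
+---
+apiVersion: actions.summerwind.dev/v1alpha1
+kind: HorizontalRunnerAutoscaler
+metadata:
+  name: org-runners-autoscaler
+spec:
+  scaleTargetRef:
+    name: org-runners
+  minReplicas: 0                                     # scale to zero when idle
+  maxReplicas: 6
+  scaleUpTriggers:
+    - githubEvent:
+        workflowJob: {}                              # scale up on queued workflow jobs
+      duration: "30m"
+```
+
+Because these manifests live in Git like everything else, ArgoCD keeps the runner fleet in sync with the repository and scales it independently of the applications it builds.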
+
+
 ## Other improvements:
 A series of improvements were implemented to enhance the system’s efficiency, security, and reliability:
 - **FinOps:**
   - Conducted infrastructure cost analysis across multiple usage cycles to right-size resources and reduce expenses.
 
 - **Secrets Management:**
   - Migrated hardcoded sensitive data to AWS Secrets Manager, eliminating exposure risks in the client’s codebase.
 
 - **Queue System Upgrade:**
   - Migrated from DigitalOcean to AWS-managed queues (via Terraform).
   - Added Dead-Letter Queues (DLQs) to capture/troubleshoot failed messages, improving debuggability and error control.
 
 - **S3 Bucket Migration:**
   - Migrated and configured per-environment S3 buckets (DigitalOcean → AWS).
   - Implemented restricted-access paths (private) and public-facing endpoints (Internet-accessible) as needed.
 
 - **QA & Testing:**
   - Performed continuous application testing at critical stages to ensure stability.
 
 - **Why This Matters:**
   - Security: Secrets abstraction + restricted S3 paths minimize attack surfaces.
   - Cost Control: FinOps ensures no overprovisioning.
-  - Resilience: DLQs prevent message loss and simplify failure analysis.
-
-
-
-
-
-
-
-
-
-
+  - Resilience: DLQs prevent message loss and simplify failure analysis.
\ No newline at end of file

From 87715d6bf1ebd8b49cd5bcba8f1a0e2f30255b1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?=
Date: Thu, 19 Jun 2025 21:12:52 +0200
Subject: [PATCH 4/7] fix: made start of the post to be more I+D+I related:
 focused on a platform to be later sold to other clients

---
 blog/2025-06-01-latest-work.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md
index cd56a74..32d3500 100644
--- a/blog/2025-06-01-latest-work.md
+++ b/blog/2025-06-01-latest-work.md
@@ -7,14 +7,14 @@ image:
 ---
 
 ## How it started
-At first, our client had its software deployed mostly in Digital Ocean with some resources in ploi.io and ScaleGrid.
-
+At first, we were challenged: our client had its software deployed in the cloud but wanted to improve its software lifecycle.
 
 ## What was the goal
-At the beginning, our goal was clear: we had to improve their platform, thereby offering them a better software development experience while reducing associated costs.
+And so, it was clear what we wanted to achieve: we were to build a new Internal Development Platform (IDP) that would make us leaders in the field of solutions that enhance business efficiency.
+This product would offer future clients a better software development experience while reducing their cloud-associated costs and decreasing the CO2 footprint and energy usage.
 
-To achieve this, their infrastructure environments were standardized with EKS, as it provides automatic horizontal scaling, which helps lower expenses and CO2 consumption by provisioning only the necessary resources.
+To achieve this, infrastructure environments were standardized with EKS, as it provides automatic horizontal scaling, which helps lower expenses and CO2 consumption by provisioning only the necessary resources.
 For each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database.
 Additionally, GitOps was adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD.
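+To ground the GitOps claim, the sketch below shows roughly what one of these ArgoCD ``Application`` resources can look like. The repository URL, chart path and namespaces are placeholders, not the client's actual values.
+
+```yaml
+# web-app-application.yaml : illustrative ArgoCD Application tracking a Helm chart in Git
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: web-app
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://github.com/example/platform-config.git   # placeholder repository
+    targetRevision: main
+    path: charts/web-app                 # Helm chart versioned in Git
+    helm:
+      valueFiles:
+        - values-production.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: web-app
+  syncPolicy:
+    automated:
+      prune: true                        # remove resources deleted from Git
+      selfHeal: true                     # revert manual drift in the cluster
+    syncOptions:
+      - CreateNamespace=true
+```
+
+With a manifest like this per application and environment, merging a pull request is enough for ArgoCD to reconcile the cluster against the repository.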
From dc5fda105f608cbfbff2b914f73bad2ef01e05f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?= Date: Sun, 22 Jun 2025 20:04:06 +0200 Subject: [PATCH 5/7] feat: added section comparing our proposed solution to previous SotySolar's one --- blog/2025-06-01-latest-work.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md index 32d3500..eb051f4 100644 --- a/blog/2025-06-01-latest-work.md +++ b/blog/2025-06-01-latest-work.md @@ -7,15 +7,15 @@ image: --- ## How it started -At first, we were challenged: our client had its software deployed in cloud, but wanted to improve their software lifecycle. +At first, we were challenged: our client had its software deployed in cloud, but wanted to improve their software lifecycle as it was unefficient their previous way to deploy new software. ## What was the goal And so, it was clear what we wanted to archieve: we were to build a new Internal Development Platform (IDP) which make us leaders of the field for solutions that enhance business' efficiency. This product would offer future clients a better software development experience while reducing their cloud-associated costs and decreasing the CO2 footprint and energy usage. -To achieve this, infrastructure environments were standardized with EKS, as it provides automatic horizontal scaling, which helps lower expenses and CO2 consumption by provisioning only the necessary resources. +To achieve this, infrastructure environments are standardized with Amazon EKS, as it provides automatic horizontal scaling. This helps to lower expenses and CO2 consumption by provisioning only the necessary resources at each moment. For each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database. -Additionally, GitOps was adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD. +Additionally, GitOps are adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD. ## How was it archieved? From 2767fcfde02638e099302876aa9a0b5abe5ca9f6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?= Date: Sun, 22 Jun 2025 20:07:24 +0200 Subject: [PATCH 6/7] fix: removed technical details about MySQL and MongoDB. Also removed duplicated FinOps as it is earlier explained with more detail --- blog/2025-06-01-latest-work.md | 76 ++++++++++++++++++++++++---------- 1 file changed, 53 insertions(+), 23 deletions(-) diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md index eb051f4..3d718a1 100644 --- a/blog/2025-06-01-latest-work.md +++ b/blog/2025-06-01-latest-work.md @@ -7,7 +7,8 @@ image: --- ## How it started -At first, we were challenged: our client had its software deployed in cloud, but wanted to improve their software lifecycle as it was unefficient their previous way to deploy new software. +At first, a challenged was proposed by Be Energy Part S.L. (SotySolar): they are one of the greatest companies regarding solar panels instalations and other energetic sources. +They have their software deployed in cloud, but wanted to improve their software lifecycle as it was unefficient their previous way to deploy new software. 
 ## What was the goal
 And so, it was clear what we wanted to achieve: we were to build a new Internal Development Platform (IDP) that would make us leaders in the field of solutions that enhance business efficiency.
 This product would offer future clients a better software development experience while reducing their cloud-associated costs and decreasing the CO2 footprint and energy usage.
 
 To achieve this, infrastructure environments are standardized with Amazon EKS, as it provides automatic horizontal scaling. This helps to lower expenses and CO2 consumption by provisioning only the necessary resources at each moment.
 For each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database.
 Additionally, GitOps is adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD.
 
+## Explaining Our Solution
+### Cloud Provider Migration: From DigitalOcean to Amazon Web Services (AWS)
+As AWS partners, we chose to migrate the current infrastructure from DigitalOcean to AWS due to better pricing and AWS's strong commitment to sustainability.
+Since 2023, AWS has matched its energy usage with renewable sources and is actively pursuing greener solutions, aiming for net-zero carbon emissions by 2040 and to become water-positive by 2030.
+
+
+### Infrastructure Management: From VMs to Kubernetes
+Virtual Machines (VMs) offer simplicity and are easier to operate, modify, and use for rapid prototyping.
+They’re accessible for smaller teams and help deliver quick initial deployments.
+However, they rely heavily on manual management and don't scale efficiently, which was a problem in SotySolar's infrastructure.
+
+We propose to use Kubernetes, a tool that automates container orchestration and provides a robust ecosystem with self-healing, rolling updates, and simplified horizontal scaling.
+It offers greater customization, enabling features like cluster observability, resource monitoring, and alerting.
+While Kubernetes has a steeper learning curve, it supports Infrastructure as Code (IaC), allowing declarative deployments and GitOps workflows that improve reliability and automation.
+
+
+### GitOps and the importance of Infrastructure as Code (IaC)
+GitOps is a methodology for managing infrastructure using IaC tools (like YAML, Terraform, and Helm) with Git as the single source of truth.
+It brings agility, automation, and consistency to cloud environments by allowing all changes to go through version-controlled Git workflows.
+This enables streamlined CI/CD pipelines, where new deployments can be triggered automatically when a pull request is merged.
+It also ensures that all environments remain consistent, reducing configuration drift and simplifying rollbacks.
+
+Without IaC, infrastructure changes are often undocumented, leading to technical debt and inconsistent environments.
+Manual setups make disaster recovery slower and more error-prone, as there’s no clear record of the intended system state.
+Additionally, it becomes harder to maintain visibility and control over infrastructure, as well as to scale it, especially as systems grow in complexity.
+
+
+### Upgrading older software
+Software components, especially databases, are often left on outdated versions due to concerns about downtime, dependency breakage, or lack of automated deployment processes.
+In many environments, updates are delayed simply because the system is stable and changing it feels risky or time-consuming, which builds up technical debt and entrenches the problem.
+
+However, keeping old versions introduces several issues:
+They often contain known security vulnerabilities, may lack critical performance improvements, and can become incompatible with newer services or libraries.
Over time, vendors may drop support entirely, making future updates more complex and leaving systems exposed or providing little to no help in future situations.
+
+This can be avoided by integrating version management into regular development workflows.
+Using tools like Terraform, Helm, or Docker, updates can be defined and tested as part of infrastructure code.
+Combined with CI/CD pipelines and staging environments, this allows us to roll out updates safely in lower environments, detect issues early, and reduce the risk of problems in production.
+
+
+### FinOps and cost management
+FinOps (short for Financial Operations) is a practice that helps organizations manage and optimize their cloud spending by combining finance, engineering, and operations teams.
+It focuses on creating visibility, accountability, and control over cloud costs. This enables teams to make informed decisions about their current infrastructure.
+
+We believe adopting FinOps early is important because cloud costs can grow rapidly and unpredictably, especially as systems scale.
+Also, without any proper cost tracking, it becomes hard to properly provision resources.
+FinOps leads to better financial planning and more sustainable cloud practices from the start of the project.
 
 ## How was it achieved?
 ### Kubernetes clusters
@@ -46,27 +95,12 @@ This decision balanced compatibility with stability while still maintaining effi
 The original database was running on version 5.x, so in addition to migration, we decided to upgrade to a supported version that could be kept updated against vulnerabilities.
 Initially, we considered moving to the latest version (8.x), but this proved impossible because the source database uses timeseries collections, which do not maintain strict schema consistency for their data.
-We attempted several workarounds to resolve this issue:
-- Using MongoTools v100.12.0+ – Based on a [MongoDB employee’s post](https://www.mongodb.com/community/forums/t/database-tools-100-4-0-released/115727), this version supposedly supports timeseries migration.
-- Pre-creating collections with the ``timeseriesBucketsMayHaveMixedSchemaData`` flag – We tried enabling this flag before importing data to bypass schema conflicts.
-- Modifying collection metadata – We attempted to manually adjust creation metadata to include the flag.
-
-Unfortunately, none of these approaches fully resolved the compatibility issues, forcing us to settle on an intermediate supported version (7.x) and schedule an upgrade in the future.
+We attempted several workarounds to resolve this issue. Unfortunately, none of these approaches fully resolved the compatibility issues, forcing us to settle on an intermediate supported version (7.x) and schedule an upgrade in the future.
 
 #### MySQL
 We use AWS Aurora and RDS because they enable rapid infrastructure-as-code deployment via Terraform.
-Originally, the migration was planned with AWS DMS, but several critical limitations made it impractical:
-
-- **DMS cannot migrate secondary objects (FKs and cascade constraints):**
-  - DMS replicates changes via database logs, but database engines do not log secondary object dependencies.
-  - Until 2020, an undocumented flag (HandleCascadeConstraints) could bypass this, but it’s now deprecated.
-  - The only workaround is manually modifying the DDL in 400+ instances of FKs and cascade constraints.
-
-- **Requirement to disable FKs during migration:**
-  - Forces application downtime.
-- **Source database timeout issues:**
-  - Resolved by importing self-managed resources into Terraform and adjusting their parameters directly (no alternative for non-managed databases).
+Originally, the migration was planned with AWS DMS, but several critical limitations made it impractical.
+Finally, we decided to carry out the migration ourselves, by hand.
 
 #### Redis
 Similarly to MySQL, we opted to use Terraform and ElastiCache for deployment and management.
 
 ## Other improvements:
-- **FinOps:**
-  - Conducted infrastructure cost analysis across multiple usage cycles to right-size resources and reduce expenses.
-
 - **Secrets Management:**
   - Migrated hardcoded sensitive data to AWS Secrets Manager, eliminating exposure risks in the client’s codebase.
 
 - **Why This Matters:**
   - Security: Secrets abstraction + restricted S3 paths minimize attack surfaces.
-  - Cost Control: FinOps ensures no overprovisioning.
   - Resilience: DLQs prevent message loss and simplify failure analysis.
\ No newline at end of file

From d61cf13145cdac553c1cceccb6f12a6a70449243 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Su=C3=A1rez=20Losada?=
Date: Thu, 26 Jun 2025 11:26:23 +0200
Subject: [PATCH 7/7] fix: changes done to blog

---
 blog/2025-06-01-latest-work.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/blog/2025-06-01-latest-work.md b/blog/2025-06-01-latest-work.md
index 3d718a1..5f83779 100644
--- a/blog/2025-06-01-latest-work.md
+++ b/blog/2025-06-01-latest-work.md
@@ -6,15 +6,18 @@ tags: [Kubernetes, Infrastructure, MySQL, MongoDB, AWS, CICD, GitOps]
 image:
 ---
 
-## How it started
-At first, a challenge was proposed by Be Energy Part S.L. (SotySolar): they are one of the leading companies in solar panel installations and other energy sources.
-They had their software deployed in the cloud but wanted to improve their software lifecycle, as their existing way of deploying new software was inefficient.
+## Case: SotySolar
+We partnered with Be Energy Part S.L. (operating under the brand Sotysolar), a leading force in solar panel installation and green energy solutions across Spain and Portugal.
+With over 12,000 panels installed and a growing presence in the renewable energy market, Sotysolar continues to push the boundaries of sustainable innovation.
+
+Our collaboration began when Sotysolar approached us to streamline and modernize their software development lifecycle.
+Their goal was to adopt a more agile and efficient approach to cloud-based software deployment, enabling them to scale faster, reduce downtime, and improve delivery speed.
 
 ## What was the goal
-And so, it was clear what we wanted to achieve: we were to build a new Internal Development Platform (IDP) that would make us leaders in the field of solutions that enhance business efficiency.
+Our objective was to build a new Internal Development Platform that would allow them to adopt leading practices to enhance business efficiency.
This product would offer SotySolar a better software development experience and allow them to scale their services effortlessly while reducing their cloud-associated costs and decreasing the CO2 footprint and energy usage.
 
-To achieve this, infrastructure environments are standardized with Amazon EKS, as it provides automatic horizontal scaling. This helps to lower expenses and CO2 consumption by provisioning only the necessary resources at each moment.
+The solution uses Amazon EKS for seamless autoscaling, helping to reduce expenses and CO2 consumption by provisioning only the necessary resources at any given moment.
 For each environment, Helm and ArgoCD are used to optimize the deployment of various applications, such as the web application and a MongoDB database.
 Additionally, GitOps is adopted to automate software development, ensuring fast and reliable deployments of applications and services in CI/CD.
 
 ## Explaining Our Solution
 ### Cloud Provider Migration: From DigitalOcean to Amazon Web Services (AWS)
 As AWS partners, we chose to migrate the current infrastructure from DigitalOcean to AWS due to better pricing and AWS's strong commitment to sustainability.
 Since 2023, AWS has matched its energy usage with renewable sources and is actively pursuing greener solutions, aiming for net-zero carbon emissions by 2040 and to become water-positive by 2030.
+We strongly believe that these objectives align with Sotysolar's goals as a renewable energy company.
 
 ### Infrastructure Management: From VMs to Kubernetes