CloudOps and CI/CD

Reduced downtime by 10x in one quarter by increasing uptime to 99.95%

The client's development teams were spending a significant amount of time managing the operational and CI/CD aspects of their cloud-hosted applications. The client was looking for a partner to take over Cloud Ops so that its development teams could focus on the product roadmap.

Key Challenges:

Recurring issues in deployments.
Service outages.
Lack of automated tests for voice, video, and screen sharing across geographies.
Slow time to market and limited ability to roll out new features.

Goals:

Automate configuration management.
Implement log monitoring and anomaly detection to avoid outages.
Automate monitoring and alerting by simulating voice calls, video calls, and screen sharing to proactively identify and fix issues across different geographies.
Improve service availability.
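
As a rough illustration of the log-monitoring goal, the sketch below flags a spike in per-minute error counts using a trailing-window z-score. The function name, window size, threshold, and sample data are all assumptions made for illustration; the case study does not describe the detection method that was actually used.

```python
from statistics import mean, pstdev


def find_anomalies(error_counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations.

    Hypothetical helper: the window and threshold values are illustrative.
    """
    anomalies = []
    for i in range(window, len(error_counts)):
        history = error_counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1.0 so a nearly flat history does not inflate the score.
        sigma = max(pstdev(history), 1.0)
        if (error_counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up per-minute error counts parsed from application logs.
counts = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
spikes = find_anomalies(counts)
print(spikes)
```

In a setup like the one described here, each flagged index could feed the alerting step (for example, an email or a ticket), which keeps detection and reporting separate.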

Solution:

Built RPA bots for end-to-end application testing to verify application availability and measure the quality of service delivered in different geographies.
Automated log monitoring for anomaly detection and issue reporting.
Implemented email alerts and automated Jira ticket creation, with logs and screenshots attached, for each detected anomaly.
Developed a keyword-driven framework for RPA workflow automation.
Set up a 24/7 support team with Cloud and DevOps skills.
Built high availability (HA) and disaster recovery (DR) into the infrastructure.
Automated configuration management with Ansible.
Automated CI/CD for static content and Kubernetes clusters.
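
To give a sense of what a keyword-driven framework looks like, here is a minimal sketch in which a workflow is plain data (rows of keyword plus arguments) and a small runner maps each keyword to an action. The keywords, the room name, and the in-memory state are hypothetical stand-ins; a real bot would drive a browser or calling client at these points, and the framework built for this engagement is not shown in the case study.

```python
# Hypothetical actions: each one only records what it would do.
def join_call(state, room):
    state["room"] = room
    state["log"].append(f"joined {room}")


def start_video(state):
    state["video"] = True
    state["log"].append("video started")


def share_screen(state):
    state["screen"] = True
    state["log"].append("screen shared")


# Keyword table: test authors write keywords, not code.
KEYWORDS = {
    "join_call": join_call,
    "start_video": start_video,
    "share_screen": share_screen,
}


def run_workflow(steps):
    """Execute (keyword, *args) rows in order and return the final state."""
    state = {"log": []}
    for keyword, *args in steps:
        action = KEYWORDS.get(keyword)
        if action is None:
            raise ValueError(f"unknown keyword: {keyword}")
        action(state, *args)
    return state


# An illustrative workflow for one geography.
workflow = [
    ("join_call", "room-eu-1"),
    ("start_video",),
    ("share_screen",),
]
result = run_workflow(workflow)
print(result["log"])
```

The appeal of this design is that new test scenarios are added by editing the workflow rows, while the actions behind each keyword are maintained in one place.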

Results:

99.95%

Coordinating with our client’s team, we improved service availability and reliability from 99.5% (provided by Google Cloud) to 99.95% through regional cluster deployments.
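
The link between these availability figures and the 10x downtime reduction in the headline can be checked with simple arithmetic. The sketch below assumes a 30-day month as the measurement window, which the case study does not specify; the function name is illustrative.

```python
# Assumption: a 30-day month (43,200 minutes) as the measurement window.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def downtime_minutes(availability_percent, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Return the downtime implied by an availability percentage over a period."""
    return period_minutes * (100.0 - availability_percent) / 100.0


before = downtime_minutes(99.5)   # roughly 216 minutes per 30-day month
after = downtime_minutes(99.95)   # roughly 21.6 minutes per 30-day month
ratio = before / after            # roughly 10

print(f"{before:.1f} min -> {after:.1f} min ({ratio:.1f}x less downtime)")
```

Under that assumption, moving from 99.5% to 99.95% shrinks the monthly downtime budget from about 3.6 hours to under 22 minutes.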

5 mins

First response time for support tickets was reduced to 5 minutes.

10 mins

Average resolution time was cut to 10 minutes.

During the migration, we also reengineered processes across the cloud and made a few architecture changes, leading to:

63%

63% reduction in outages over 2 years.

55%

55% reduction in ticket volume over 2 years.

Eliminated roughly 2 system-wide outages per month, with zero impact on users.

