A little something for everyone this week, with an emphasis on OpenTelemetry and Prometheus, and a look back on the CrowdStrike outage. Enjoy! 🌞🛶⏰



This issue is sponsored by:


Run GitHub Actions up to 2x faster at half the cost

Blacksmith runs your GitHub Actions substantially faster by running them on modern gaming CPUs. Integrating Blacksmith is a one-line code change. 100+ companies like GitBook, Superblocks, and Slope use Blacksmith to help developers merge code faster.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I'm also there, but I'm sure everyone else is way cooler.

From The Community

OpenTelemetry: Python SDK Design & Fundamental Concepts

An excellent post for anyone who already has a passing knowledge of OTel but needs more detail specific to Python apps and libraries.

Engineering Resilience: Lessons from the CrowdStrike-Microsoft Incident

A sort of aggregated postmortem on the CrowdStrike incident from an outsider's perspective. As someone privileged enough not to be impacted directly by this, I found it a fascinating look back on what I missed (whew).

Java 21 Virtual Threads - Dude, Where's My Lock?

Less of an observability or monitoring post, but still a fascinating read on systems engineering and debugging through the eyes of a Netflix engineer.

Top 20 Linux Bandwidth Monitoring Tools in 2024

Always fun to revisit network troubleshooting and monitoring tools, mostly because it reminds me that I used to know something about networking in the Before Cloud™ days.

Manage your monitors more efficiently with Datadog Teams

I just left a company that used Datadog, and I had no idea these capabilities existed for Teams. Of course, I had pretty limited permissions to start with, so it's no great surprise I didn't know about it. 😈

Mastering Prometheus for Robust System Monitoring

A solid post for anyone moving beyond the "ok, I installed Prometheus… what next?" phase and into the critical planning needed for a successful integration with existing infrastructure. Always good to revisit the discovery options available within Prometheus.


Mezmo's telemetry pipeline streamlines data collection, profiling, transformation, routing, and analysis. Our free Telemetry Data Profiling Offer helps you understand and optimize your data to meet your observability goals. Sign up for a free trial to experience the platform first-hand. (SPONSORED)



11 Takeaways from Observability Engineering Book

Some solid notes and highlights from "the" observability book. Unsurprisingly, there's a bit of overlap with my own recent reading of the Learning OpenTelemetry book.

Mastering Zabbix Regular Expressions: A Comprehensive Guide

I don't run into many Zabbix shops on this side of the Atlantic Ocean, but if you're one of them, you might enjoy this dive into Zabbix regex.

Don't get blinded by your Observability tools

A reminder to be mindful of what you're collecting; most shops can no longer afford to "monitor all the things".

Tools

DrDroidLab/playbooks

"Runbook automation platform with deep observability integrations for SRE & On-Call Teams"

Job Opportunities

Sr. Site Reliability Engineer at Vimeo (US Remote)

See you next week!

– Jason (@obfuscurity), Monitoring Weekly Editor