Issue 275
A little something for everyone this week, with an emphasis on OpenTelemetry and Prometheus, and a look back on the CrowdStrike outage. Enjoy!
This issue is sponsored by:
Run GitHub Actions up to 2x faster at half the cost
Blacksmith runs your GitHub Actions substantially faster by running them on modern gaming CPUs. Integrating Blacksmith is a one-line code change. 100+ companies like GitBook, Superblocks, and Slope use Blacksmith to help developers merge code faster.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I'm also there, but I'm sure everyone else is way cooler.
From The Community
OpenTelemetry: Python SDK Design & Fundamental Concepts
An excellent post for anyone who already has a passing knowledge of OTel but needs more detail specific to Python apps and libraries.
Engineering Resilience: Lessons from the CrowdStrike-Microsoft Incident
A sort of aggregated postmortem on the CrowdStrike incident from an outsider's perspective. As someone who was privileged to not be impacted directly by this, it's a fascinating look back on what I missed (whew).
Java 21 Virtual Threads - Dude, Where's My Lock?
Less of an observability or monitoring post, but still a fascinating read on systems engineering and debugging through the eyes of a Netflix engineer.
Top 20 Linux Bandwidth Monitoring Tools in 2024
Always fun to revisit network troubleshooting and monitoring tools, mostly because it reminds me that I used to know something about networking in the Before Cloud™ days.
Manage your monitors more efficiently with Datadog Teams
I just left a company that used Datadog and I had no idea these capabilities existed for Teams. Of course, I had pretty limited permissions to start with, so it's no great surprise I didn't know about it.
Mastering Prometheus for Robust System Monitoring
A solid post for anyone moving beyond the "ok, I installed Prometheus... what next" phase and into the critical planning needed for a successful integration within existing infrastructure. Always good to revisit the discovery options available within Prometheus.
Mezmo's telemetry pipeline streamlines data collection, profiling, transformation, routing, and analysis. Our free Telemetry Data Profiling Offer helps you understand and optimize your data to meet your observability goals. Sign up for a free trial to experience the platform first-hand. (SPONSORED)
11 Takeaways from Observability Engineering Book
Some solid notes and highlights from "the" observability book. Unsurprisingly, there's a bit of overlap from my own recent reading of the Learning OpenTelemetry book.
Mastering Zabbix Regular Expressions: A Comprehensive Guide
I don't run into many Zabbix shops this side of the Atlantic Ocean, but if you're one of them you might enjoy this dive into Zabbix regex.
Don't get blinded by your Observability tools
A reminder to be mindful of what you're collecting; most shops can no longer afford to "monitor all the things".
Tools
"Runbook automation platform with deep observability integrations for SRE & On-Call Teams"
Job Opportunities
Sr. Site Reliability Engineer at Vimeo (US Remote)
See you next week!
-- Jason (@obfuscurity) Monitoring Weekly Editor