MS SRE Workshop Notes


Decide what level counts as healthy for each flow, e.g. a storage account call at 5 ms is healthy. Analyse the failure, then implement mitigations. Experiment: put pods into a crash loop for 10 minutes and see what happens. Expected result: response time will increase.








Getting to the public website is the critical flow. Start by understanding what the first step of the flow is, e.g. the website.


Large customers run web and backend traffic in different clusters; small footprints can start from one cluster. Also covered: APIM for AI, better observability, AKS egress node pool, NSG.

Login for different apps, CPUs.


Non-functional requirements: how do I know what is good and what is bad, e.g. for response time? Architects capture the non-functional requirements together with product owners and the infra and platform teams.


Design the health level at the flow level.
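
A minimal sketch of flow-level health, not something shown in the workshop. It assumes each flow has a "healthy" and a "degraded" latency threshold and is judged on its 95th percentile response time. The 5 ms storage account figure comes from the notes; the 20 ms degraded threshold, the p95 choice, and all names (FlowTarget, classify) are my own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTarget:
    """Latency targets for one flow (hypothetical structure)."""
    name: str
    healthy_ms: float   # p95 at or below this: healthy
    degraded_ms: float  # p95 at or below this: degraded, above: unhealthy


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), avoiding float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def classify(target, samples_ms):
    """Return 'healthy', 'degraded' or 'unhealthy' for one flow."""
    latency = p95(samples_ms)
    if latency <= target.healthy_ms:
        return "healthy"
    if latency <= target.degraded_ms:
        return "degraded"
    return "unhealthy"


# 5 ms is from the notes; 20 ms is an assumed degraded threshold.
storage = FlowTarget(name="storage account", healthy_ms=5.0, degraded_ms=20.0)

normal = [3.0] * 19 + [50.0]   # one slow outlier, p95 still 3 ms
during_chaos = [30.0] * 20     # e.g. while pods are crash looping

print(storage.name, classify(storage, normal))
print(storage.name, classify(storage, during_chaos))
```

Using a percentile instead of the mean keeps a single slow request from flipping the flow to unhealthy, while a sustained increase in response time still shows up.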












John runs some chaos experiments.

Reference: Chaos Mesh Overview (Chaos Mesh docs)

Install the Chaos Studio prerequisites, then create a Chaos Studio target and experiments.
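
A rough sketch of what the 10 minute pod experiment could look like as a Chaos Mesh PodChaos resource, built in Python and printed as JSON. This is my own reconstruction, not the experiment from the workshop. It uses the Chaos Mesh pod-failure action as a stand-in for a crash loop, and the experiment name, the "web" namespace and the app=frontend label are hypothetical placeholders.

```python
import json


def pod_failure_spec(namespace, labels, duration="10m"):
    """Build the spec part of a Chaos Mesh PodChaos resource.

    pod-failure makes the selected pods unavailable for `duration`;
    it is used here as an approximation of a crash loop.
    """
    return {
        "action": "pod-failure",
        "mode": "all",  # affect every pod matched by the selector
        "duration": duration,
        "selector": {
            "namespaces": [namespace],
            "labelSelectors": dict(labels),
        },
    }


def pod_chaos_manifest(name, namespace, labels, duration="10m"):
    """Wrap the spec in a full PodChaos manifest."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": pod_failure_spec(namespace, labels, duration),
    }


if __name__ == "__main__":
    # Hypothetical names: adjust to the real namespace and labels.
    manifest = pod_chaos_manifest(
        name="frontend-pod-failure",
        namespace="web",
        labels={"app": "frontend"},
    )
    print(json.dumps(manifest, indent=2))
```

The spec is kept in its own function because, as far as I understand, Chaos Studio's AKS Chaos Mesh faults take the spec portion as a JSON parameter, whereas applying directly to a cluster needs the full manifest. Check the Chaos Studio fault documentation before relying on that.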

