Startup founders have an important role to play in building a robust SRE team. For that team to deliver on its promise of a high-performing system, founders need a close understanding of the work their engineers are doing.
This understanding lets founders make sure that engineers are working on the right things, and doing so in a way that will be sustainable over time.
SRE practitioners are skilled at building and maintaining high-performing systems, but this work does not fit neatly into the everyday tasks of software development.
The process of system design and implementation requires expertise from many different domains: operations, performance engineering, capacity planning, security, networking, database administration and more.
It also requires a level of commitment from both developers and operations staff that goes beyond what is typical for software development teams. As such, it can be difficult for startup founders to understand what exactly their SRE teams are doing day-to-day.
Startup founders must understand what their SRE teams are doing if they want to build high-performing systems that can scale as the company grows over time.
What does SRE stand for?
SRE stands for Site Reliability Engineering.
The discipline originated at Google around 2003, when Ben Treynor Sloss was asked to run a production operations team and staffed it with software engineers. He has described SRE as what happens when you ask a software engineer to design an operations function.
SRE is sometimes mentioned alongside the idea of the “10x engineer”, the claim that some engineers are ten times more productive than others. That idea is a separate piece of industry folklore and is not where SRE came from.
SRE grew instead out of a practical question facing Google’s engineering leadership: how to run large production systems reliably without growing the operations team in step with the size of the services it supports.
SREs are highly skilled software engineers who have extensive experience in building high-performing systems, but they also have deep knowledge about how to build and maintain highly available services and applications on top of robust infrastructure platforms.
This combination of skills and knowledge allows SREs to design and implement highly available systems that can scale as the company grows.
Who are SREs?
SREs are software engineers who have extensive experience in building high-performing systems. The work of an SRE differs from the work of a typical software engineer in several ways.
First, the role of an SRE is centered around building and maintaining high-performing systems.
Second, this work is often performed on top of complex infrastructure platforms. At Google, examples include cluster management systems such as Borg and Omega.
Third, the focus of an SRE is on making sure that a system will continue to operate reliably over time as it scales up to handle more traffic or more users.
Fourth, because they work with complex infrastructure platforms, they must be able to use sophisticated monitoring tools to detect problems and take action before those problems impact users or other services.
Finally, because they work with systems that must be highly available at all times, they must understand how failover mechanisms operate in order to make sure that systems remain online when failures occur.
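For founders who want a feel for what the last two points look like in practice, here is a minimal sketch in Python. It is an illustration only, not how any particular monitoring or failover product works. The names `Backend`, `should_alert` and `pick_backend`, and the 1% error threshold, are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    """A hypothetical service instance with a health probe."""
    name: str
    is_healthy: Callable[[], bool]


def error_rate(failed: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


def should_alert(failed: int, total: int, threshold: float = 0.01) -> bool:
    """Alert when the error rate exceeds the threshold (1% is an assumed value)."""
    return error_rate(failed, total) > threshold


def pick_backend(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the first healthy backend in priority order, or None if all are down."""
    for backend in backends:
        if backend.is_healthy():
            return backend
    return None


if __name__ == "__main__":
    primary = Backend("primary", lambda: False)  # simulated outage
    standby = Backend("standby", lambda: True)

    # 30 failures out of 1000 requests is a 3% error rate, above the threshold.
    print(should_alert(failed=30, total=1000))

    # Traffic moves to the standby because the primary fails its health probe.
    chosen = pick_backend([primary, standby])
    print(chosen.name if chosen else "no healthy backend")
```

Real systems add much more (alert routing, retries, gradual traffic shifting), but the core idea is the same: measure, compare against a limit you have agreed on, and have a healthy fallback ready.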
SREs also have domain expertise in areas such as operations, performance engineering, capacity planning and security. These areas of expertise are not necessarily taught in the same way that computer science and software engineering are taught.
Instead, SREs learn these skills by working alongside operations staff who have expertise in these areas. This kind of mentorship is very important for building high-performing systems, because it is through this process that SREs learn how to build and maintain systems that can scale over time.
Do startups need SREs?
Startups do not need SREs in order to be successful. The startup founders who build these companies can hire developers who are skilled at building and maintaining high-performing systems.
However, there are several reasons why it is beneficial for startups to hire SREs to build and maintain their systems.
First, SREs are highly skilled software engineers who have extensive experience in building high-performing systems.
Second, they have deep knowledge about how to build and maintain highly available services and applications on top of robust infrastructure platforms.
Third, they are experts at designing systems that can scale as the company grows over time.
Finally, because they work with infrastructure platforms (for example Kubernetes, the open-source system that drew on Google’s experience with Borg), they understand how these platforms operate and can teach others how to use them effectively.
About the Author
I am the Founder of Cudy Technologies (www.cudy.co), a full-stack EdTech startup helping teachers and students teach and learn better. I am also a mentor and angel investor in startups in other areas that interest me (Proptech, Fintech, HRtech, ride-hailing, C2C marketplaces and SaaS). You can also find me on Cudy for early-stage startup founder mentorship and advice.
You can connect with me on LinkedIn (https://www.linkedin.com/in/alexanderlhk). Let me know in your invitation message that you are a reader of my Medium posts.