IRP – Infrastructure Resource Planning: a series in the making…

While not earth-shattering, I suspect that many people in the business of managing internet infrastructure have long sought to manage it holistically and have searched for a set of common best practices.  There are ITIL & ITSM, which are really GREAT models to start from, yet they seem a bit incomplete when it comes to managing internet infrastructure specifically – built on the three pillars of Datacenters, Servers (including storage), and Networks – as a “business”.  Then there are traditional ERP best practices, yet ERP is not exactly a perfect fit either, as it is too broad – so why pound the square peg into the round hole?  Why not take the best of all three and call it Infrastructure Resource Planning (IRP)?

My own evolution has brought me to that very function – managing the business aspects of enterprise internet infrastructure – and as such, I have begun to plan out the entire lifecycle of that infrastructure, from procurement through end of life/disposition and everything in between…including the very notion of shared-services chargeback models, which is why IRP is so analogous to ERP but not exactly the same thing.

ERP, per Wikipedia, is defined as follows:

“Enterprise resource planning (ERP) systems integrate internal and external management information across an entire organization—embracing finance/accounting, manufacturing, sales and service, customer relationship management, etc. ERP systems automate this activity with an integrated software application. The purpose of ERP is to facilitate the flow of information between all business functions inside the boundaries of the organization and manage the connections to outside stakeholders.”  Overall, this sounds a lot like evolving infrastructure management best practices inside enterprises, yet the twist is that while ERP is generalized to suit all businesses, IRP would emphasize the underlying internet infrastructure that a business runs upon today and manage it accordingly.

ITIL is a GREAT framework to follow for infrastructure management, predicated upon 30 years of evolution and primarily focused on IT services, which are traditionally centered around the “desktop”.  “ITIL advocates that IT services must be aligned to the needs of the business and underpin the core business processes. It provides guidance to organizations on how to use IT as a tool to facilitate business change, transformation and growth.

The ITIL best practices are currently detailed within five core publications which provide a systematic and professional approach to the management of IT services, enabling organizations to deliver appropriate services and continually ensure they are meeting business goals and delivering benefits.”

However, ITIL doesn’t quite cover the infrastructure optimization of core datacenters, servers, and switches/routers specifically enough.  My own experience has shown that a key component of IRP differentiating it from ITIL is that engineering research and development are requirements to holistically optimize the three pillars of internet infrastructure: Datacenters, Servers (including storage), and Networks.

Then, finally, there is ITSM.  ITSM is itself a process-based framework much like ITIL; however, ITSM is not attributed to any one person or organization (ITIL is trademarked by the UK Cabinet Office).  “ITSM is generally concerned with the ‘back office’ or operational concerns of information technology management (sometimes known as operations architecture), and not with technology development.”  This notion of a management framework, again, is a GREAT start; however, the fact that it is not concerned with technology development is where it falls short and where IRP picks up.

The idea of infrastructure R&D is the KEY differentiator of IRP from ITIL, ITSM, and ERP.  IRP is very much a large-enterprise concept that does not disassociate itself from advancing the underlying technology; it embraces it as a way to drive ever more contribution to businesses’ bottom lines.

IRP is relevant today because businesses have scaled and evolved to the point where applications can nearly be separated from direct relationships with internet infrastructure.  From the moment virtualization commenced, we have been trying to allow apps to live anywhere, anytime, on a cohesively managed infrastructure.  When that infrastructure can be managed as a system unto itself is when IRP becomes relevant.

IRP is not a radical movement, nor is it an earth-shattering concept. It is a concept, however, that is begging to be recognized as we further advance the separation of apps from the underlying infrastructure. Then we can plan, manage, and tune that infrastructure as a system, which is why technology advancement is a key component of IRP and where ITIL & ITSM fall just a bit short.  Thus, IRP as a concept is timely and needed to truly get hold of your internet infrastructure – from supply chain to management and monitoring to refresh cycles to research and development – all through the lens of optimizing the underlying business value proposition.

PS – stay tuned, as there is an emerging metric that will tie IRP all together – it will be made public in the next month.


From ERP to Infrastructure Resource Planning.

Ever since the Internet took off, those of us managing its infrastructure have been laser-focused on optimizing each and every part of the “stack” leading up to, and stopping short of, the application (another post on that one, as it is about to change!!!).  In so doing, the largest players have tuned their datacenter energy consumption through quantification via Power Usage Effectiveness (PUE), have pushed the HW vendors to tune the servers we deploy in datacenters via efforts like the Open Compute Project (just attended a recent event), and the industry has contributed more robustly to the ever-evolving sets of industry management standards that are common across MOST infrastructure teams (see ITIL v3.0).

What is not all that common, however, is an over-arching view (a process map) that ties this universe together end to end – DCIM adoption has been slow due in part to this lack of cohesive visibility.  To advance our ability to tune the “infrastructure engine” even further, we need to begin looking at infrastructure holistically, the way manufacturers have long done via Enterprise Resource Planning – I posit that we need to adopt Infrastructure Resource Planning (IRP) as a way to advance the concepts of ITIL + ITSM + DCIM + (soon-to-be-published new metric!!).

The asymptotic race to a Power Usage Effectiveness (PUE) of 1.0

I think Mike Manos of DRT put it best – it is like an arms race to see who can achieve the lowest PUE. PUE is a metric used to evaluate a datacenter’s power efficiency by taking the facility’s total power consumption (X) and dividing it by the electricity of the “IT load” (Y), i.e., all the computing equipment. Some datacenters look at the power utility’s meter for X and then either aggregate all UPS measurements (since they normally are not “networked”) or all the cabinet/rack PDUs (both very manual processes). For you electrical engineers: this does not take into consideration any step-downs from transmission voltages or the multiple conversions to get to 240/120 inside the cab (nor the loss associated with server power supplies, etc.).

The metric is typically expressed as a number between 1.0 (the theoretical floor) and, the highest I’ve seen to date, 3.5. The EPA did a study and found, out of a sample of 100 datacenters, that the national annualized average was 2.4 (I think this is very, very low) – this means that for every Watt going to a server, 1.4 Watts are consumed by the facility (e.g., cooling, lights, monitoring, security, etc. – anything not inside a computer rack). This metric is a far cry from the “Tier” system (from the Uptime Institute, recently bought by The 451 Group) as it starts to really hone in on what is important – isolating the “useful work” being done inside an “information plant” and trying to minimize all else…but it still has a way to go.
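To make the arithmetic concrete, here is a minimal sketch of the PUE calculation described above. The meter readings are hypothetical, made up purely for illustration; the function name is mine, not from any standard library.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power (X) divided by IT equipment power (Y)."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# Hypothetical example: the utility meter reads 1200 kW, and summing
# the rack PDUs gives an IT load of 500 kW.
ratio = pue(1200.0, 500.0)          # 2.4 -- matches the EPA study's average
overhead_per_it_watt = ratio - 1.0  # 1.4 W of facility overhead per IT watt
print(f"PUE = {ratio:.1f}, overhead = {overhead_per_it_watt:.1f} W per IT watt")
```

Note that this simple ratio is exactly why a single PUE number hides so much: everything depends on where X and Y are measured.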

As evidenced today, we see people tossing PUE numbers around as if they are facts rather than approximations (any PUE metric is a point in time – PUE is affected by too many things to remain fixed). The current thinking (folks like to look smart and toss around PUE as an indication of how much they know) is that the lower the PUE, the better the datacenter. While it is true that a low PUE is a good thing (for the reasons above, plus harmony with the environment and simply being more efficient), we are reaching a point where you cannot go lower – unless you are generating power on site (co-gen) and then giving power back to the grid (think hybrid cars), you will always have SOME power outside of your CPU/Mem/Storage that is required. Thus, what is that magic PUE floor? 1.05? 1.1? 1.2?

This begins to sound a lot like the opposite race, to achieve more and more reliability – 99.9% up; 99.99% up; 99.999% up; 99.9999% up. What we found was that somewhere around four or five nines was sufficient for our uptime because, as Murphy proves to us, things will go wrong in this universe of entropy.
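For a quick sense of what those nines actually buy, here is a small sketch (assuming a 365-day year) converting each availability level into the downtime it permits:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60

# Three nines allows ~525.6 min/yr, four nines ~52.6, five nines ~5.3 --
# each extra nine cuts allowed downtime tenfold, at rapidly rising cost.
for label, a in [("three", 0.999), ("four", 0.9999), ("five", 0.99999)]:
    print(f"{label} nines -> {downtime_minutes_per_year(a):.1f} min/yr")
```

The diminishing returns are plain: past four or five nines, each additional nine buys only minutes while the engineering cost keeps climbing – the same asymptote we are now chasing with PUE.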

So while it is laudable to continually push the lower limits of PUE, we must not get caught up in it as the end-all, be-all metric (for it is not) – it is a great evolution from the Tier system (which you still see as part of RFPs today, btw), but it has not yet been standardized enough for us to make apples-to-apples comparisons (one PUE is not like another – it all depends on the assumptions at the time of measurement). So, as Mike Manos states, the datacenters’ “arms race” is to push us toward “mutually assured efficiencies”, which is far better for us all than the military version…:-)