Erik Hupjé | The Road to Reliability Framework Explained
A podcast with Erik Hupje, founder of Reliability Academy and a widely respected voice in the maintenance and reliability field
Intro & Summary
Welcome to our latest podcast episode with Erik Hupje, founder of Reliability Academy and a widely respected voice in the maintenance and reliability field. With over 50K followers on LinkedIn, Erik is known for consistently sharing high-value content with the industry.
The episode is divided into four key parts:
Introduction & Erik’s background
The Road to Reliability Framework™
Principles of an Effective Maintenance Program
Advice for New Reliability & Maintenance Managers
We’re confident you'll find value in Erik’s insights. Please share your feedback with us on LinkedIn, including suggestions for future guests you’d like to hear from—perhaps even yourself!
Takeaways & Favourite Parts
Erik Hupje emphasizes the importance of a structured approach to maintenance.
The Road to Reliability framework focuses on three key areas: planning, PM programs, and defect elimination.
Cultural aspects play a crucial role in sustaining maintenance improvements.
Data capture and analysis are essential for identifying and addressing maintenance issues.
Many PM programs are ineffective and need to be reviewed regularly.
Training and involving frontline workers can lead to better maintenance practices.
Understanding the characteristics of failure modes is vital for effective maintenance.
Success stories demonstrate the potential for significant improvements in productivity.
New reliability managers should focus on learning the fundamentals and building relationships.
A step-by-step approach to improvement is more effective than trying to achieve world-class status immediately.
Chapters
00:00 Introduction and Background
07:38 Understanding Maintenance Challenges
13:27 Planning and Scheduling in Maintenance
19:10 Cultural Aspects of Maintenance
24:33 Continuous Improvement in Maintenance Practices
29:53 Capturing and Analyzing Downtime Data
35:07 Principles of Effective Maintenance Programs
46:38 Advice for New Maintenance Managers
Post Show Notes
In this episode, Erik frequently points out that many failure modes are not age-related, meaning traditional time-based maintenance isn’t ideal in many cases where condition-based approaches can perform better.
If you’re interested in exploring Condition Monitoring and seeing how sites have made simple, affordable transitions to Predictive Maintenance, we’d be glad to help.
At Factory AI, we’ve helped sites across Australia upgrade their maintenance strategies with our predictive maintenance solution. We work closely with customers to develop software that’s accurate, user-friendly, and cost-effective.
To learn more, feel free to book a session with me or explore our blog, where we share insights on reliability, maintenance, and predictive strategies.
Full Episode Transcript
Jean-Philippe PICARD (02:02)
Erik, welcome to the show.
Erik Hupje (02:04)
Thanks, Jean-Philippe, great to be here.
Jean-Philippe PICARD (02:05)
The pleasure is mine. I have been a fan of yours since day one of being in this game. And it's great to have this opportunity to speak to you and ask you my questions, having read so much of your content, of course.
Welcome to the show. And for those who don't know Erik, well, I don't know how this would happen because if you've been in this game and you are on socials, you would have heard either directly or indirectly of Erik's work. You probably would have read some of his posts, which are all very high quality. And that's why I'm keen to have you here. But I have myself heard your story before. I think it's a story that you shared through some of your blogs. Often you get snippets here and there, but I've never had the pleasure to hear the whole thing
and what got you to the place that you are today. So I'd love it if you could start and tell us, yeah, tell us a bit about your career and what you've done to date.
Erik Hupje (03:01)
Sure, yeah, sure. So I started as a young engineer. I went to university in the Netherlands, Delft. I'm originally from the Netherlands, which explains my accent. So I went to, did an engineering degree there. I joined Shell and I worked for them for a long time, worked overseas, initially in the Netherlands for a few months, then in the UK for five years, worked seven years in the Philippines, almost five years in the Middle East. And then in 2013, I moved to Australia and...
After about six years, cut a long story short, it was time to pack up our bags again as a family. And I said, no, you know what? We're staying here. It's about time we actually settled down as a family. So we stayed here. I joined another company. I worked a year and a half for ConocoPhillips at their LNG plant in Queensland. And in parallel, I developed my own business. I always wanted, well, not always, but I kind of eventually developed this idea of stepping out of the corporate world and building my own business, something that would be
more in line with what I enjoy and more in line with the way I wanted to fill my life. And I started basically what at the time was called the Road to Reliability blog, really just putting out there my content and thinking through how I could turn that into a solution or a service. So basically over the years, I developed a very simple framework. That's what we now call the Road to Reliability framework, where we focus on those three, four core processes and developed, or am still developing,
online training with that. So I launched the first online training course in 2020, so it was four years ago. And that was while I was still working, so it got to the point that it was crazy hours. I was, you know, in the office from seven to five, seven to six, and then I would come home, have dinner with the family, and I would work till late on my own business. So it got impossible. So yeah, I quit my corporate job basically because I felt like, I can always get another job, but I probably won't get another chance to start my own company. So.
Erik Hupje (04:48)
That was back in 2020, so now four years later, I'm still here. We're still growing. We're growing steadily, actually. We're doing well. And it's been a fantastic ride. I really enjoyed it.
Jean-Philippe PICARD (04:55)
Well, congratulations. Congratulations on doing the jump and the leap. It is something a lot of people talk about and few do, because it no doubt requires a lot of courage and a lot of hard work, as you share in your story. So yeah, well done on that. And importantly, well done for having been successful to date, because as you say, you've grown, and we can tell indirectly by you putting out content that is very valuable.
Jean-Philippe PICARD (05:21)
So yeah, well done on that. Why did you decide to settle in beautiful Australia, and specifically in Queensland, of all the other places you've been?
Erik Hupje (05:28)
Well, as I mentioned, I'm originally from the Netherlands. My wife is originally from Russia. After we met, we often had the conversation, okay, well, it's great to travel around, but eventually we will need to settle down somewhere for the kids to give them a home. And it was like, okay, it needs to be English speaking because we speak English at home. So that narrows it down fairly quickly. And yeah, I won't bore you with all the analysis, but we basically came to the conclusion that Australia was a good combination of good climate.
Erik Hupje (05:55)
We like the society, we like the Australian people, we like the fact that it kind of sits in between, you know, the socialism you see in Northern Europe and the kind of pretty strong capitalism you see in North America. And I feel Australia sits kind of nicely in between there. Picking or trying to pick the best parts of both doesn't always succeed, of course. But yeah, we felt it was a great country to live in, a great country to raise a family. And that's why we came here and that's why we stayed here.
Jean-Philippe PICARD (06:20)
Nice, nice. We could see hints of your personality there. It seems that you were quite methodical and rational about the choosing here. I wouldn't be surprised if you showed me an Excel spreadsheet with 25 rows and scoring system associated to it. That's cool. Yeah, I went through a similar journey myself. Yeah, it's a beautiful moment when you realize that you're in a place that you love and you'll get the chance hopefully to stay there.
Jean-Philippe PICARD (06:47)
That's great. So, well, let's get into the meat of it, right? When you accepted the invite, which was a great moment for me, as I said, I knew there were so many places we could take this conversation, especially because you're incredibly prolific. You post tons of stuff. And in recent weeks, it feels to me that you have posted tons of really high-quality content. So I thought, you know, we could talk about so many things. Where do we start?
I thought, you know, let's start with the beginning. The beginning is actually something you commented on in your intro there. I think one of your first products is the Road to Reliability framework itself. And to me, and correct me if this is wrong, but to me, it feels like a lot of people that would get exposed to you and want to work with you, they'd probably start there. So yeah, let's start there actually. And without me asking any more, can you tell us about what this Road to Reliability framework is?
Erik Hupje (07:22)
Yeah, sure. It's a very simple framework. So the whole idea is that basically when you focus on the core processes and the elements that are in that framework, it's only four elements, you can really make a dramatic improvement in your maintenance or reliability performance in your company, in your asset, organization. So I worked for a long time for large companies like Shell and ConocoPhillips across the globe and lots of different assets. And pretty much everywhere I worked, where I came,
We were always struggling initially with the same issues. We'd have relatively low maintenance productivity. We'd have a lot of PMs that we were doing that didn't add value. We were missing important PMs, which was leading to preventable failures, and often fixing things over and over and over rather than just eliminating the root cause. And all in all, just very, very reactive. And throughout my career, I kind of learned that if you fix those three things, if you get productive, you have a good PM program, and you tackle your repeat failures, you make a really good
improvement and then everything else that's talked about on social media, all the fancy stuff can happen after you've got that in place. So that's something I learned over the years throughout my career and I basically felt that that's what most organizations need to tackle. And when I look on social, when I look on the web at what people advertise as frameworks for improving reliability, they're often very, very complex with 20 or 30 different elements. And I was exposed to that in one of my first jobs in the Philippines.
You know, and the problem is those frameworks are technically correct, you know, because they aim for world-class performance, but that's not the problem for most companies. They're not trying to get to world-class yet. They're struggling to make a beginning and get out of that reactive cycle. Right. So when you've got a framework with 25 or 30 different elements that you need to tackle, you go like, how the hell am I going to get that done? You know, I just don't have the time and the people to do that. So the whole idea of my framework is really aimed at the majority of assets that are still very reactive and guide them on, you know, those three things, you know, improve
Erik Hupje (09:24)
productivity through better planning and scheduling. Get a good effective PM program in place. Tackle your repeat failures through root cause analysis and defect elimination. And then underpin that with some good leadership practices and building a focus on culture. Because what I like to say is leadership starts to drive the change and is therefore essential.
Erik Hupje (09:44)
But you need to improve your culture and build a culture of reliability so that you can actually sustain the change that you make. And that's another thing that a lot of companies get wrong. They might push on something as an initiative. Then they turn their back and do something else, and it just all unravels. So it's also really important that as you implement these things, you keep in mind: how are we going to make sure this is sustained? And that's where culture and training and a lot of other things come into play. So yeah, that's the framework. It's very simple. Now, the beautiful thing is also I learned
Erik Hupje (10:13)
through my career, the focus on those three things, by doing, by failing, by learning. But there's actually research that underpins those three elements. And it's research that has been talked about, but it's not really talked about anywhere near enough. It was research that was done by DuPont and led by, I believe, Winston Ledet, the developer of The Manufacturing Game. And that still is in play.
But the research that underpinned it basically showed that the manufacturing companies, and they analyzed 3,500 manufacturing assets in a big benchmarking study. And they found that the ones that were the most productive, that had the most uptime, basically got those three things right. They had planning and scheduling in place, they had a good PM program in place, and they did defect elimination. And in doing so, they typically had 15% more uptime than their reactive counterparts who weren't effective in those three areas.
Erik Hupje (11:05)
So it's not just my personal experience. There's actually good research behind why you need to focus on those things. And to be honest, when I speak to clients or potential clients and we talk through the framework, they're like, yeah, it makes a lot of sense. And that's another reason why it typically resonates well, because it does make a lot of sense. It is easy in terms of framework. It's not necessarily easy to get it all in place. But at least you've got something that you can understand. You're like, OK, well, those are three things we can actually get in place rather than having to look at 25 different elements.
Erik Hupje (11:34)
And go like, how am I going to do that? So yeah, that's the whole idea of the Road to Reliability framework.
Jean-Philippe PICARD (11:38)
Okay, that's a great genesis of how it came about. Totally in agreement with you on, like, the number of steps. I tend to think with frameworks, if you can't remember all of them in your head, then it's not a framework. It's like, it's past the... Yeah, so you can easily remember four things, five things is probably starting to be the maximum of what... Yeah, for me too, I'm a simple guy. Yeah, okay.
Erik Hupje (11:53)
Yeah, fair point. Certainly for me; I like simple.
Jean-Philippe PICARD (12:06)
Well, yes, and that shows through your work and it takes a lot of experience to get to the point of being able to simplify something well. So I think it'd be good for us to get into nitty-gritty a little bit more here. As you talk about, there is research that backs it up. I think even if we go there, there's already some surprising elements into this research. So the way I've understood it is that...
a lot of plants, maybe they have an average of about 80% uptime. And if they implement your framework, they can get to the higher 80s and even 90s, which is fantastic. But the way that they get there through these steps is actually quite surprising. So according to this research, planning and scheduling will get you maybe around 2%. And then implementing a PM plan might get you
even backwards because you're now having to do these activities, but it'll add up to about a similar percentage point. But a large part of it actually comes from the defect elimination, which according to that research was upwards of 14%. That's really surprising. Having read through why, it makes sense. But the question I have here is like, so if these are percentage points that we agree on, that's one thing and I'd like your comment on that. But secondly,
if that's true, then why is so much of the focus first on planning and scheduling if the lion's share of the benefits tends to come from defect elimination? So it'd be good to get your comment on that.
Erik Hupje (13:41)
Yeah, just a bit on the numbers. The research from DuPont basically showed that if you did, and I forget the exact numbers, but we've got some charts that we can share, and they'll be on the web. But basically, they looked at planning and scheduling separately. I think scheduling on its own was a 0.8% uptime increase, planning on its own 0.5%. And then preventive maintenance on its own was minus 2.4% because
in a manufacturing environment, they were shutting down machines more often to do the PMs. But when they did planning and scheduling and preventive maintenance together as a plant and got that effective, they saw a 5% increase in uptime. And then it was an almost 10% increase in uptime after they implemented defect elimination. Those were the rough numbers. So yeah, the lion's share of the gain comes from defect elimination. And this is why indeed some people say you've got to do that first. My experience is that
couple of things. One thing I see with a lot of organizations that we start working with or have a conversation with, they are very, very reactive. They have a lot of break-in work to their frozen weekly schedule, if they even have a frozen weekly schedule. But if they do, then what typically happens is they have a lot of this break-in work that comes in that is poorly prioritized typically. A lot of it actually isn't really as urgent as they think it is or make it out to be, and therefore it displaces all this work that has been properly planned, properly scheduled.
Erik Hupje (15:05)
It really impacts their productivity. So my view is that in the majority of plants, what you want to do is you want to really improve your working environment and create a more stable working environment where you have a weekly schedule that runs reasonably well. It won't be perfect. You're not going to get high 90% schedule compliance. And if you do, to be honest, you're probably not putting enough work in your schedule. That's always one thing when people say, we've got very high schedule compliance. Great. Put more work in it. Yeah. But yeah, you want to really build a stable working environment.
Erik Hupje (15:33)
Because that stable working environment starts to reduce the stress in the teams, it gives them more time to think about, OK, well, what improvements can we make in a lot of other things? There are times when that doesn't work. I was recently at a client site. They asked me to come and do an assessment. And my mental model is very much where we start with planning and scheduling. I did a review, and we looked at a lot of the work that was breaking in to their frozen weekly schedule. And unfortunately, the vast majority of it was justified.
Erik Hupje (16:00)
They just had a lot of ongoing defects. So in that case, my recommendation actually was, you've got to basically separate out a small crew. And I literally said, I hate making these recommendations, but I'm still going to make it. They had a small breakdown crew that over time had become less and less focused on breakdown, more and more scheduled work to try and get productivity up. But the result was that now those breakdowns were impacting their whole maintenance team. So the recommendation for them was actually,
Erik Hupje (16:28)
Right, reinstate that breakdown crew for a year or two. They can then deal with most of the upsets in the maintenance process, get your main crew to focus on building a disciplined maintenance schedule, execute that, get used to building a schedule and executing that schedule, and then in parallel, eliminate what is causing those break-ins. Because they didn't have a very effective root cause analysis process, but even if they did, those processes typically look at
the things that cause you the most production downtime, which makes sense, of course, right? But they don't look at all the things that are disturbing your weekly schedule, all the small things that keep happening every single week, right? And if you don't fix those, you'll never get to a productive working environment. So in that case, the recommendation was, you know, have that breakdown crew to protect the rest of your crew, so they can work proactively, and then tackle all those break-ins. Everything that comes in, analyze it, Pareto them, eliminate them. And after one or two years, you will be at a stage where...
Erik Hupje (17:24)
You know, you can become more productive. The other thing that's really important, that people I think often forget: good root cause analysis and defect elimination can only happen if you are in an environment where the culture is proactive, where there is no blame culture, where there is a focus on addressing systemic issues, where people want to learn and they have the time to investigate, and those kinds of things. That typically
Erik Hupje (17:51)
happens once you've established a stable working environment. If you're very reactive, that's very hard. And most reactive cultures tend to slip into blame culture. They celebrate the overtime heroes and all those kind of classic things that come with it. So putting in place an effective defect elimination program in an environment like that is very hard. Yeah, you can do some of it, but it typically won't get sustained. So the normal recommendation I have with most plants is:
Do the planning and scheduling first so you can create a reasonable level of stability. Then tackle your PMs and your defects. Normally, I recommend your PMs first because you just want to make sure you're not wasting time doing work that doesn't need doing. And when I say do your PMs first, it doesn't mean you analyze every single PM and every single machine from A to Z. You tackle your pain points. Now, where are you spending most of your time? Is it worth it? No, well, get rid of it.
Erik Hupje (18:45)
Where are you seeing breakdowns that you should be able to prevent with your PMs? Fix that. So you focus on a number of systems that are your bad actors, get that addressed, and then really move to root cause analysis and defect elimination. So there is a logical sequence, but a logical sequence doesn't hold for every scenario. So you do have to understand what you're doing and be able to say, well, yeah, that is the logical way, but this time we're going to have to do it slightly differently for these reasons.
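Editor's note: the break-in analysis Erik describes (log everything that breaks into the weekly schedule, Pareto it, then eliminate the biggest causes) needs no special software. Below is a minimal Python sketch, assuming a simple log of break-in causes and the maintenance hours each one displaced. The causes, the hours and the 80% cut-off are made-up illustrations, not figures from the episode.

```python
from collections import Counter

# Hypothetical break-in log: (cause, maintenance hours displaced from the weekly schedule).
BREAK_INS = [
    ("conveyor belt tracking", 6),
    ("pump seal leak", 4),
    ("conveyor belt tracking", 5),
    ("sensor fault", 2),
    ("pump seal leak", 3),
    ("label jam", 1),
]


def pareto(break_ins, threshold=0.8):
    """Return the top causes that together reach `threshold` of all displaced hours.

    Each result row is (cause, total hours, cumulative percent of all hours).
    """
    totals = Counter()
    for cause, hours in break_ins:
        totals[cause] += hours

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    selected, cumulative = [], 0
    for cause, hours in totals.most_common():  # largest contributor first
        if cumulative / grand_total >= threshold:
            break  # the causes already selected cover the threshold
        cumulative += hours
        selected.append((cause, hours, round(100 * cumulative / grand_total, 1)))
    return selected


if __name__ == "__main__":
    for cause, hours, cum_pct in pareto(BREAK_INS):
        print(f"{cause:<25} {hours:>3} h  {cum_pct:>5}% cumulative")
```

With this sample log, two causes account for more than 80% of the displaced hours, so those would be the first candidates for root cause analysis. The same ranking can be done in a spreadsheet with a sort and a running total.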
Jean-Philippe PICARD (18:46)
Yeah. Yeah. And that speaks to the value of, like, dealing directly with you or maybe with another expert that knows what they're talking about. Because if you just read the white paper and the white paper says that's where you start, well, there are instances where that's not exactly the right thing to do. I do appreciate the example you gave as well, because one thing I was thinking, I haven't done a fantastic job at this just yet, but a lot of our listeners are reliability and maintenance managers and
as tangible as we can make it for them is best. I know maybe this is just for me, but when I speak with our customers about numbers and findings, their eyes sometimes glaze over and they say, you know, all well and good, but what does that mean for me? And I think one of the places you were going at the end could actually be quite helpful here, if you have some specific examples of this. I think I read this from you, you say that
anywhere from 40% to 60% of PMs have little value. Hopefully I'm not putting words in your mouth here, but I think I read this in one of your posts or maybe it's even in that paper. And I think our experience would support this, that our customers tell us this, that they have a lot of PMs that they don't think should even be done. And that creates a lot of problems. Sometimes they just don't actually end up getting done. Sometimes they stay on that list and get done. But yeah, can you tell us if that's true, if that's your experience, and how does a site or team start to deal with this?
Erik Hupje (20:34)
Right, OK. So that's my experience too. The numbers, that range of 40 to 60%, actually comes from John Moubray's book, RCM2. So he quotes it there in his book. But to be honest, I've seen similar percentages in my own experience as a maintenance manager improving PM programs. I see it with my clients now. I hear it from a lot of other practitioners. I think we all know that a lot of PM programs are just
Erik Hupje (20:57)
I was going to use another word, but let's keep it polite. They're not very good. They're very ineffective, very inefficient. And typically what happens is PM programs are very often developed as an afterthought. They're not proactively developed during a project phase of a new plant through a rigorous process of combining RCM and FMEA and everything else. That's how it should be done, during the design phase of your plant and everything else, and then you start up with all that in place. That's typically not what happens.
Erik Hupje (21:23)
And typically what does happen is a plant is started up. They barely have the PMs in place. They'll look in the OEM manuals, they'll stick that in and then off they go. Things start to break because the PM program isn't great. So they start putting PMs in to replace things before they break. All right. They have safety incidents. They put PMs in to do inspections to avoid safety incidents. And over time you end up with a PM program with all those things in place. But with a lot of safety incidents, you know, I've seen in the past,
a lot of times, PMs go in that actually aren't going to be effective because the failure mode that you're dealing with or the issue you're dealing with is not something you can effectively address through those maintenance tasks. So for example, I've had it in my experience where we had a light fitting fall in a warehouse. The light fitting was badly installed, bad design, a couple of things. It had nothing to do with maintenance. But because it was, well, not safety critical, but because it was deemed a safety issue, management said,
Erik Hupje (22:18)
we will put an inspection in place to make sure this doesn't happen again. And before you know it, you have an inspection like this every three months, every six months. But you're wasting your time because those inspections are never really going to deal with the failure mode, because the failure mode is bad installation and bad design. You need to basically change it, and then it's done. So there's plenty more examples like those. The same goes for a lot of the OEM tasks that go in: they are often time-based replacements or time-based servicing. But most of the failure modes
Erik Hupje (22:47)
in our factories are not age-related. So the likelihood of failure doesn't necessarily increase as a piece of equipment gets older. So if that's the case, then replacing it at a certain time is actually not going to increase reliability. So you're wasting your time. What you should be doing is some kind of condition assessment. Say, hey, how's the condition of that equipment? Is it deteriorating? Do we need to intervene? Yes or no? So that's how a lot of these PM programs end up like that. And so to fix it,
Erik Hupje (23:15)
So a couple of things you need to do. Step one is you need to get people in your team to understand those principles. That's another thing I see. There's a big gap. In a lot of organizations, the PM programs are developed by discipline engineers. Most discipline engineers have learned very little around reliability or reliability-centered maintenance, and have very little experience in developing effective and efficient PM programs. They do their best, but they don't really understand a lot of things or know a lot of these things.
So you end up with flawed programs. So you need to train your team. You need to make sure you involve the front line. Because the front line, they're not stupid. Your technicians out there, they do these tasks and say, well, what a waste of time. I'm doing this every month, and it doesn't add any value. The risk with that also is it creates a potentially bad culture. Because if you ask your technicians to do non-value adding work every single month, every single quarter, whatever, what are they going to do?
If they're disciplined, they'll do the task, knowing that it's a waste of time. And then they'll sign it off in the CMMS. But if you're in a reactive environment, or you're under pressure, and you have a lot of work to do, what's going to happen? Well, the guys are going to go, well, this is a stupid task. It doesn't work. They're going to sign it off, not do it. So by doing that and by allowing that to happen, you create a culture where the tick and flick becomes normalized. The risk is that once that becomes normalized and you improve your PMs, your new PMs are also at risk of just
Erik Hupje (24:33)
suffering from that tick and flick, right? So that's another issue that happens underneath that with your culture around this. But yeah, you need to train your people. You need to involve your front line, because they know what's happening out in their plant. And then it's a matter of saying, OK, let's do a simple analysis. Where are we spending most of our man hours? Where are we spending most of our money? Where are we incurring most of our failures? And if those failures are preventable through PMs, you can prioritize which systems to focus on and start doing a simple
Erik Hupje (25:00)
review of your PMs. There are different methodologies you can use. I'm sure we'll touch on that. There's RCM. There's PMO. And there's FMEA. But it all comes down to just really understanding, how can this equipment fail? What's the consequence? What am I going to do about it? So understand those things. And you make a good step forward. And you don't need software. You can do this on a spreadsheet. You just need to really understand the equipment and have some good knowledge of
Erik Hupje (25:27)
how that works. So break the equipment down, understand the failure modes. What are the causes behind the failure modes? What are the consequences? What are the characteristics of the failure modes? And therefore, what is a suitable maintenance task to do? Because that's another thing we often see is that people do maintenance tasks that don't line up with the failure mode. So we just talked about that. If you're doing a time-based task for a failure mode that's not age-related, you're wasting your time, you're wasting your money, you're wasting your resources. So yeah, that's typically what's happened. It's not difficult to fix.
Erik Hupje (25:56)
It does take effort, and it usually becomes a problem because people aren't really trained or coached in this area. So it's still hard then. And people tend to rely way too much on software for these things, when really it's about the thinking process. So there's a really good phrase in Moubray's book, RCM2, where he says RCM is thoughtware, not software. And I love it because that's exactly true. You can do a really good RCM analysis on a whiteboard or on a spreadsheet.
And it will be way better than having people who don't know what they're doing using the latest software. So yeah.
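Editor's note: to make the "match the task to the failure mode" point concrete, here is a minimal Python sketch of the kind of table Erik says can live on a whiteboard or spreadsheet. The field names, the consequence categories and the selection rule are simplifying assumptions for illustration. They are much cruder than the full RCM decision logic in Moubray's book and are not a substitute for it.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    equipment: str
    description: str
    age_related: bool  # does the likelihood of failure rise with age or usage?
    detectable: bool   # is there a warning sign that can be checked before failure?
    consequence: str   # hypothetical categories: "safety", "production", "minor"


def suggest_task(fm: FailureMode) -> str:
    """Suggest a task type that lines up with the failure mode (simplified rule)."""
    if fm.detectable:
        # Deterioration can be seen coming, so check condition and intervene when needed.
        return "condition-based inspection or monitoring"
    if fm.age_related:
        # No usable warning sign, but failure does get more likely with age.
        return "time-based replacement or overhaul"
    if fm.consequence == "minor":
        return "no scheduled task (run to failure)"
    # Not age-related and not detectable: a recurring PM will not address it.
    return "one-off change (redesign or reinstall), not a recurring PM"


# Illustrative rows; the light fitting mirrors the warehouse example from the episode.
examples = [
    FailureMode("warehouse light fitting", "falls due to bad installation/design",
                age_related=False, detectable=False, consequence="safety"),
    FailureMode("pump bearing", "wears and starts to vibrate",
                age_related=False, detectable=True, consequence="production"),
    FailureMode("intake filter", "clogs with usage, no pressure indicator fitted",
                age_related=True, detectable=False, consequence="production"),
]

if __name__ == "__main__":
    for fm in examples:
        print(f"{fm.equipment}: {suggest_task(fm)}")
```

The point of the sketch is the thinking process, not the code: a time-based task only comes out where the failure mode is actually age-related, and the light fitting gets a one-off fix rather than a recurring inspection.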
Jean-Philippe PICARD (26:33)
Yeah, no, that's good. I like that you gave that context because you and I had a few exchanges about: are people over-anchoring on software? Are they looking at the wrong thing? Is it shiny object syndrome? Yeah, and then that's perfect context for it.
Now, super interesting. You mentioned a lot of stuff that was super relevant. What cadence do you recommend people do this? Like, is this a you do it like once and then you're good for like two years, like you said with the breakdown team, or is this like it's a quarterly thing because this can easily and quickly become out of control?
Erik Hupje (27:05)
So it should be a weekly cadence almost, in the sense that you need to continuously look at, you need to look continuously, OK, where am I spending my time and effort? Maybe not weekly, monthly. But you should be looking, OK, where am I spending my preventive maintenance man hours? Where am I doing the most tasks? Where am I seeing the most failures? Where am I seeing the most downtime and everything else? And then looking at that and say, hold on. We're spending all this time on replacing x.
Erik Hupje (27:29)
Is that really a wise use of our time and resources and everything else? And you can go have a look and say, well, maybe not. So when I used to work for some of the bigger companies, they had these frameworks where, OK, you've got to do a full review every five years. I hate that. Because first of all, a full review is like eating an elephant. It's a nightmare. It's a lot of work. And secondly, you run the risk that you do a full review when it's only 20% that really is hurting you. So why review the other 80%?
Erik Hupje (27:57)
Right? Now, you may have a task that is totally ineffective and an absolute waste of your time. But if you're only doing it every two years and it costs you two hours, who cares at this point, unless there's other risks associated with it. But you don't want to focus there. You want to focus on what's hurting you in terms of, like I said earlier, what is using your resources, money, time, what is causing you failures, what is causing you downtime, all those kind of things. What is breaking into your schedule? That's where you want to focus. And you want to basically have somebody in your team who has the
Erik Hupje (28:25)
time and the headspace to do that and look proactively and say, hey, did you realize that X piece of kit is breaking into our schedule every two weeks? Yes, it's not causing any production downtime, but every two weeks we have to throw away the schedule because this happens or one of these things fails and we need to go and fix it. How about we solve that? That's the mindset you want to develop.
Jean-Philippe PICARD (28:40)
Yeah, no, it makes tons of sense. You kind of touched on this, and maybe some people will find this too obvious, but I feel like it's worth exploring even just a little bit. I feel like some sites have this problem where a lot of this stuff happens and it doesn't get captured anyway. A lot of this work goes on behind the scenes and it's...
We're not too sure what happened two weeks ago. The communication between the night crew and the day crew and the contractors and all that stuff. So in the example that you gave, which is like, I don't argue with you, that's exactly the right way to do it. But some of this knowledge will exist in different people's heads because a lot of these people are not too good at using a computerized system. What do we do then?
Erik Hupje (29:29)
Train them. So fundamentally, and this is another reason why it makes sense to do planning and scheduling first, because one of the key steps of your planning and scheduling process is the closeout of your work, making sure that you capture the materials used, the time used, what was done, what was found, and everything else. That data you capture there is the data you use for your reliability analysis, for identifying your bad actors, and everything else.
Erik Hupje (29:54)
If you do that badly or you don't do that, well, then you've got nothing to analyze with. So that's a really important thing. You capture that data in the closeout process. The other thing I see with a lot of people is they go like, it's so reactive with all this. Then my next question is, OK, great, so are you tracking your downtime? Do you allocate at least some high level buckets? No, not really. So that's the next thing. Make sure you've got a closeout process working. But the other thing is every time you lose production, capture it.
Erik Hupje (30:21)
Right? Start with a simple spreadsheet. Say, hey, we've lost two hours of production. Why? Well, for this reason or this apparent reason. Or even if you say, well, it was this equipment that shut down for two hours. Start capturing it in a spreadsheet. Very simple. One of the easiest things we did years ago was just we got the control room operator to basically say, right, here's a critical monitor. Every time one of those things trips or is down, you write down the time it went down and the time it got back up and the cause.
Erik Hupje (30:48)
Was it scheduled maintenance? Was it an unscheduled maintenance, as in a break-in or a breakdown? Or was it just a trip and you reset it or whatever? Do that for a few months, and you suddenly have heaps of data that you can use to analyze. Then after that, of course, a lot of plants have advanced control systems nowadays, so you should be able to pull that data out of there, because you probably know in your control system when that kit is on or off. And you can capture all that. All you need to do is classify it down.
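The simple downtime log described here (time down, time back up, and a cause bucket) can be sketched in a few lines. This is a minimal illustration rather than a recommended tool: the `DowntimeEvent` record, the equipment tags, the dates, and the three bucket names are all assumptions made up for the example, with the buckets loosely following the categories mentioned in the conversation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical cause buckets, loosely following the ones named in the conversation.
CAUSES = {"scheduled", "unscheduled", "trip"}


@dataclass
class DowntimeEvent:
    equipment: str   # hypothetical equipment tag
    down: datetime   # time the equipment went down
    up: datetime     # time it came back up
    cause: str       # one of CAUSES

    def hours(self) -> float:
        """Duration of the event in hours."""
        return (self.up - self.down).total_seconds() / 3600.0


def hours_by_cause(events):
    """Sum downtime hours per cause bucket, rejecting unknown buckets."""
    totals = defaultdict(float)
    for event in events:
        if event.cause not in CAUSES:
            raise ValueError(f"unknown cause bucket: {event.cause!r}")
        totals[event.cause] += event.hours()
    return dict(totals)


# Made-up entries, as a control room operator might write them down.
log = [
    DowntimeEvent("K-101", datetime(2024, 1, 8, 6, 0), datetime(2024, 1, 8, 8, 0), "trip"),
    DowntimeEvent("K-101", datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 11, 2, 30), "unscheduled"),
    DowntimeEvent("P-201", datetime(2024, 1, 15, 7, 0), datetime(2024, 1, 15, 15, 0), "scheduled"),
]

for cause, hours in sorted(hours_by_cause(log).items()):
    print(f"{cause}: {hours:.1f} h")
```

The same three fields fit in a spreadsheet just as well; the point of the sketch is only that a timestamp pair plus a cause bucket is already enough to start totalling downtime by category.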
Erik Hupje (31:16)
So that's step one, you need to have the data. If you don't have the data, it's hard to do any kind of analysis. And the risk then is also, you know, you get into people's opinions, and everybody has an opinion and that's all fine. But you can't really solve those problems on opinions. That never really works. So you really want to capture the data. And then the next thing is, once you do that, like you said earlier, it's really important that you document it, right?
You document your analysis, you document your findings, and you document your decisions on why you make those. Whether you're doing an RCA or whether you're improving your PM, you need to document that. I recently spoke to a client who had no real process for documenting their PMs. And I said, well, it's something you need to fix. And one thing that came out of the conversation was that they didn't have that process. They didn't have documents for their maintenance strategies.
And they even said, yeah, we recently updated some PMs. And we didn't have enough resources in-house to do that. So we brought one of our old maintenance planners back. And he looked at it and said, don't do that. We tried that. That didn't work. It caused all kinds of problems. And I said, well, that's exactly why you want to document things, right? Because if you don't document all this, the corporate knowledge is lost, right? Because the next person who then starts to look at the PMs doesn't know that you tried this two years ago, and it didn't work because of these reasons.
Erik Hupje (32:34)
So this is why it's so important to document all these things so that you can build on it because, yeah, people don't stay in their jobs forever. They move on, they get promoted, especially good people, right? Good people do a really good job at improving whatever it is. Well, they get moved somewhere else, they get promoted and everything else. And if you don't make sure they document what they've done and why they've done it like that, then the next person who comes in has to start from scratch rather than build on it. And potentially they undo all the good work that has been done.
Jean-Philippe PICARD (32:41)
Yeah, yeah, and we're seeing a lot of turnover in the industry now as well, where sometimes it's just so fast and, yeah, I feel like there must be hundreds of examples of people starting all over where they shouldn't have. You gave an example there of Excel, I think that was for downtime capture, but when it comes to capturing this information now, what are your preferred systems typically?
Erik Hupje (33:04)
Mm.
Look, for your downtime, there's lots of solutions out there. So I'm not going to name names. But really what you want to do is you want to have a solution where your downtime is captured automatically by your control system. So when a piece of equipment goes down or is whatever not available, that should be logged by your control system. Then all you need to do is basically, and this is where the human typically still needs to come in, is you need to classify that downtime event. Was it planned? Was it unplanned?
Erik Hupje (33:53)
Was it just a trip or whatever, right? So you want to classify that. And then that needs to go into some kind of reporting layer where you can basically pull simple Pareto charts. And the key thing there also is to be able to look at it not just by equipment, but also by equipment type. Because a lot of times you see, you've got the classic bad actors, your export pump, or your compressor, or this or that. But in the meantime, you may have 200 fin fans.
Erik Hupje (34:20)
or 100 fin fans, and every week one of them is failing. It never kind of registers because it's just one fin fan. But when you suddenly start looking by object type or by equipment type, you go like, holy heck, how did we miss that? We've got so many fin fan failures, or we've got so many instrument failures. So that's another important thing: you want to be able to look at this data by equipment type, because then you can say, well, we've got a big problem with this category. We need to fix that. Yeah.
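The difference between a Pareto by individual equipment and a Pareto by equipment type can be shown with a small sketch. The failure records, tags, and type names below are invented for illustration; the `pareto` helper is hypothetical and simply counts failures per group and adds a cumulative percentage.

```python
from collections import Counter


def pareto(failures, key):
    """Rank groups by failure count and add a cumulative percentage.

    `failures` is a list of dicts; `key` selects the grouping field,
    e.g. "tag" (individual equipment) or "type" (equipment type).
    """
    counts = Counter(record[key] for record in failures)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for name, count in counts.most_common():
        cumulative += count
        rows.append((name, count, round(100 * cumulative / total, 1)))
    return rows


# Made-up failure records: two classic bad actors plus five fin fans
# that each failed only once.
failures = (
    [{"tag": "P-101", "type": "export pump"}] * 3
    + [{"tag": "K-201", "type": "compressor"}] * 2
    + [{"tag": f"FF-00{i}", "type": "fin fan"} for i in range(1, 6)]
)

print("By equipment:", pareto(failures, "tag")[:3])
print("By type:     ", pareto(failures, "type"))
```

Grouped by tag, each fin fan shows up as a single failure at the bottom of the list; grouped by type, fin fans become the largest category in this made-up data set, which is the effect described above.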
Jean-Philippe PICARD (34:25)
Yeah. Okay. Now that makes sense. Right. So we've done a good job, I think, thanks to you so far at sticking to our agenda. And we've talked about the Road to Reliability framework. So you now know there are four awesome steps to this framework. And as you said, we'll link this in the show notes if people want to read your white paper or talk to you directly. We talked about some of the principles of an effective maintenance program.
Jean-Philippe PICARD (35:08)
Do you want to add more there actually about the principles of an effective maintenance program?
Erik Hupje (35:12)
I think, you know, we've got an article where we talk about, where I talk about that and there's nine principles and everything else, but typically there's probably two things that are most important, I think, and the two things that people struggle with the most. The first is, I've said it before and I'll say it again, is that the realization that the majority of failure modes we deal with are not age-related, right? So that means that during the life of the equipment, those failure modes,
Jean-Philippe PICARD (35:22)
Hmm.
Erik Hupje (35:34)
they don't become more prominent. The likelihood of failure doesn't increase even though the equipment is aging. It's very counterintuitive for many, and for me too when I first kind of came across it many years ago. But that is really, really a fundamental concept that people need to get their head around basically. And there's plenty of good articles that explain why that is and the studies that were done with it.
Erik Hupje (35:57)
The studies originated in the airline industry, it's maybe worth sharing for a couple of minutes. They originated in the airline industry. So you often get people say, yeah, but we don't fly airplanes. No, you don't. But yeah, yeah. But we run mobile equipment in an open-cut mine. Yeah, it's a bit different than an airline, Boeing 747 or whatever. But the studies that were done that underpin RCM were done at component level. And they really relate to
Erik Hupje (36:25)
in a large degree, to go back to some physical principles in a way to underpin this stuff. But those studies were done in the airline industry. They were later repeated by the Navy in the US, the US Navy, and they were also repeated by the
submarine program in the US. Now, they typically haven't been done in many private sectors because they are complicated studies to do. They're expensive, they take a lot of time and everything else. But across those three sectors, very similar kind of findings came out. So my view is always, yeah, you're not an airline and you're not operating airplanes. But if you ignore this, it will be at your own peril because across industries, these things keep coming back to majority failure modes.
are not age-related. Yeah, the percentages differ. If you look at, say, nuclear submarines, the profiles and the different percentages that they found were different from the airlines. But the key message remained the same. Majority of failure modes are not age-related. And we need to understand it, because what that means is majority of our failure modes are not age-related. So the majority of failure modes we cannot effectively manage with a time-based replacement or a time-based service. It needs to be some type of condition assessment.
Erik Hupje (37:28)
So that has a major implication for how we build our PM programs, because the vast majority of PMs are every two months or every three months: replace this, service that. But if it's not a time-based failure mode, then you're wasting your time. So that's the first key thing that people really, really need to understand. And then the next one is where I think a lot of people go wrong is they don't match the characteristics of the failure mode with the maintenance type they're selecting. So you need to understand.
Erik Hupje (37:54)
If the failure mode, if it's hidden, well, then you've got to do failure finding maintenance, right? You need to go function test it. If it's age related, you do time-based maintenance. If it's not age related, you do condition-based maintenance. You need to be very structured in your thinking. And that means also you need to understand these things, these characteristics of your failure mode. And very often, people jump over that. So that's really important. If you do that, then you're halfway there with building a good and effective PM program. So yeah, there's many more.
Erik Hupje (38:23)
But those are probably, in my mind, the two most important things.
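The matching rule stated here (hidden failure modes get failure-finding tasks, age-related ones get time-based tasks, everything else gets condition-based tasks) can be written down as a tiny decision function. This is a deliberately simplified sketch of that three-way rule only; the `Task` enum and `select_task` function are hypothetical names, and a full RCM decision process also weighs consequences and whether a task is technically feasible and worth doing.

```python
from enum import Enum


class Task(Enum):
    FAILURE_FINDING = "failure-finding (function test)"
    TIME_BASED = "time-based replacement or service"
    CONDITION_BASED = "condition-based maintenance"


def select_task(hidden: bool, age_related: bool) -> Task:
    """Map failure-mode characteristics to a maintenance task type.

    Simplified to the three-way rule from the conversation; it ignores
    consequences and task feasibility.
    """
    if hidden:
        return Task.FAILURE_FINDING
    if age_related:
        return Task.TIME_BASED
    return Task.CONDITION_BASED


# Hypothetical failure modes, purely for illustration.
examples = {
    "standby trip device fails to operate": (True, False),
    "wear part reaches end of life": (False, True),
    "bearing fails at random": (False, False),
}
for failure_mode, (hidden, age_related) in examples.items():
    print(f"{failure_mode}: {select_task(hidden, age_related).value}")
```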
Jean-Philippe PICARD (38:26)
Yeah, and they're big things, right? Yeah, that's awesome. So the next area that we typically like to cover with NES is just to talk about stories. I thought with you, maybe this is a good area for you to pitch the value that you bring, if you wanted to. Because I think in some place in your papers, I've seen that you're trying to get clients to reduce their equipment downtime by 90%.
You know, that's a bold claim, and you probably wrote that because you've achieved that in numerous places. So if you want to, can you talk about success stories that you've had working with clients, where you've been able to really make an impact and hopefully, you know, maybe even get to that figure that you talked about?
Erik Hupje (39:11)
Yeah, so let me kind of put a caveat on that a little bit, because I don't want to come across as crass or selling or anything like that. And I'm not actually going to be pitching my services here; I don't think people listening to this podcast are particularly interested in that. So the 90 percent reduction in downtime, it sounds huge and it is huge. Right. But really, you know, if you move from the mid 80s to the mid 90s, that's kind of what you're looking at. Right.
Erik Hupje (39:39)
It depends on how you put those numbers. So I think the key thing is, what I would really like people to understand is that making that journey from a reactive plant where you've got a mid-80s kind of performance to a much more proactive plant where you've got a high-90s percent uptime isn't that complicated. Yes, it does take time. You cannot do it overnight. It's just not possible, right? It takes time. It's going to take you a couple of years to make that journey. But it is not as hard and as complicated as people tend to make it out to be.
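How an uptime improvement translates into a percentage reduction in downtime is simple arithmetic, and a short sketch makes the "it depends on how you put those numbers" point concrete. The uptime figures below are illustrative assumptions, not numbers from a specific plant, and `downtime_reduction` is a hypothetical helper.

```python
def downtime_reduction(uptime_before_pct: float, uptime_after_pct: float) -> float:
    """Fractional reduction in downtime when uptime improves.

    Downtime is taken as 100% minus uptime, so the result is
    (downtime_before - downtime_after) / downtime_before.
    """
    before = 100.0 - uptime_before_pct
    after = 100.0 - uptime_after_pct
    if before <= 0:
        raise ValueError("uptime_before_pct must be below 100")
    return (before - after) / before


# Illustrative figures only.
print(f"85% -> 95% uptime:   {downtime_reduction(85, 95):.0%} less downtime")    # about 67%
print(f"85% -> 98.5% uptime: {downtime_reduction(85, 98.5):.0%} less downtime")  # 90%
```

Under these assumed numbers, moving from mid-80s to mid-90s uptime removes about two thirds of the downtime, and reaching high-90s uptime is what corresponds to a reduction of around 90 percent.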
Erik Hupje (40:08)
And we've seen that. One of our earlier success clients was a company in a very kind of different industry. And that's why I like highlighting it, because they were a client that operates trucks and trailers in Europe. And they operate heavy trucks and trailers, a very large fleet across Europe. They have like 100 workshops in Europe. And that was one of my first big clients.
Erik Hupje (40:35)
They engaged us, they took our training, they rolled it out across workshops in France and Spain and Italy. They had to tweak it a little bit because it was a little bit different, right? So they tweaked some of the terminology and some of the process steps for planning and scheduling, but they really stuck to the core principles. And within about 12 or 18 months, they had a massive increase in productivity. For a company
where margins are really, really small, they saw a 16% increase in productivity across all their workshops. And that was a seven figure benefit to their bottom line for those workshops, which was really, really big for them. And it's just because you're implementing a good sound process that's underpinned by principles that I didn't invent. They've been around for decades. Everything I teach, I guide people on, I coach people on, you can all find it in books like
the book from Moubray on RCM, books by Doc Palmer on planning and scheduling. It's all out there in the public domain. But unfortunately, a lot of these good things are kind of drowned out by a lot of the noise in industry. So go back to principles, focus on those, and you will get those kind of results for sure.
Jean-Philippe PICARD (41:42)
That's a great story. And then, yeah, presumably where margins are thin, and they often are in our industries, right? That's where any movement of the lever makes the biggest impact. If you already have fantastic margins, then, you know, a little bit of
Jean-Philippe PICARD (41:59)
an efficiency gain doesn't lead to that much difference at the bottom line. So yeah, that's great. That's awesome. I imagine you have a lot of stories as well where you've observed things that didn't go quite right, not because of your doing, but because of other factors as well. So if you have some of those, like "this is one and I learned this", feel free to share that too.
Erik Hupje (42:22)
Yeah, so there's probably an interesting story. It's a story that is from my own direct experience of what happens when you're in a reactive domain and how in the reactive domain, you end up making the decisions for the right reasons that are crazy expensive, but they're still justified because of the exposure you have. So back in the Philippines, I was...
Erik Hupje (42:48)
I headed maintenance execution for an offshore platform and an onshore gas plant. The onshore gas plant was running reasonably well. The offshore platform was a bit of a basket case, to say the least. We had a lot, a lot of problems.
Erik Hupje (42:59)
Friday, it always happens on Fridays, right? Literally. Friday, literally, the vast majority of the operations and maintenance department was at a country club for an event, end of year, I don't know, planning event or whatever. I stayed behind in the office to provide support to the offshore team, just in case. Anyway, so I was pretty much the only one in the office from operations and maintenance. I get a phone call from the guys offshore and they say, we've got a gas leak. Well, we've got a very big,
Erik Hupje (43:26)
big gas leak. And cut a long story short, there was a carbon steel plug that had dropped out of a stainless steel valve, a body plug. Originally, it never should have been a carbon steel plug, but we didn't see it. It was insulated. It was in a vibration service. It never should have been in a vibration service, but it was because of other issues. It dropped out. We had a high pressure gas leak. We had to shut down the platform. We did not have the capability on the platform or even in the country that we knew of, at least, to fix that. So.
How do we reinstate this? This platform is pumping gas to onshore Manila, where it goes into three power stations. We've got 24 hours before we shut down the power stations, and the lights go literally out in half of Manila. That costs a lot of money and a lot of reputation, right? So the solution was we had to mobilize guys from a specialist repair service from Singapore.
Being Friday afternoon, we had just missed the last commercial flight from Singapore to get them on. So we had a fantastic logistics team, very, very good team. We managed to charter a Learjet. We picked them up with a private taxi, brought them on a Learjet. The Learjet flew to an airport in the Philippines. Our helicopter went to pick them up. They did the repair, et cetera. A very expensive way of mobilizing for what was actually a very simple component and a very simple repair.
Erik Hupje (44:44)
But because of the risk of losing literally millions of dollars and the reputation impact, it was the right thing to do. But chartering a Learjet, and this was almost 20 years ago, probably 18 years ago, whatever, at the time was, I don't know, $50,000 for a two-hour flight or something or whatever it was. It was short, but it was a lot of money. And that's what happens when you get into these kind of environments. And this is all because during
construction, somebody put a carbon steel plug in a stainless steel body that wasn't spotted. Now, they either did that deliberately or they didn't know what they were doing or whatever, but it was put in. The difference in material wasn't spotted. It was insulated. Then that valve was sitting in an area on the deck above a couple of compressors, reciprocating compressors, which were causing way more vibration than they should. And so that all led to this.
So there's a whole bunch of small defects that ultimately led to a very high risk scenario, and a very expensive repair. We were lucky that that gas leak didn't escalate. And yeah, you learn that it's exciting, but it's not really how you want to run maintenance, is it?
Jean-Philippe PICARD (45:51)
Yeah, I mean there's so many learnings there. Some of it is that, you know, some stuff will just come as a curveball, and there's almost no way you could have avoided it. So yeah, it always happens on Fridays. That's true. Yeah.
Jean-Philippe PICARD (46:09)
Okay, well, last thing I'd love to hear from you, and you did a lot of this as we spoke, but we love to wrap on closing insights. So let's put ourselves in the shoes of a new reliability manager or maintenance manager. They've come to a site and they walk into a classic situation where there are lots of problems. And they've been hired to fix, presumably, everything. What do you tell that person? What is your advice to that young, energetic person?
Erik Hupje (46:38)
Okay, so I'm gonna split that question into two sections. One is a young person who is at the beginning of their career, kind of, right? And the other is walking into a bit of a dog's breakfast and having to solve that. For people starting out in their career or still relatively, yeah, starting out in their career or still relatively early on, I think there's probably two or three things to focus on. First one is, I'm gonna say this, get on top of the basics.
Erik Hupje (47:04)
And you don't have to do expensive training. You can literally just go to the library and check out the book by Moubray, check out the book by Mac Smith. They're both great books on RCM. Read books on FMEA. There's so much to learn there. Just learn those fundamental basics. Get to grips with that so you understand, now how do I build an effective and efficient PM program? How should planning and scheduling work? What should it look like? And you know,
Erik Hupje (47:28)
understand the concepts of root cause analysis, problem solving, structured problem solving, why people often don't do this well, and everything else. You get those basics understood, then that's step one. Obviously, now you need to get those implemented. So this is where other things come in place. First thing you then need to do is you need to be able to take those principles and be able to kind of convert them and say, well, if we did that, this is what that could look like for our plant. But then you need to translate that into a benefit,
typically a dollar benefit because you're going to have to sell it in your organization. This is something I see a lot of maintenance managers struggle with actually. It's like, yeah, I don't know how to get buy-in from my plant manager or my VP or whatever. So you need to take those benefits and be able to convert that into a business case that you can then sell. So you need to be able to take those benefits, express them in numbers, do some simple business economics like projecting some cash flows.
Erik Hupje (48:18)
doing discounted cash flows, calculating an ROI or an NPV, whatever your company is using to evaluate projects and everything else. You need to be able to show that your reliability improvement project is probably the best investment they typically have. But if you can't convert those into metrics that they look at, it's going to be a difficult sell. Then the third piece is continue to work on your people skills, because ultimately you need to influence people to get on board to buy what you're essentially selling, because that's what you're doing.
Erik Hupje (48:45)
and be able to work with them and be able to convince people both up the line, your managers and senior management, as well as build good relationships with the frontline or effective relationships with the frontline. You know, the technicians out there, the planners and the schedulers, because they have so much insight into what's happening in your plant, they can tell you a lot of stuff. It will be anecdotal and it's difficult to convert that into a number, but it's a great way of validating what you can get from your data,
or it can point you into areas where you can look for data. So yeah, those would be the three things. Learn the technical basics, learn some basic business skills and business economics, and then work on the ability to sell that to people and engage with people effectively. Because ultimately, everything we do needs to be done through people. And this is also, I think, why a lot of improvement programs typically fail, because the people side is not well done. It's not well managed. And it's still one of the most important things.
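The "basic business economics" step, projecting cash flows and discounting them into an NPV or a simple ROI, can be sketched as follows. All figures are hypothetical placeholders (an up-front cost followed by three years of benefits, at an assumed 10% discount rate); the `npv` and `simple_roi` helpers are illustrative, and a real business case would use whatever method and hurdle rate the company itself applies.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value, with cash_flows[0] occurring today (year 0)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def simple_roi(cash_flows: list[float]) -> float:
    """Undiscounted return on investment: net gain divided by the initial outlay."""
    investment = -cash_flows[0]
    if investment <= 0:
        raise ValueError("cash_flows[0] must be a negative initial outlay")
    return (sum(cash_flows[1:]) - investment) / investment


# Hypothetical project: 200,000 invested now, 150,000 of benefit per year for three years.
flows = [-200_000, 150_000, 150_000, 150_000]
discount_rate = 0.10  # assumed rate, for illustration only

print(f"NPV at {discount_rate:.0%}: {npv(discount_rate, flows):,.0f}")
print(f"Simple ROI: {simple_roi(flows):.0%}")
```

Expressing the improvement plan in the same terms as other capital projects is what lets it be compared against them, which is the point being made about selling it to a plant manager or VP.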
Erik Hupje (49:40)
So that's what I would say to somebody early in their career. If you're now a maintenance manager maybe later in your career and you walk into a plant, and yeah, question two, and the plant is a dog's breakfast, the first thing to do is to understand, OK, what's the impact of this? Because you can be very reactive, but it doesn't always have massive impact on business value.
Erik Hupje (50:02)
And that is the first thing you need to establish. You know, I recently had a conversation with a reliability manager. He says, I'm so frustrated. You know, we have so many breakdowns, this and that and everything else. Long story short, the plant was a manufacturing plant that operated five days a week, day shifts. If something tripped or went down or whatever, they could extend. They could run a Saturday shift. They could run extended shifts in days.
So the production impact was not really there. So for them, the impact is, yeah, there's frustration. Yes, there's cost because breakdown maintenance is more expensive than well-planned, well-scheduled preventive maintenance. But typically, if we're really honest, it's the production impact that is the big value driver for most companies. So if that is not there, your ability to sell the benefit into the organization is very different. So the first thing to do is understand, OK, what's the consequence of all this reactiveness
Jean-Philippe PICARD (50:53)
Yeah.
Erik Hupje (50:57)
on the business? What's it impacting in terms of production, costs, safety, or potentially the integrity of the plant, the physical integrity? Is it becoming an unsafe plant? Get to understand that. Then quantify what that is costing the organization. So if you have a lot of downtime, you're losing a lot of production, and you have the capacity to sell that production, therefore there's a big value opportunity, well then that's what you need to be selling to your leadership.
And then look at a framework that you can use or a process that you can use to implement the improvements to do that. But typically, this is where I see people get stuck. They really struggle to understand what the impact is to the organization, what the value opportunity is, and then come up with a relatively simple improvement plan. Because like you said earlier, if your framework or your plan has more than three or four or five steps, people will forget about it.
Jean-Philippe PICARD (51:43)
Hmm.
Erik Hupje (51:50)
You know, the other thing is very often you see people try and make these improvement plans as if they go from reactive to world-class. Don't do that. Just go from reactive to non-reactive and then from non-reactive to proactive. And then, you know, you can work from there. But just make a plan that gets you out of the mess in the next 12, 18, 24 months. Focus on that. And then you can look at what the next steps are after that. It makes it easier to sell, makes it easier to manage,
Jean-Philippe PICARD (51:58)
Yeah.
Erik Hupje (52:16)
makes it easier for people to get their head around and to get behind it.
Jean-Philippe PICARD (52:20)
Yeah, good. So I asked one question and you answered two. No, no, no, we're getting a lot of value out of this. Fantastic. Thanks so much, Erik. That was, I mean, there's still so much more I could ask you, but perhaps for chapter number one, and for the first time we do this, I think we've done enough here. So look, thank you again so much. Thanks for being so generous with sharing your knowledge,
Erik Hupje (52:25)
Sorry for that.
Jean-Philippe PICARD (52:44)
not just right now, but through all that you do on LinkedIn. So if people want to find you, they've listened to this and they thought this was very interesting and they want to know a bit more about you, what are the best ways people find you nowadays?
Erik Hupje (52:56)
The easiest way is to indeed, like you say, find me on LinkedIn or just head over to reliabilityacademy.com and they can find us there.
Jean-Philippe PICARD (53:00)
Sounds good. Sounds good. Yeah, I highly recommend you follow Erik on LinkedIn. So thank you again and yeah, until next time.
Erik Hupje (53:09)
Thank you for having me.
Yes, sounds good. I enjoyed it. And as I said, thank you very much for having me. It was a pleasure.
Jean-Philippe PICARD (53:17)
Thanks, Erik.