hitsaru

Mjolnir – An Open Source Power Cycle Automation service

Filed under Development

Disclaimer: None of the links in this post are affiliate links. I’m just a guy posting a thing. Enjoy.

Anybody with any experience working with cryptocurrency mining equipment will tell you that sooner or later it needs attention, usually because it hangs and needs a simple power cycle. Anybody who tells you otherwise isn’t collecting performance metrics and is lying out of ignorance. It may be innocent, but it’s still wrong. Professionals demand data that settles the question one way or the other.

While I’ve worked on many small scripts and projects for cryptocurrency mines over the last couple of years, the most recent one is likely to have the most direct impact on mining performance. I call it Mjolnir, not because I’m a Marvel Cinematic Universe nerd, but because I’m a nerd, period.

Mjolnir is a simple performance monitoring and troubleshooting automation script for cryptocurrency mines that use Raritan PX series Power Distribution Units (PDUs). If you are running a mine and you aren’t using smart PDUs, you’re doing it wrong, plain and simple. Smart PDUs are networked and generally let you read outlet measurements and power cycle outlets directly through a web interface. Raritan’s PDUs have a robust API, complete with a full JSON-RPC SDK (Mjolnir uses SDK version 3.4.0, so you may need to update your PDU firmware), built for exactly this kind of development.
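Under the hood, each call to the PDU is a JSON-RPC 2.0 request sent over HTTPS, and the SDK’s language bindings normally build these for you. The sketch below only constructs such a request with the Python standard library so you can see its shape. The resource path `/model/pdu/0` and the method name `getOutlets` are assumptions based on Raritan’s object model and should be checked against the SDK documentation for your firmware; the IP address is a placeholder, and authentication is left out.

```python
import json
import urllib.request
from typing import Optional


def build_rpc_request(host: str, resource: str, method: str,
                      params: Optional[dict] = None,
                      request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST request.

    ASSUMPTION: the PDU serves JSON-RPC over HTTPS at resource paths such as
    "/model/pdu/0"; verify against the Raritan JSON-RPC SDK docs.
    """
    body = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        url=f"https://{host}{resource}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address for one of your reserved PDU IPs.
req = build_rpc_request("10.0.0.21", "/model/pdu/0", "getOutlets")
print(req.full_url)       # https://10.0.0.21/model/pdu/0
print(req.data.decode())  # {"jsonrpc": "2.0", "method": "getOutlets", "id": 1}
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with credentials added, but in practice the SDK bindings handle that plumbing.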

If you aren’t fluent with code, I’ve got you covered. Rejoice: I wrote Mjolnir as Open Source Software under the MIT license. It’s free for everybody. It’s well commented, with a separate variables file to make it easy to run.

But back to what it is and what it does for you:
Using Raritan’s JSON-RPC API, Mjolnir iterates through a list of reserved (static) IP addresses assigned to your networked smart PDUs. It reads the current (amperage) on each outlet, and if an outlet is drawing current (greater than 0 amps) but less than an acceptable threshold (hard-coded at 1 amp), the script switches the outlet off, waits ten seconds to ensure the PSU is fully discharged, and then switches it back on.
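The per-outlet decision and cycle sequence can be sketched as follows. This is not Mjolnir’s code: `PduClient`, `outlet_currents`, and `set_outlet` are hypothetical names standing in for whatever SDK calls read outlet current and switch power, and the sleep function is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, Dict, List, Protocol, Tuple

LOW_CURRENT_AMPS = 1.0   # threshold from the text: below this, a rig is probably hung
OFF_DELAY_SECONDS = 10   # wait so the PSU fully discharges before power returns


class PduClient(Protocol):
    """Hypothetical interface for one PDU; a real one would wrap the Raritan SDK."""

    def outlet_currents(self) -> Dict[int, float]: ...

    def set_outlet(self, outlet: int, on: bool) -> None: ...


def needs_cycle(current_amps: float, threshold: float = LOW_CURRENT_AMPS) -> bool:
    # 0 A means off or empty; anything under the threshold means "on but not mining".
    return 0.0 < current_amps < threshold


def sweep(pdus: Dict[str, PduClient],
          sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, int]]:
    """Check every outlet on every PDU (keyed by IP) and cycle the low-draw ones."""
    cycled = []
    for ip, pdu in pdus.items():
        for outlet, amps in pdu.outlet_currents().items():
            if needs_cycle(amps):
                pdu.set_outlet(outlet, on=False)
                sleep(OFF_DELAY_SECONDS)  # blocks for the full delay on each cycle
                pdu.set_outlet(outlet, on=True)
                cycled.append((ip, outlet))
    return cycled


# Usage with an in-memory stand-in for a real PDU.
class FakePdu:
    def __init__(self, currents: Dict[int, float]) -> None:
        self.currents = currents
        self.log: List[Tuple[int, bool]] = []

    def outlet_currents(self) -> Dict[int, float]:
        return dict(self.currents)

    def set_outlet(self, outlet: int, on: bool) -> None:
        self.log.append((outlet, on))


fake = FakePdu({1: 0.0, 2: 0.4, 3: 6.2})  # empty, hung, healthy
print(sweep({"10.0.0.21": fake}, sleep=lambda s: None))  # [('10.0.0.21', 2)]
```

Sleeping inline keeps the sketch simple, but every cycle holds up the sweep for ten seconds; with many hung rigs you might switch the whole batch off, wait once, and switch them back on together.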

BUT WAIT, THERE’S MOAR!

Cycling bad gear indefinitely is probably worse than not cycling it at all. Mjolnir uses a local SQLite database to keep a record of your PDU names/positions and every outlet state read it performs. Using a timedelta, it reads the last X statuses for a particular outlet within a defined time frame. If an outlet has been cycled more times than allowed within that time frame, Mjolnir shuts the outlet down and sends a message to a designated Slack channel using a Slack bot API token. So you can set up a bot to coordinate your repair schedule: it informs you or your team when rigs have fallen below your acceptable service level, and those rigs are already off and ready to be serviced before you arrive. By the time you have to service a unit, you know there is no alternative to physically being there.
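A minimal sketch of the windowed cycle count and the escalation, assuming a simplified single table of cycle events (the post describes Mjolnir storing more than this, including PDU names/positions and every state read). `MAX_CYCLES`, `WINDOW`, the table layout, and the PDU and channel names are hypothetical. The Slack part only builds a request to Slack’s `chat.postMessage` Web API method using a bot token; it does not send it.

```python
import json
import sqlite3
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_CYCLES = 3               # hypothetical: cycles allowed per window
WINDOW = timedelta(hours=6)  # hypothetical window length


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cycles ("
        "pdu TEXT NOT NULL, outlet INTEGER NOT NULL, cycled_at REAL NOT NULL)"
    )
    return conn


def _now(now: Optional[datetime]) -> datetime:
    return now if now is not None else datetime.now(timezone.utc)


def record_cycle(conn: sqlite3.Connection, pdu: str, outlet: int,
                 now: Optional[datetime] = None) -> None:
    # Store cycle times as Unix timestamps so window queries are plain comparisons.
    with conn:  # commits the insert on success
        conn.execute("INSERT INTO cycles VALUES (?, ?, ?)",
                     (pdu, outlet, _now(now).timestamp()))


def cycles_in_window(conn: sqlite3.Connection, pdu: str, outlet: int,
                     now: Optional[datetime] = None,
                     window: timedelta = WINDOW) -> int:
    since = (_now(now) - window).timestamp()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM cycles WHERE pdu = ? AND outlet = ? AND cycled_at >= ?",
        (pdu, outlet, since),
    ).fetchone()
    return count


def should_retire(conn: sqlite3.Connection, pdu: str, outlet: int,
                  now: Optional[datetime] = None) -> bool:
    # Too many cycles in the window: leave it off and call a human instead.
    return cycles_in_window(conn, pdu, outlet, now) > MAX_CYCLES


def slack_alert_request(token: str, channel: str, pdu: str,
                        outlet: int) -> urllib.request.Request:
    """Build (not send) a chat.postMessage call authenticated with a bot token."""
    body = {"channel": channel,
            "text": f"{pdu} outlet {outlet} exceeded {MAX_CYCLES} cycles "
                    f"in {WINDOW} and was switched off for service."}
    return urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


conn = open_db(":memory:")
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
for hour in range(4):
    record_cycle(conn, "pdu-01", 7, now=t0 + timedelta(hours=hour))
print(should_retire(conn, "pdu-01", 7, now=t0 + timedelta(hours=4)))  # True
```

When `should_retire` returns `True`, the outlet would be switched off and the request passed to `urllib.request.urlopen`; older cycles simply age out of the window, so a repaired rig starts with a clean slate.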

Why Raritan?
Raritan has been good to me. My team has had the opportunity to work with Raritan and I like the way they work, but there’s more to it than that. Raritan’s PX3 series is now the standard, and the PX2 series has been EOL for a while. PX2 units can be obtained second hand for far less than new PDUs, so miners who are just starting out can still get good equipment at a reasonable price. As of writing, PX2 series PDUs are fully compatible with the modern firmware, so after a firmware upgrade Mjolnir works with them just as well as with the new models. Whatever your budget, Mjolnir makes Raritan’s smart PDUs more cost effective than unmanaged PDUs. I imagine other PDUs could provide some or all of this functionality, but I had Raritans on hand and the SDK made them easy to work with.

Show me the numbers

Numbers are fun when you’re a nerd. They justify why computers and programming are cool.

I tested Mjolnir in an environment with around 40 PDUs, well over 300 ASIC workers, and a complement of GPU rigs. All of this hardware was second hand and needed more love and care than the brand new rigs at a second test site.

Initially I estimated that a single human with access to a PDU interface, recording the statistics of each outlet and cycling outlets as needed, could handle one PDU in about 6 minutes. That’s a little on the long side, but assume we’re talking about low-skilled humans getting sloppy and bored, doing this task over and over, every hour of the day. That kind of bored and sloppy.

For 40 PDUs, that’s 4 hours’ worth of work per pass. But we’re demanding: we want every pass done in an hour(ish), 24 hours a day, 7 days a week. So we need at least 4 humans. Assuming we pay them $15 an hour, that costs us $60 an hour, or $10,080 a week, or $524,160 per year. That’s a conservative estimate for a 1 megawatt facility, and that is the work that Mjolnir does.
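The staffing arithmetic, written out as a quick check with the same inputs: 6 minutes per PDU, 40 PDUs, a pass every hour around the clock, $15 an hour, and 52 weeks a year (the factor that turns $10,080 a week into $524,160 a year).

```python
import math

PDUS = 40
MINUTES_PER_PDU = 6
HOURLY_WAGE = 15          # dollars per person per hour
HOURS_PER_WEEK = 24 * 7   # around the clock
WEEKS_PER_YEAR = 52

minutes_per_pass = PDUS * MINUTES_PER_PDU   # 240 minutes, i.e. 4 hours of work
staff = math.ceil(minutes_per_pass / 60)    # people needed to finish a pass each hour
hourly_cost = staff * HOURLY_WAGE
weekly_cost = hourly_cost * HOURS_PER_WEEK
yearly_cost = weekly_cost * WEEKS_PER_YEAR

print(staff, hourly_cost, weekly_cost, yearly_cost)  # 4 60 10080 524160
```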

In the spirit of free and obtainable software, Mjolnir was tested and deployed in more than one location on a $40 Raspberry Pi. Compare those labor costs with the cost of deployment and let that sink in.

In the first week of operation at the first test site, Mjolnir performed 93 automated power cycles and isolated 5 units performing below standard. At a second site with all new hardware, Mjolnir performed 18 cycles in three days and located 3 underperforming units within 32 hours of operation. All of this was done with no human intervention.

“Talk is cheap, show me the code”

Get Mjolnir and smite manual power cycling.