4,500 m²
Farm area
24,000 GPUs • 2,917 ASICs
Hardware
128,459 ETH
Mined
Services
User Research, Product Strategy, Product Design
Role
I led the redesign, collaborating closely with stakeholders and the engineering team.
Year
2020-2021 (1 yr 1 mo)
Platform
Web
Result
−648h/y
+$500K
Increase in annual profit
Challenge
Equipment repair time is a key metric. Every second of delay between detecting an issue and fixing it is money lost. Added up, those seconds cost $616,000 a year.
01
How might we help technicians detect equipment issues in real time?
02
How might we minimize time from problem detection to repair initiation?
Status quo
I inherited a product from the previous team. They copied an existing solution without understanding the client's specifics. The result didn't fulfill business needs and required a redesign.
Process
Business research
Before I joined, the team rarely communicated with the farm's CEO and almost never with our users, the farm's technicians. As a result, no one properly understood what needed to be done.
To figure things out, I started by meeting with the CEO and set up regular meetings with him.
What I learned
I understood how the farm works, how it makes money, and what reduces ROI.
This helped me formulate principles, using first-principles thinking, that became the foundation for product decisions:
Speed
A mining business earns money by solving mathematical puzzles for the blockchain network.
Whoever solves a puzzle first wins the reward.
Our solving speed depends on the combined computing power of all our devices.
Device issues happen constantly.
Each issue reduces power, which slows down our solving speed and, with it, our income.
We can't prevent issues from arising, but we can influence how quickly we fix them.
Fix faster, earn more.
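The reasoning above can be put into rough numbers. This is a minimal sketch, assuming expected income scales linearly with the share of computing power that is online. The function name and every figure are hypothetical and are not farm data.

```python
def lost_income(income_per_hour: float, power_lost_fraction: float, hours_to_fix: float) -> float:
    """Expected income lost while an issue stays unfixed.

    Assumption: income is proportional to the computing power online, so
    losing a fraction of power for some time loses the same fraction of
    the income earned over that time.
    """
    return income_per_hour * power_lost_fraction * hours_to_fix


# Hypothetical figures for illustration only.
slow_fix = lost_income(income_per_hour=1000.0, power_lost_fraction=0.02, hours_to_fix=6.0)
fast_fix = lost_income(income_per_hour=1000.0, power_lost_fraction=0.02, hours_to_fix=1.0)
print(f"slow fix: {slow_fix:.2f}, fast fix: {fast_fix:.2f}")
```

The same issue costs less the sooner it is fixed, which is the only variable in this model that the product can influence.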
Efficiency
Each issue reduces power differently.
The extent of reduction depends on the type and intensity of the issue.
Power reduction is distributed unevenly: 20% of problematic devices are responsible for 80% of total reduction.
Therefore, issues are not equal and cannot be fixed in random order.
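To make the uneven distribution concrete, the sketch below counts how many devices are needed to account for a given share of the total power reduction. The function name and the sample values are hypothetical, chosen only to illustrate an 80/20-style split.

```python
def devices_covering(reductions: list[float], share: float = 0.8) -> int:
    """Smallest number of devices whose combined reduction reaches `share` of the total."""
    total = sum(reductions)
    running = 0.0
    # Walk through devices from the largest reduction to the smallest.
    for count, value in enumerate(sorted(reductions, reverse=True), start=1):
        running += value
        if running >= share * total:
            return count
    return len(reductions)


# Hypothetical power reductions for ten problematic devices (arbitrary units).
reductions = [1, 50, 2, 35, 4, 1, 3, 2, 1, 1]
print(devices_covering(reductions))  # 2 of 10 devices cover at least 80% here
```

With data shaped like this, fixing the two worst devices first recovers most of the lost power, while a random order would not.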
User research
The next step was getting to know our users. I initiated a team trip to the farm. For two days we worked alongside technicians, doing their work.
The farm building
What I learned
Recall that every second of delay between detecting and fixing issues is money lost. The trip revealed the main causes that widen this time gap:
Device issues are detected with delay.
You can't start repairs until you determine what to fix and in what order.
Solutions
Challenge 01
How might we help technicians detect equipment issues in real time?
Smart alerting system
Issues are detected with delay, postponing repair initiation.
The later we react, the worse it gets: from slower device performance to its breakdown.
Before
A Telegram bot keeps technicians informed about everything happening on the farm.
The problem is that it serves many scenarios at once.
Messages arrive about once a minute, and only one in six is about an issue.
Reacting to every notification is impractical.
This forces technicians to check for issues directly in the product.
After
Created a separate bot only for issues.
Messages arrive based on new rules:
Expensive issues—send every incident.
Cheap ones—bundle into one message when reaching a set quantity.
Notifications come with a special sound to stand out from other chats.
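The two delivery rules above can be sketched as a small router. This is an illustration under stated assumptions, not the bot's actual code: the `Issue` fields, the loss threshold that separates expensive from cheap issues, and the message wording are all hypothetical, and sending the message to Telegram is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    device_id: str
    kind: str            # issue type; the names used below are hypothetical
    hourly_loss: float   # estimated income lost per hour (assumed metric)


@dataclass
class AlertRouter:
    expensive_threshold: float  # issues at or above this loss are sent at once
    bundle_size: int            # cheap issues are held until this many pile up
    _pending: list = field(default_factory=list, init=False)

    def handle(self, issue: Issue) -> list[str]:
        """Return the messages to send for this issue (possibly none)."""
        if issue.hourly_loss >= self.expensive_threshold:
            return [f"URGENT {issue.device_id}: {issue.kind}"]
        self._pending.append(issue)
        if len(self._pending) < self.bundle_size:
            return []
        # Enough cheap issues collected: send them as one message.
        devices = ", ".join(i.device_id for i in self._pending)
        self._pending.clear()
        return [f"{self.bundle_size} minor issues: {devices}"]


router = AlertRouter(expensive_threshold=5.0, bundle_size=3)
incoming = [
    Issue("A1", "offline", 12.0),
    Issue("B2", "overheating", 0.5),
    Issue("B3", "low hashrate", 0.4),
    Issue("C7", "fan error", 0.3),
]
sent = []
for issue in incoming:
    sent.extend(router.handle(issue))
print(sent)
```

Four incoming issues produce two messages here: one immediate alert for the expensive issue and one bundle for the three cheap ones.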
Challenge 02
How might we minimize time from issue detection to repair initiation?
When you detect issues, you can't start repairs without solving two intermediate tasks:
Gather information about all current issues.
Determine the order of fixing them.
Issue tracker
Before
We only show a summary of how many devices are offline. But this is just 1 of 5 issue types. The rest are scattered throughout the product.
After
Added the remaining issue types. All the information at a glance.
Profit-based device organization
Devices should be fixed in order of profit impact.
To determine the order, you need to find and compare data from hundreds of devices across dozens of folders.
No one can process that volume quickly and accurately at the same time. You have to sacrifice accuracy.
Suboptimal order—lost profit.
Before
Devices are distributed across folders by their physical location in the building (floor + row).
Each folder contains all devices from a location.
Prioritization needs only the problematic devices, but they're mixed in with healthy ones, which vastly outnumber them.
You need to hunt for them, assess each one's condition, and create a repair order.
After
Added a new Issues page showing only problematic devices.
The algorithm sorts devices by profit impact. A technician gets a ready-to-go plan.
No hunting, no analysis—straight to work.
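The idea behind the Issues page can be sketched in a few lines: filter out healthy devices, then sort what remains by profit impact. The case study does not describe the real algorithm, so the `Device` fields, the `hourly_loss` metric, and the sample fleet below are assumptions made for illustration.

```python
from typing import NamedTuple


class Device(NamedTuple):
    device_id: str
    location: str        # floor + row, as in the old folder structure
    hourly_loss: float   # estimated profit lost per hour; 0 means healthy (assumed metric)


def repair_plan(devices: list[Device]) -> list[Device]:
    """Keep only problematic devices and order them from most to least costly."""
    problematic = [d for d in devices if d.hourly_loss > 0]
    return sorted(problematic, key=lambda d: d.hourly_loss, reverse=True)


# Hypothetical fleet spread across several locations.
fleet = [
    Device("F1-R2-07", "floor 1, row 2", 0.0),
    Device("F1-R2-11", "floor 1, row 2", 1.5),
    Device("F2-R1-03", "floor 2, row 1", 9.0),
    Device("F2-R4-20", "floor 2, row 4", 0.0),
    Device("F3-R1-01", "floor 3, row 1", 4.2),
]
plan = repair_plan(fleet)
print([d.device_id for d in plan])
```

Location stays on each record so a technician still knows where to go, but it no longer decides the order of work.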
