Scaling Bitcoin Node with Kubernetes

Blockchain-based solutions have become quite popular over the past few years. To have blockchain data available, you need some form of data access, either from a provider such as https://blockpulsar.com or through a privately hosted blockchain node.

Hosting privately is quite costly, but it is a tradeoff: either you manage everything yourself and control which regional distribution or API you get, or you pay for a simple subscription and let other professionals handle the technical side.

If your project needs a privately hosted Bitcoin node, configuring it by hand directly on a virtual server is most probably a very bad idea, especially since there are so many scalable solutions available for configuring your cloud environment.

Putting Bitcoin node in Docker container

Essentially, a Bitcoin node is a basic C++ server application that becomes quite storage-intensive and CPU-intensive if you push it hard and want to get the most from it. The hardest part to figure out is the massive Docker volume that you have to mount to be able to run the node.

At the time of writing, a single BTC Node takes around 450GB of storage to pull the full blockchain and operate properly as a container. That is already quite expensive, and almost impossible to run on a standard MacBook Pro for dev testing. That’s why it is important to make sure that your Kubernetes environment has access to at least 500GB of storage for the deployments to work properly.

First of all, let’s use debian:buster as the base image. It is quite easy to set up a BTC Node there, and it is a popular Docker base image, so we shouldn’t have any problems researching potential issues.

FROM debian:buster

WORKDIR /root

We keep /root as our working directory because we are running as the root user inside the container, and the BTC Node is going to work from the /root/.bitcoin base volume.

Because we want to look a bit fancy and make sure that we have a properly compiled BTC Node, we are going to download the sources and compile them directly inside the container during the build process.

RUN apt update && apt install --yes wget && \
    wget https://bitcoin.org/bin/bitcoin-core-0.21.1/bitcoin-0.21.1.tar.gz && \
    tar xzf bitcoin-0.21.1.tar.gz && mv bitcoin-0.21.1 bitcoin && \
    rm -rf bitcoin-0.21.1.tar.gz

Of course, you can take the BTC Node version from an environment variable (or a Docker build argument) instead of hard-coding 0.21.1, which makes version upgrades a lot easier.
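
As a rough sketch of that idea, the shell fragment below derives the download URL from a single version variable. The name BTC_VERSION is an assumption chosen for illustration; it is not defined by Bitcoin Core or by the Dockerfile in this article.

```shell
# BTC_VERSION is a hypothetical variable name; it falls back to the
# version used in this article when nothing else is provided.
BTC_VERSION="${BTC_VERSION:-0.21.1}"

# Build the tarball name and URL from the version, following the
# URL pattern of the wget command shown earlier.
BTC_TARBALL="bitcoin-${BTC_VERSION}.tar.gz"
BTC_URL="https://bitcoin.org/bin/bitcoin-core-${BTC_VERSION}/${BTC_TARBALL}"

echo "$BTC_URL"
```

Inside the Dockerfile, the same ${BTC_VERSION} substitution works in a RUN step once the variable is declared with ARG BTC_VERSION=0.21.1, and docker build --build-arg BTC_VERSION=<version> overrides the default.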

To be honest, the BTC Node has a ton of library dependencies. That is typical for C++ server development, where you rely on quite a few external libraries instead of building everything yourself. So we have to install the dependencies and then finally compile the Node.

RUN apt install --yes build-essential autoconf libtool pkg-config \
    bsdmainutils libboost-all-dev libevent-dev

RUN cd bitcoin && ./autogen.sh && ./configure --disable-wallet && \
    make -j4 && make install && cd /root && rm -rf bitcoin && apt clean

Note that each step is a separate RUN command, so that each one produces its own image layer and Docker can reuse cached layers when nothing has changed. For example, once you have built the image, a second build reruns these steps only if you change the BTC Node version or remove the base image from your system.

Probably the major part of building this semi-scalable Kubernetes system for the BTC Node is having storage mounted as a separate volume into the container’s /root/.bitcoin directory. The catch is that the Bitcoin Node also keeps its configuration inside the same folder, so if we mount the volume there we lose our BTC Node configuration. So we copy the configuration from our local disk to /root/bitcoin.conf and pass it as a separate configuration file later on.

COPY ./bitcoin.conf /root/bitcoin.conf

EXPOSE 8332 8333

CMD /bin/bash -c "bitcoind --conf=/root/bitcoin.conf"

The rest is as simple as it gets: run the Docker container with the already predefined exposed ports and startup command.
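
A minimal sketch of that build-and-run step follows. The image tag btc-node, the container name, and the named volume btc-data are assumptions chosen for illustration; the ports and the mount path come from the Dockerfile above. The DOCKER variable is only a convenience so the commands can be printed instead of executed.

```shell
# Hypothetical names: adjust the image tag and volume name to your setup.
IMAGE="btc-node"
VOLUME="btc-data"
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Build the image from the Dockerfile in the current directory.
build_image() {
  "$DOCKER" build -t "$IMAGE" .
}

# Start the node in the background, publishing the RPC (8332) and
# peer-to-peer (8333) ports and mounting the named volume at the
# directory where the node keeps its blockchain data.
run_node() {
  "$DOCKER" run -d --name btc-node \
    -p 8332:8332 -p 8333:8333 \
    -v "$VOLUME:/root/.bitcoin" \
    "$IMAGE"
}
```

Calling build_image and then run_node from the directory that contains the Dockerfile and bitcoin.conf builds the image and starts the container.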

Simple Kubernetes configuration for multiple replicas

As a basis, of course, we have to define the Kubernetes Service itself to make sure the node is reachable inside the Kubernetes cluster. It is a totally regular Service configuration, nothing specific to BTC Node yet!

# btc.service.yml
apiVersion: v1
kind: Service
metadata:
  name: bitcoin
spec:
  selector:
    app: bitcoin
  ports:
    - protocol: TCP
      name: http
      port: 8332
      targetPort: 8332

The only thing to note is that we are targeting port 8332, which serves the BTC Node’s generic RPC API. Port 8333, the peer-to-peer port, is not exposed through this Service.
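
To illustrate what talking to that port looks like, here is a hedged sketch of a JSON-RPC call against the Service. It assumes that curl is available in the calling pod, that the caller runs in the same namespace (so the Service resolves as bitcoin), that bitcoin.conf allows RPC connections from other pods, and that credentials are provided through the hypothetical RPC_USER and RPC_PASSWORD variables. getblockchaininfo is a standard Bitcoin Core RPC method.

```shell
# In-cluster URL of the Service defined above (same-namespace DNS name assumed).
RPC_URL="${RPC_URL:-http://bitcoin:8332/}"

# Print a JSON-RPC request body for a method that takes no parameters.
rpc_payload() {
  printf '{"jsonrpc":"1.0","id":"article","method":"%s","params":[]}' "$1"
}

# POST the request. RPC_USER and RPC_PASSWORD are hypothetical variable
# names that must match the credentials configured in bitcoin.conf.
rpc_call() {
  curl --silent --user "$RPC_USER:$RPC_PASSWORD" \
    --header 'Content-Type: text/plain' \
    --data-binary "$(rpc_payload "$1")" \
    "$RPC_URL"
}
```

For example, rpc_call getblockchaininfo returns a JSON document that includes the current block height.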

The second part of the configuration is defining a StorageClass, which is going to be a little different based on your cloud provider or dedicated server configuration. The idea is to have a storage class that we can reuse across multiple replicas to create a new volume for each BTC Node replica. This sample targets Google Cloud.

# btc.storage.class.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: bitcoin-disk
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zone: us-central1-a
allowVolumeExpansion: true

With the storage class defined and applied, we can go ahead and define the actual pod configuration. It is not going to be a regular Deployment but a StatefulSet, which makes sure that each replica gets its own allocated resources, such as its own volume. StatefulSets are mostly used for databases or cache instances, but don’t forget that a BTC Node contains the blockchain, which is also a kind of database, so it is of course a stateful application by design.

# btc.stateful.set.yml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: bitcoin
spec:
  serviceName: bitcoin
  replicas: 1 # Scale as much as you want
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: bitcoin
  template:
    metadata:
      labels:
        app: bitcoin
    spec:
      restartPolicy: Always
      containers:
        - name: bitcoin
          image: <your BTC Node Image>
          volumeMounts:
            - name: bitcoin
              mountPath: /root/.bitcoin
  volumeClaimTemplates:
    - metadata:
        name: bitcoin
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: bitcoin-disk
        resources:
          requests:
            storage: 500Gi
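
Assuming the three manifests are saved under the file names shown in their header comments, applying them could look like the sketch below. KUBECTL is a hypothetical convenience variable so the commands can be printed rather than executed.

```shell
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Create the StorageClass, the Service, and the StatefulSet.
apply_manifests() {
  "$KUBECTL" apply -f btc.storage.class.yml -f btc.service.yml -f btc.stateful.set.yml
}

# List the pods and the volume claims created for them.
show_state() {
  "$KUBECTL" get pods -l app=bitcoin
  "$KUBECTL" get pvc
}
```

With one replica, the StatefulSet creates a pod named bitcoin-0 and a claim named bitcoin-bitcoin-0, following the StatefulSet naming scheme of claim template name plus pod name.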

That’s it! With this in place you can scale the BTC Node to the desired count by changing the replicas number, but remember that each replica is going to require another 500GB of storage and more resources for processing BTC Node transactions.
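
The storage cost of scaling is simple arithmetic, sketched below together with the scale command. The function names and the KUBECTL variable are illustrative assumptions; the 500Gi figure comes from the volumeClaimTemplates section above.

```shell
# Storage requested per replica, in Gi, as declared in the StatefulSet.
STORAGE_PER_REPLICA_GI=500
# Set KUBECTL=echo to print the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Total storage (in Gi) that a given replica count will request.
required_storage_gi() {
  echo $(( $1 * STORAGE_PER_REPLICA_GI ))
}

# Report the storage footprint on stderr, then scale the StatefulSet
# named bitcoin to the requested replica count.
scale_nodes() {
  echo "Scaling to $1 replicas requests $(required_storage_gi "$1")Gi in total" >&2
  "$KUBECTL" scale statefulset bitcoin --replicas="$1"
}
```

For example, scale_nodes 3 would request 1500Gi across three claims. Scaling a StatefulSet back down does not delete the claims of the removed replicas by default, so their disks stay allocated until the claims are removed.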

One corner case to consider: whenever you scale the BTC Node, make sure you have some sort of load balancing in front of it. Kubernetes has load balancing built in from the Service to the Pod level, BUT based on the experience coming from https://blockpulsar.com, a BTC Node sometimes shows up as healthy while failing to fetch some parts of the data. So maintaining a production-grade BTC Node is quite a bit more painful than managing a standard service.
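
One way to catch at least some of these cases, such as a node that is up but lagging, is to compare its validated block height with the best header height it knows about. getblockchaininfo reports both, as blocks and headers. The sketch below is an assumption about how such a check could look, not the check used by https://blockpulsar.com; the function name and the default lag threshold are illustrative.

```shell
# Succeeds when the node's validated height is within max_lag blocks of
# the best known header height.
# Arguments: blocks, headers, optional max lag (defaults to 2).
is_caught_up() {
  blocks="$1"
  headers="$2"
  max_lag="${3:-2}"
  [ $(( headers - blocks )) -le "$max_lag" ]
}
```

Fed with the blocks and headers values from getblockchaininfo, for instance from a readiness probe script, this lets Kubernetes stop routing traffic to a replica that is still syncing: is_caught_up 700000 700001 succeeds, while is_caught_up 650000 700001 fails.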

Conclusion

Most blockchain-based applications require data from blockchain nodes through the direct RPC API, which adds the requirement to run those nodes in your cloud infrastructure, with the maintenance and pricey setup that come with it.

Scaling a BTC Node is not that hard if you have the resources and automation built on top of it to keep it up to date and handle load balancing properly. Some cloud infrastructure providers even limit how much storage you can use per account, which is another obstacle to hosting BTC Nodes yourself, because most of the costs associated with them come down to scaling storage space per Node. That’s why there are services like https://blockpulsar.com, which handle all that complicated infrastructure and give you a clean API for the Bitcoin blockchain.
