DANTE ZFS HA Cluster
This repository holds all code I develop on my journey of building
a high-availability cluster using ZFS. Planned are two head nodes running
standard Debian, plus multiple disk shelves. To manage this on a hardware level,
I want some kind of basic web interface showing all disks and so on.
This will be my little experiment using Go.
cmd/smart_status
This tool loads information about all given drives and queries them for SMART data.
It mainly exists to test loading information via smartctl.
You can force the type of the drives via command flags: nvme, sata, or sas.
This can be useful if some wrapper, e.g. multipath, is hiding the true type.
Wildcards and globbing are supported; however, make sure to pass the path as a
quoted string. Otherwise your shell will expand the glob, and only the first
resulting path will be used with the correct type.
Partitions and unknown types are skipped. E.g. /dev/nvme0 is not a block device
and cannot be queried by lsblk or smartctl, and /dev/nvme0n1p1 is a partition and
will be skipped. Debug mode prints a message when that happens.
To enable debug mode, pass --debug as the first parameter.
# only first disk has explicit type nvme
./smart_status --nvme /dev/nvme*
vs
# the path is passed as string and expanded in the program
./smart_status --nvme "/dev/nvme*"
# debug mode
./smart_status --debug --nvme "/dev/nvme*"
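The quoting rule above exists because the program expands the glob itself. A minimal sketch of how that in-program expansion and partition skipping could look in Go (the helper names and the partition-matching pattern are my assumptions for illustration, not the actual smart_status code):

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// partitionRe matches partition paths such as /dev/nvme0n1p1 or /dev/sda1.
// Whole disks (/dev/nvme0n1, /dev/sda) do not match.
var partitionRe = regexp.MustCompile(`^/dev/(nvme\d+n\d+p\d+|sd[a-z]+\d+)$`)

// isPartition reports whether path names a partition rather than a whole disk.
func isPartition(path string) bool {
	return partitionRe.MatchString(path)
}

// expandPatterns expands glob patterns inside the program, so each pattern
// keeps its association with the preceding type flag, and filters partitions.
func expandPatterns(patterns []string) []string {
	var disks []string
	for _, pat := range patterns {
		matches, err := filepath.Glob(pat)
		if err != nil {
			continue // invalid pattern, skip it
		}
		for _, m := range matches {
			if isPartition(m) {
				continue // debug mode would log the skip here
			}
			disks = append(disks, m)
		}
	}
	return disks
}

func main() {
	fmt.Println(isPartition("/dev/nvme0n1p1")) // partition: true
	fmt.Println(isPartition("/dev/nvme0n1"))   // whole disk: false
}
```

Because the pattern string reaches filepath.Glob intact, one type flag can apply to every match, which is not possible once the shell has already split the expansion into separate arguments.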
cmd/disk_layout
This tool is a test binary for extracting and making use of storcli data.
We can query data about SAS controllers, SAS disk shelves (enclosures), and disks.
This information can be useful for determining the position of the disks
in use within the system. SAS-attached disks can be multipath, which is reflected
in the data we receive from storcli.
We can see which path (controller -> enclosure -> disk) the commands take,
and use that for building documentation, alerts, and other things.
Currently, it loads sample data from the test folder.
The sample data was dumped with storcli on my real test setup.
The sample data is not published at the moment, as it contains serial numbers
of all kinds of devices :)
Future Plans & Ideas
- Combine smartctl and storcli data to have all kinds of data at hand for every drive.
We essentially want one platform for all important disk information.
This might in the future grow to include IO data via Prometheus or something similar.
It might be fun to log IO delay and the like for each disk and highlight the worst
performers to make debugging easier.
- ZFS integration
I want to integrate ZFS into the mix, as it is my go-to filesystem.
If all goes well, I want to migrate off TrueNAS to this high availability
storage cluster without losing a ton of functionality.
In my opinion, TrueNAS is a nice entry into ZFS and enterprise storage.
However, it lacks high availability, which we might be able to achieve on a budget
using a system like this one.
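For the ZFS side, one low-effort starting point is parsing zpool list with machine-friendly flags (-H drops the header and tab-separates columns, -p prints exact byte counts). A minimal sketch under that assumption; the pool struct and sample line are illustrative, not real output from my setup:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pool holds the fields requested via
// `zpool list -Hp -o name,size,alloc,health`.
type pool struct {
	Name   string
	Size   uint64 // bytes
	Alloc  uint64 // bytes
	Health string
}

// parsePools parses the tab-separated output of the zpool invocation above.
func parsePools(out string) ([]pool, error) {
	var pools []pool
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		f := strings.Split(line, "\t")
		if len(f) != 4 {
			return nil, fmt.Errorf("unexpected field count in %q", line)
		}
		size, err := strconv.ParseUint(f[1], 10, 64)
		if err != nil {
			return nil, err
		}
		alloc, err := strconv.ParseUint(f[2], 10, 64)
		if err != nil {
			return nil, err
		}
		pools = append(pools, pool{Name: f[0], Size: size, Alloc: alloc, Health: f[3]})
	}
	return pools, nil
}

func main() {
	// In the real tool this string would come from exec.Command("zpool", ...).
	sample := "tank\t1000000000\t400000000\tONLINE"
	pools, err := parsePools(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", pools[0])
}
```

This gives exactly the capacity and health numbers the web interface below would need, without yet touching libzfs bindings.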
- Web interface with visualization of disk shelves
I want a sweet, easy, no-bullshit overview of my ZFS pools and disks:
basically, capacity of the available pools, easy identification of disks within
the pools - including physical identification - and a performance indicator
based on IO delay and bandwidth.
These points are my current focus.
Authentication, multi-head-server sync, and so on are also on the bucket list.
However, I want to get the small, essential stuff done before thinking about the
bigger picture. I am still learning to code in Go, and this is one of my first
proper projects using it, so let's see how this goes.