Other weekly network statistics posts
Weekly pool statistics posts
Weekly block maker statistics posts
0. I get internet, you get weekly stats
So I’m finally back online, and the stats are back. Thanks to everyone who emailed to ask when I’d return, and special thanks to the guys who are helping me get a cloud-based system running. I hope eventually to move to a Jekyll blog on github.io.
1. Centralisation decreasing even as hashrate drops below 50% public?
Centralisation is clearly decreasing on every index that I measure, and even a simple visual measure such as this:
shows how the network is becoming more evenly distributed between block makers. The gamma diversity measure indicates nine effective competitors, up from five only a few months ago. However, we’ve dropped below the 50% publicly owned hashrate point. I’m a bit behind in my investigations of block makers, so there may be a few block makers that I’ve labelled as private that are actually public, but not enough to make a difference here.
User account hashrate distributions still needed
I want to be able to publish the estimated number of miners and the mean and median miner hashrate again, but I can only do that if I can get some more mining pool data.
If you have time, encourage your mining pool to provide a “Hall of Fame” feed. To estimate a number of different network statistics I need user account hashrates averaged over at least an hour, preferably several hours or more. The data can be anonymised – it’s just the user hashrates I need.
If you want to try your hand at generating your own bitcoin network statistics and you know some R, then I’ve some scripts that download bitcoin block chain data from various APIs and convert it to a .csv table (a format supported by spreadsheets). So far only Blockchain.info, Blocktrail.com, and Kaiko.com are supported.
- R bitcoin blockchain data API access function
The plots below show the network hashrate since block height 1 and for the last year. The mean estimate is calculated using the daily average hashrate.
The second chart also includes confidence intervals for the hashrate, the mean hashrate estimate, and a 28 day forecast estimate.
- The dashed line is the mean hashrate estimate.
- The grey shaded area is the 95% confidence interval for the mean hashrate estimate.
- The dotted lines mark the 95% confidence interval for daily hashrate averages, given the mean hashrate estimate, so 95% of the large grey dots (average daily hashrate) should fall between the dotted lines.
- The blue shaded areas are the confidence intervals for the forecast.
- Forecast confidence intervals are bootstrapped.
You’ll notice that the mean forecast is not given – just the confidence intervals. The reason for this is that in the past people have focused on the mean forecast, but I think the range of values the network hashrate could take is much more important.
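As a rough illustration of what a bootstrapped forecast interval involves, here is a minimal Python sketch. It is not the method used for the charts in this post (those are made in R). It assumes, purely for illustration, that the log of the daily hashrate follows a random walk with drift, and the function name, parameters and sample data are all hypothetical.

```python
import math
import random

def bootstrap_forecast_interval(series, horizon, n_paths=2000, level=0.95, seed=1):
    """Bootstrapped forecast interval for a positive series.

    Assumption (not from the post): log(series) is a random walk with
    drift, so future paths are built by resampling past daily log changes.
    """
    rng = random.Random(seed)
    logs = [math.log(x) for x in series]
    # Observed one-step log changes (drift plus noise).
    steps = [b - a for a, b in zip(logs, logs[1:])]
    finals = []
    for _ in range(n_paths):
        current = logs[-1]
        for _ in range(horizon):
            current += rng.choice(steps)  # resample with replacement
        finals.append(math.exp(current))
    finals.sort()
    alpha = (1.0 - level) / 2.0
    lower = finals[int(alpha * (n_paths - 1))]
    upper = finals[int((1.0 - alpha) * (n_paths - 1))]
    return lower, upper

# Hypothetical daily hashrate averages (arbitrary units), for illustration only.
daily_hashrate = [100, 103, 101, 106, 108, 107, 111, 115, 113, 118, 121, 120]
low, high = bootstrap_forecast_interval(daily_hashrate, horizon=28)
print(f"28 day 95% interval: {low:.1f} to {high:.1f}")
```

Only the interval is returned, not a mean path, which matches the emphasis on the range of plausible values rather than a single forecast line.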
Miner profitability and forecast
“Income” is an estimate which ignores reward method and pool fee, and includes transaction fees.
- The first plot below shows the weekly miner income and cumulative miner income for the past 52 weeks.
- The second plot shows the weekly miner income for the past 26 weeks with an eight week forecast, and the cumulative miner income eight week forecast.
- Forecast confidence intervals are bootstrapped.
Again, the mean forecast is not given for the same reasons I gave previously. An eight-week forecast is possible as these are weekly summary statistics; for daily summary statistics (such as above) only a four-week forecast is possible with any accuracy.
Transaction fees are often overlooked by miners but will become very important for them – as the block reward decreases, transaction fees must necessarily go some way toward ameliorating the loss in block reward.
The first plot is a simple forecast that uses an ARIMA model to create a realistic forecast for monthly average transaction fees per block. Think of it as a trend line with confidence intervals. Since it models future values as an autoregressive and moving average function of previous values, it cannot account for sudden changes to the network, so use it as a guide only.
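To make "a function of previous values" concrete, here is a minimal Python sketch of the autoregressive part only: an AR(1) model fitted by least squares. It is much simpler than a full ARIMA model (no differencing, no moving average terms) and is not the model behind the plot. The function names and the fee figures are hypothetical.

```python
def fit_ar1(values):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = values[:-1]   # previous values
    y = values[1:]    # values one step later
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast_ar1(values, steps):
    """Iterate the fitted AR(1) equation forward from the last observation."""
    c, phi = fit_ar1(values)
    forecasts = []
    last = values[-1]
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts

# Hypothetical monthly average transaction fees per block (BTC).
monthly_fees = [0.10, 0.12, 0.11, 0.14, 0.15, 0.17, 0.16, 0.19]
print(forecast_ar1(monthly_fees, steps=3))
```

Every forecast is built only from earlier values, which is why a model of this kind cannot anticipate a sudden change to the network.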
The plots following the first are self explanatory and are kernel smoothed estimates of block summary statistics.
Transaction rates, block rates and empty blocks
Note the spikiness of block rates. This is due to the protocol-determined change in difficulty, which moves the block rate toward 144 blocks per day (ten minutes per block).
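The arithmetic behind those spikes can be sketched in a few lines of Python. Difficulty is only adjusted every 2016 blocks, so if hashrate rises in between, blocks arrive faster than 144 per day until the next adjustment pulls the rate back. The function names and the starting difficulty are hypothetical, and the sketch ignores details of the real client such as its off-by-one in measuring the timespan.

```python
TARGET_SPACING_S = 600     # ten minutes per block
RETARGET_BLOCKS = 2016     # blocks between difficulty changes
SECONDS_PER_DAY = 86400

def retarget(old_difficulty, actual_timespan_s):
    """New difficulty after a retarget period that took actual_timespan_s."""
    expected = RETARGET_BLOCKS * TARGET_SPACING_S
    # The protocol limits each adjustment to a factor of four either way.
    clamped = min(max(actual_timespan_s, expected / 4), expected * 4)
    return old_difficulty * expected / clamped

def blocks_per_day(hashrate, difficulty):
    """Expected blocks per day; a block takes difficulty * 2**32 hashes on average."""
    return hashrate * SECONDS_PER_DAY / (difficulty * 2**32)

difficulty = 1_000_000.0                              # hypothetical
balanced = difficulty * 2**32 / TARGET_SPACING_S      # hashrate giving 144 blocks/day
raised = 1.1 * balanced                               # hashrate grows by 10%

before = blocks_per_day(raised, difficulty)
timespan = RETARGET_BLOCKS * TARGET_SPACING_S / 1.1   # period finishes early
after = blocks_per_day(raised, retarget(difficulty, timespan))
print(f"before retarget: {before:.1f} blocks/day, after: {after:.1f} blocks/day")
```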
Since these plots have been smoothed over a 14 day period, the cyclical nature of transaction rates (previously analysed here and here) will not be visible.
General inequality between block makers (facet 1)
Previously, I have described inequality measures. The two general inequality measures, the Gini coefficient and the Theil index, measure inequality between block makers. They are minimised when all block makers solve a similar number of blocks over a period of time and maximised if only one of many block makers solves all the blocks for a given period of time (since we know that bitcoin mining is a stochastic process in which variance can be significant, a reasonable time period should be chosen).
The Herfindahl index theoretically captures the equivalent share that would be enjoyed by equal-sized firms in the marketplace.
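For readers who want to try these measures on their own block counts, here is a minimal Python sketch using the standard textbook definitions of the Gini coefficient, the Theil T index and the Herfindahl index. The function names and block counts are hypothetical, and the charts in this post may use different estimators or time windows.

```python
import math

def gini(blocks):
    """Gini coefficient: 0 when all block makers solve equal numbers of blocks."""
    xs = sorted(blocks)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def theil(blocks):
    """Theil T index: 0 at equality, ln(n) when one block maker solves everything."""
    n = len(blocks)
    mean = sum(blocks) / n
    # Terms with zero blocks contribute nothing (0 * ln 0 is taken as 0).
    return sum((x / mean) * math.log(x / mean) for x in blocks if x > 0) / n

def herfindahl(blocks):
    """Herfindahl index: sum of squared shares of blocks solved."""
    total = sum(blocks)
    return sum((x / total) ** 2 for x in blocks)

# Hypothetical blocks solved by four block makers in one period.
blocks = [40, 30, 20, 10]
print(gini(blocks), theil(blocks), herfindahl(blocks))
```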
Inequality between groups: smaller block makers and larger block makers (facet 2)
I’m using two ways to illustrate inequality between the half of the network with the highest concentration of hashrate, and the half of the network with the lowest concentration of hashrate.
Mining centralisation index = 1 – mean(Sblocks) / mean(Lblocks)
Sblocks = number of blocks solved by small block makers
Lblocks = number of blocks solved by large block makers
(details on how ‘large’ and ‘small’ are defined)
This index is measuring the inequality between two groups: the half of the network with the highest concentration of hashrate, and the half of the network with the lowest concentration of hashrate. It can be interpreted as:
Large to small density ratio = 1 / (1 – centralisation index)
For example, an index of 80% means that the average larger pool has 1 / (1 – 0.8) = 5 times the proportion of the network that the average smaller pool has.
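The index and its interpretation can be checked with a short Python sketch. How block makers are split into "small" and "large" is not reproduced here, so the two groups are simply passed in as lists. The function names and block counts are hypothetical.

```python
def centralisation_index(small_blocks, large_blocks):
    """1 - mean(Sblocks) / mean(Lblocks), with the two groups supplied by the caller."""
    mean_small = sum(small_blocks) / len(small_blocks)
    mean_large = sum(large_blocks) / len(large_blocks)
    return 1.0 - mean_small / mean_large

def large_to_small_ratio(index):
    """Large to small density ratio = 1 / (1 - centralisation index)."""
    return 1.0 / (1.0 - index)

# Hypothetical period: three small block makers and two large ones.
small = [10, 20, 30]     # mean 20 blocks
large = [100, 100]       # mean 100 blocks

index = centralisation_index(small, large)
ratio = large_to_small_ratio(index)
print(f"index = {index:.0%}, ratio = {ratio:.1f}")
```

With these numbers the average large block maker solves five times as many blocks as the average small one, the same ratio as in the 80% example.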
Mining centralisation index 2 = Sh * (log(Sh) – log(Sn)) + Lh * (log(Lh) – log(Ln))
Sh = Sblocks/(Sblocks + Lblocks)
Sn = No. small pools/(No. small pools + No. large pools)
Lh = Lblocks/(Sblocks + Lblocks)
Ln = No. large pools/(No. small pools + No. large pools)
This also has a range from maximum equality at 0 to maximum inequality at 1, but does not have an intuitive meaning (except that lower is better).
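A minimal Python sketch of the second index follows. It assumes natural logarithms (the base is not stated in the formula) and again takes the small and large groups as given. The function name and block counts are hypothetical.

```python
import math

def centralisation_index_2(small_blocks, large_blocks):
    """Sh * (log(Sh) - log(Sn)) + Lh * (log(Lh) - log(Ln)), natural log assumed."""
    s_total = sum(small_blocks)
    l_total = sum(large_blocks)
    n_small = len(small_blocks)
    n_large = len(large_blocks)

    sh = s_total / (s_total + l_total)   # share of blocks solved by small pools
    lh = l_total / (s_total + l_total)   # share of blocks solved by large pools
    sn = n_small / (n_small + n_large)   # share of pools that are small
    ln_share = n_large / (n_small + n_large)  # share of pools that are large

    def term(block_share, pool_share):
        # A group that solved no blocks contributes nothing (0 * log 0 taken as 0).
        if block_share == 0:
            return 0.0
        return block_share * (math.log(block_share) - math.log(pool_share))

    return term(sh, sn) + term(lh, ln_share)

# Hypothetical period: three small block makers and two large ones.
print(centralisation_index_2([10, 20, 30], [100, 100]))
# When each group's share of blocks matches its share of pools, the index is 0.
print(centralisation_index_2([50, 50], [50, 50]))
```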
Below, the two general and two grouped inequality measures have been plotted. The Gini coefficient and the Theil index are quite similar, and Mining centralisation indices 1 and 2 are also quite similar.
General inequality between block makers: Gamma diversity
The Gamma diversity with q = 2 is equal to the inverse of the Herfindahl Index, and in this case equals the equivalent number of competitive firms.
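The relationship with the Herfindahl index is easy to check numerically. The Python sketch below assumes the usual "true diversity of order q" definition, (sum of shares^q)^(1/(1 – q)), applied to shares of blocks solved. The function name and block counts are hypothetical.

```python
import math

def diversity(blocks, q=2.0):
    """True diversity of order q, computed from shares of blocks solved."""
    total = sum(blocks)
    shares = [b / total for b in blocks if b > 0]
    if q == 1.0:
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in shares))
    return sum(p ** q for p in shares) ** (1.0 / (1.0 - q))

# Hypothetical blocks solved by four block makers in one period.
blocks = [40, 30, 20, 10]
herfindahl = sum((b / sum(blocks)) ** 2 for b in blocks)

print(diversity(blocks, q=2.0))   # effective number of equal-sized block makers
print(1.0 / herfindahl)           # inverse Herfindahl index, the same quantity
```

Nine block makers with identical block counts give a diversity of nine, which is the sense in which the measure counts "effective competitors".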
Inequality between groups: Public mining pools and non pool block makers.
Another concern many people have is that public mining pools have a decreasing share of the network. Public mining pools are reliant on miners in order to make blocks and distribute rewards, and a pool with fewer miners has greater income variance.
This means that if a pool was doing something to the block chain that miners don’t like (anything from incorporating graffiti into the block chain – some of my favourite graffiti here – to Selfish Mining), miners could choose to leave the pool. Non pool block makers might have fewer restrictions on their actions, which could be a problem for the network.
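The earlier point about income variance can be put in numbers with a small Python sketch. It assumes that block discovery is a Poisson process and that losing miners simply means a smaller share of network hashrate, so the relative spread of weekly blocks found is 1 / sqrt(expected blocks). The function name and the pool shares are hypothetical.

```python
import math

BLOCKS_PER_WEEK = 144 * 7   # network target: 144 blocks per day

def weekly_block_cv(network_share, blocks_per_week=BLOCKS_PER_WEEK):
    """Coefficient of variation of weekly blocks found, under a Poisson assumption."""
    expected_blocks = network_share * blocks_per_week
    # For a Poisson count the standard deviation is sqrt(mean),
    # so sd / mean = 1 / sqrt(mean).
    return 1.0 / math.sqrt(expected_blocks)

# Hypothetical pools with 20% and 1% of the network hashrate.
for share in (0.20, 0.01):
    print(f"{share:.0%} of network: weekly income varies by about {weekly_block_cv(share):.0%}")
```

The smaller pool's weekly income swings several times more, relative to its mean, than the larger pool's, which gives miners in a shrinking pool a further reason to leave.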
There are a number of different ways to analyse this, but I went with something quite simple:
Public mining pools % network = P / N
P = no. of blocks attributable to public mining pools in some period of time
N = no. of blocks solved by network in same period of time.
This is simple to understand. If you worry about mining pools disappearing, then the fact that the line is heading toward 50% won’t help you sleep at night.
Organofcorti lives in the blockchain!
organofcorti.blogspot.com is a reader supported blog:
Created using R and various packages, especially dplyr, data.table, ggplot2 and forecast.
- For help on ggplot2.
- For help on forecasting.
Thank you to blocktrail.com for use of their address data, and coincadence.com for their p2pool miner data.
Find a typo or spelling error? Email me with the details at email@example.com and if you’re the first to email me I’ll pay you 0.01 btc per ten errors.
Please refer to the most recent blog post for current rates or rule changes.
I’m terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
- Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than every single occurrence.
- Errors in chart texts: Since I can’t fix the chart texts (since I don’t keep the data that generated them) I can’t pay for them. Still, they would be nice to know about!
I write in British English.