A simple statistical library for Arduino.
Intro
One of the main applications for the Arduino board is reading and logging sensor data. For instance, one might monitor the temperature and air pressure every minute of the day. As that produces a lot of records, we often want the average and standard deviation to get a picture of how the temperature varied over that day.
Statistics library
The Statistics library just calculates the average and stdev of a set of data (floats). Furthermore, it holds the minimum and maximum values entered. The interface consists of a constructor and nine functions:
Statistic(); // constructor
void clear(); // reset all counters
void add(float); // add a new value
long count(); // # values added
float sum(); // total
float minimum(); // minimum
float maximum(); // maximum
float average(); // average
float pop_stdev(); // population std deviation
float unbiased_stdev(); // unbiased std deviation
Internally the library does not record the individual values, only the count, the sum, the sum of squared differences from the average, the minimum and the maximum. These five are enough to calculate the average and stdev. The nice part is that memory use stays the same whether one adds 10, 100 or 1000 values.
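The constant-memory bookkeeping can be written as a small recurrence. The following is a sketch read off add(), pop_stdev() and unbiased_stdev() in the Statistic.cpp listing further down; the symbols are chosen here for illustration, with n for _cnt, S_n for _sum and M_n for _ssqdif after the n-th value x_n has been added.

```latex
\begin{aligned}
S_n &= S_{n-1} + x_n, \qquad \bar{x}_n = \frac{S_n}{n} \\
M_n &= M_{n-1} + \frac{n}{n-1}\,\bigl(\bar{x}_n - x_n\bigr)^2 \quad (n \ge 2), \qquad M_1 = 0 \\
\sigma_{\text{pop}} &= \sqrt{\frac{M_n}{n}} \quad (n \ge 1), \qquad
s_{\text{unbiased}} = \sqrt{\frac{M_n}{n-1}} \quad (n \ge 2)
\end{aligned}
```

Because \bar{x}_n - x_n = \frac{n-1}{n}(\bar{x}_{n-1} - x_n), the increment equals (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n), so M_n stays equal to the sum of squared differences from the current average without any earlier value being kept.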
Usage
A small sketch shows how it can be used. A random generator is used to mimic a sensor.
#include "Statistic.h"
Statistic myStats;
void setup(void)
{
  Serial.begin(9600);
  Serial.print("Demo Statistics lib ");
  Serial.println(STATISTIC_LIB_VERSION);
  myStats.clear(); // explicitly start clean
}

void loop(void)
{
  long rn = random(0, 100);
  myStats.add(rn / 100.0);
  Serial.print(" Count: ");
  Serial.print(myStats.count());
  Serial.print(" Average: ");
  Serial.print(myStats.average(), 4);
  Serial.print(" Std deviation: ");
  Serial.print(myStats.pop_stdev(), 4);
  Serial.println();
  if (myStats.count() == 300)
  {
    myStats.clear();
    delay(1000);
  }
}
In setup(), myStats is cleared so we can start adding new data.
In loop(), a random number is first generated and converted to a float to be added to myStats. Then the count, the average and the std deviation so far are printed to the serial port. One could also display them on an LCD or send them over Ethernet, etc. When 300 items have been added, myStats is cleared to start over again.
Notes
In the first version I collected all the samples in an array, but that used quite some memory and the user had to know the number of samples beforehand to allocate enough room. As I found this not quite acceptable, I stripped the data array from the class to make it more elementary.
To use the library, make a folder named Statistics in your SKETCHBOOKPATH\libraries and put the .h and .cpp files there.
Todo
- Looking at a more extended statistical lib.
- Create a template class so it can work with other datatypes.
- Create a zip for Google code or wherever.
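As an illustration of the template item above, here is a minimal sketch of what a templated variant might look like. StatisticT is a hypothetical name and not part of the library; the sketch assumes T is a floating-point type (float or double) and copies only a subset of the interface, reusing the same running update as the add() function listed further down. On AVR-based boards double is typically the same 32-bit type as float, so the gain there would be small.

```cpp
#include <math.h>

// Hypothetical templated variant (illustration only, not part of the library).
// Assumes T is a floating-point type such as float or double.
template <typename T>
class StatisticT
{
public:
  StatisticT() { clear(); }

  // reset all counters
  void clear()
  {
    _cnt = 0;
    _sum = 0;
    _ssqdif = 0;
  }

  // add a new value, using the same running update as Statistic::add()
  void add(T value)
  {
    _sum += value;
    _cnt++;
    if (_cnt > 1)
    {
      T d = _sum / _cnt - value;
      _ssqdif += _cnt * d * d / (_cnt - 1);
    }
  }

  long count() const { return _cnt; }

  // NAN when no values have been added
  T average() const { return (_cnt < 1) ? (T)NAN : _sum / _cnt; }

  // population standard deviation, NAN when no values have been added
  T pop_stdev() const { return (_cnt < 1) ? (T)NAN : (T)sqrt(_ssqdif / _cnt); }

private:
  long _cnt;
  T _sum;
  T _ssqdif; // sum of squared differences from the average
};
```

A declaration such as StatisticT<double> myStats; would then be used the same way as the Statistic object in the usage sketch.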
Enjoy tinkering,
rob.tillaart@removethisgmail.com
Update
- 2010-11-01 Added stddev, minimum and maximum
- 2011-01-07 Gil Ross sent me an improved version of the library that is numerically more stable. This is version 0.3. Thanks, Gil.
- 2012-05-19 Added NAN as the error value instead of -1, which was incorrect.
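Since average(), pop_stdev() and unbiased_stdev() return NAN when too few values have been added, calling code may want to check for it before using the result. A minimal sketch follows; valueOr is a hypothetical helper and not part of the library.

```cpp
#include <math.h> // isnan(), NAN

// Hypothetical helper (not part of the library): returns the statistic if it
// is defined, otherwise a fallback chosen by the caller.
float valueOr(float stat, float fallback)
{
  return isnan(stat) ? fallback : stat;
}
```

In a sketch this could be used as valueOr(myStats.average(), 0.0) to display 0 until the first value arrives. A plain comparison such as stat == NAN would not work, because NAN never compares equal to anything, including itself.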
Statistic.h file
#ifndef Statistic_h
#define Statistic_h
//
// FILE: Statistic.h
// AUTHOR: Rob dot Tillaart at gmail dot com
// modified at 0.3 by Gil Ross at physics dot org
// PURPOSE: Recursive Statistical library for Arduino
// HISTORY: See Statistic.cpp
//
// Released to the public domain
//
// the standard deviation code increases the size of the lib (<100 bytes)
// it can be included/excluded by uncommenting/commenting the next line
#define STAT_USE_STDEV
#include <math.h>
#define STATISTIC_LIB_VERSION "0.3.1"
class Statistic
{
public:
  Statistic();
  void clear();
  void add(float);
  long count();
  float sum();
  float average();
  float minimum();
  float maximum();
#ifdef STAT_USE_STDEV
  float pop_stdev(); // population stdev
  float unbiased_stdev();
#endif

protected:
  long _cnt;
  float _store; // store to minimise computation
  float _sum;
  float _min;
  float _max;
#ifdef STAT_USE_STDEV
  float _ssqdif; // sum of squares difference
#endif
};
#endif
// END OF FILE
Statistic.cpp file
//
// FILE: Statistic.cpp
// AUTHOR: Rob dot Tillaart at gmail dot com
// modified at 0.3 by Gil Ross at physics dot org
// VERSION: see STATISTIC_LIB_VERSION in .h
// PURPOSE: Recursive statistical library for Arduino
//
// NOTE: 2011-01-07 Gil Ross
// Rob Tillaart's Statistic library uses one-pass of the data (allowing
// each value to be discarded), but expands the Sum of Squares Differences to
// difference the Sum of Squares and the Average Squared. This is susceptible
// to bit length precision errors with the float type (only 5 or 6 digits
// absolute precision) so for long runs and high ratios of
// the average value to standard deviation the estimate of the
// standard error (deviation) becomes the difference of two large
// numbers and will tend to zero.
//
// For small numbers of iterations and small Average/SE the original code is
// likely to work fine.
// It should also be recognised that for very large samples, questions
// of stability of the sample assume greater importance than the
// correctness of the asymptotic estimators.
//
// This recursive algorithm, which takes slightly more computation per
// iteration is numerically stable.
// It updates the number, mean, max, min and SumOfSquaresDiff each step to
// deliver max min average, population standard error (standard deviation) and
// unbiased SE.
// -------------
//
// HISTORY:
// 0.1 - 2010-10-29 initial version
// 0.2 - 2010-10-29 stripped to minimal functionality
// 0.2.01 - 2010-10-30
// added minimum, maximum, unbiased stdev,
// changed counter to long -> int overflows @32K samples
// 0.3 - branched from 0.2.01 version of Rob Tillaart's code
// Released to the public domain
// 0.3.1 - minor edits
//
#include "Statistic.h"
Statistic::Statistic()
{
  clear();
}

// resets all counters
void Statistic::clear()
{
  _cnt = 0; // count at N stored, becoming N+1 at a new iteration
  _sum = 0.0;
  _min = 0.0;
  _max = 0.0;
#ifdef STAT_USE_STDEV
  _ssqdif = 0.0; // not _ssq but sum of square differences
                 // which is SUM(from i = 1 to N) of
                 // (f(i)-_ave_N)**2
#endif
}

// adds a new value to the data-set
void Statistic::add(float f)
{
  if (_cnt < 1)
  {
    _min = f;
    _max = f;
  } else {
    if (f < _min) _min = f;
    if (f > _max) _max = f;
  } // end of if (_cnt < 1) else
  _sum += f;
  _cnt++;
#ifdef STAT_USE_STDEV
  if (_cnt > 1) {
    _store = (_sum / _cnt - f);
    _ssqdif = _ssqdif + _cnt * _store * _store / (_cnt - 1);
  } // end if > 1
#endif
}
// returns the number of values added
long Statistic::count()
{
  return _cnt;
}

// returns the average of the data-set added so far (NAN if no values added)
float Statistic::average()
{
  if (_cnt < 1) return NAN; // original code returned 0
  return _sum / _cnt;
}

// returns the sum of the data-set (0 if no values added)
float Statistic::sum()
{
  return _sum;
}

// returns the minimum of the data-set (0 if no values added)
float Statistic::minimum()
{
  return _min;
}

// returns the maximum of the data-set (0 if no values added)
float Statistic::maximum()
{
  return _max;
}
// Population standard deviation = s = sqrt [ Σ ( Xi - µ )^2 / N ]
// http://www.suite101.com/content/how-is-standard-deviation-used-a99084
#ifdef STAT_USE_STDEV
float Statistic::pop_stdev()
{
  if (_cnt < 1) return NAN; // otherwise DIV0 error
  return sqrt(_ssqdif / _cnt);
}

float Statistic::unbiased_stdev()
{
  if (_cnt < 2) return NAN; // otherwise DIV0 error
  return sqrt(_ssqdif / (_cnt - 1));
}
#endif
// END OF FILE
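The precision problem described in the NOTE at the top of Statistic.cpp can be reproduced with a small comparison. The sketch below is illustrative only: naiveVariance, recursiveVariance and fillAlternating are hypothetical names, the naive function is a reconstruction of the "mean of squares minus squared mean" approach the note describes (not the original 0.2 code), and 32-bit IEEE floats and n >= 1 are assumed.

```cpp
#include <math.h>

// Population variance as "mean of squares minus square of mean", in float only.
// Assumed reconstruction of the approach the NOTE warns about.
float naiveVariance(const float* x, long n)
{
  float sum = 0.0f;
  float sumSq = 0.0f;
  for (long i = 0; i < n; i++)
  {
    sum += x[i];
    sumSq += x[i] * x[i];
  }
  float mean = sum / n;
  float meanSq = sumSq / n;
  return meanSq - mean * mean; // difference of two large, nearly equal numbers
}

// Population variance using the same running update as Statistic::add().
float recursiveVariance(const float* x, long n)
{
  long cnt = 0;
  float sum = 0.0f;
  float ssqdif = 0.0f; // sum of squared differences from the average
  for (long i = 0; i < n; i++)
  {
    sum += x[i];
    cnt++;
    if (cnt > 1)
    {
      float d = sum / cnt - x[i];
      ssqdif += cnt * d * d / (cnt - 1);
    }
  }
  return ssqdif / n;
}

// Fills buf with n readings alternating between 10000 and 10001:
// a large average with a small spread.
void fillAlternating(float* buf, long n)
{
  for (long i = 0; i < n; i++)
  {
    buf[i] = 10000.0f + (i % 2);
  }
}
```

With 100 such readings the true population variance is 0.25. recursiveVariance should land close to that value, while naiveVariance subtracts two numbers near 1e8, where a float can no longer resolve fractions, so its result is typically off by a wide margin.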