For Science and Engineering

hamady.org

Programming Environment for Numerical Computing


Descriptive Statistics and Data Analysis

Descriptive Statistics

The data namespace includes functions to calculate the descriptive parameters of a list of values:
Minimum (data.min(t)), Maximum (data.max(t)), Sum (data.sum(t)), Mean (data.mean(t)), Median (data.median(t)), Variance (data.var(t)), Standard Deviation (data.dev(t)), Coefficient of Variation (data.coeff(t)), Root Mean Square (data.rms(t)), Skewness (data.skew(t)) and Kurtosis excess (data.kurt(t)).
The formulas used are shown in the following figure:
[Figure: Stats (formulas for the descriptive statistics)]

All the statistics functions take a Lua table as their argument, with a maximum of 2048 elements.

Example:

Copy the following script into the editor and click Run (or press F12 on Windows and Linux)

-- Stats
cls()

t = {1,1,2,3,4,4,5}
-- expected: m = 2.8571428571429
m = data.mean(t)
print(m)
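
As a cross-check on the values returned by the data functions, the same quantities can be computed in plain Lua. This is only a sketch: ref_mean, ref_median and ref_var are hypothetical helper names, not part of the data namespace, and the variance here divides by n - 1 (sample variance), which is an assumption because the formulas used by data.var are not reproduced on this page.

```lua
-- Plain-Lua reference calculations (hypothetical helpers, not the data namespace)
function ref_mean(t)
  local s = 0
  for i = 1, #t do s = s + t[i] end
  return s / #t
end

function ref_median(t)
  local c = {}
  for i = 1, #t do c[i] = t[i] end      -- sort a copy, leave t untouched
  table.sort(c)
  local h = math.floor(#c / 2)
  if #c % 2 == 1 then return c[h + 1] end
  return (c[h] + c[h + 1]) / 2
end

-- ASSUMPTION: sample variance (divides by n - 1); data.var may divide by n
function ref_var(t)
  local m, s = ref_mean(t), 0
  for i = 1, #t do s = s + (t[i] - m)^2 end
  return s / (#t - 1)
end

t = {1,1,2,3,4,4,5}
print(ref_mean(t), ref_median(t), ref_var(t))
```

If data.var gives a slightly smaller value than ref_var for the same table, the most likely explanation is a division by n instead of n - 1.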

Data Analysis

The data namespace includes functions to perform data analysis, including fitting with a user-defined model, FFT and autocorrelation calculations:

pars, chi, iters, str = data.fit(func, tx, ty, fpar, ipar, tol, iters)

Runs the fitter algorithm with:
func the name of the Lua model function. The model function has the following form (replace the body with your own model):

function fitfun(fpar, dpar, x)
  dpar[1] = 1                    -- partial derivative dy/dfpar[1]
  dpar[2] = x                    -- partial derivative dy/dfpar[2]
  local y = fpar[1] + fpar[2]*x  -- model value at x
  return y
end

In the model function:
fpar is the fitting parameters table.
dpar is the table of partial derivatives of the model with respect to each parameter, filled in by the function.
x is the independent variable.

The remaining arguments of data.fit are:
tx the table with X data.
ty the table with Y data.
fpar the fitting parameters table (the starting values for the fit).
ipar table containing, for each parameter, 1 if the parameter is varying or 0 if it is fixed. This parameter is optional.
tol the relative tolerance to be reached. This parameter is optional.
iters the maximum number of iterations for the fitting algorithm. This parameter is optional.

The function data.fit returns four values: the table of fitted parameters pars, the chi value chi, the number of iterations performed iters, and a message str from the fitter engine.
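
Because the model function has to supply its own partial derivatives in dpar, a mistake there is easy to make with non-linear models. The sketch below writes a hypothetical exponential model (expmodel, y = fpar[1] * exp(-fpar[2] * x)) using the same calling convention and compares its analytic derivatives against central finite differences in plain Lua. The helper check_derivs is an illustration only and is not part of the data namespace.

```lua
-- Hypothetical model: y = fpar[1] * exp(-fpar[2] * x)
function expmodel(fpar, dpar, x)
  local e = math.exp(-fpar[2] * x)
  dpar[1] = e                     -- dy/dfpar[1]
  dpar[2] = -fpar[1] * x * e      -- dy/dfpar[2]
  return fpar[1] * e
end

-- Largest difference between the analytic derivatives in dpar
-- and central finite differences with step h
function check_derivs(model, fpar, x, h)
  local dpar, scratch, maxerr = {}, {}, 0
  model(fpar, dpar, x)
  for k = 1, #fpar do
    local saved = fpar[k]
    fpar[k] = saved + h
    local yp = model(fpar, scratch, x)
    fpar[k] = saved - h
    local ym = model(fpar, scratch, x)
    fpar[k] = saved                -- restore the parameter
    maxerr = math.max(maxerr, math.abs((yp - ym) / (2 * h) - dpar[k]))
  end
  return maxerr
end

print(check_derivs(expmodel, {2.0, 0.5}, 1.5, 1e-6))
```

A result many orders of magnitude smaller than the derivatives themselves suggests that dpar is consistent with the model. Such a model would then be passed to data.fit by name, in the same way as the linear model in the example on this page.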

Example:

Copy the following script into the editor and click Run (or press F12 on Windows and Linux)

-- Linear Fit
cls()

function func(fpar, dpar, x)
   dpar[1] = 1
   dpar[2] = x
   y = fpar[1] + fpar[2] * x
   return y
end

tx = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
ty = {0.1, 0.8, 2.2, 3.1, 3.8, 5.1, 5.9, 7.1, 8.0, 9.2}
fpar = {3,3}
ipar = {1,1}
pars, chi, iters = data.fit("func", tx, ty, fpar, ipar, 1e-3, 100)

io.write(string.format("pars = [%g %g]\nchi = %g, iters = %d\n", pars[1], pars[2], chi, iters))

p = plot.new(800,600)
plot.set(p, "title", "Linear Fitting")
txf = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
tyf = {}
for i = 1, #txf, 1 do
   tyf[i] = pars[1] + pars[2]*txf[i]
end
plot.add(p, txf, tyf)      -- plot linear fit
plot.add(p, tx, ty)        -- plot data
plot.set(p, 1, "color", "red")
plot.set(p, 2, "style", "o")
plot.update(p)
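
For a straight-line model, the result of the fitter can be compared with the closed-form least-squares solution. The following plain-Lua sketch computes it for the same tx and ty; linfit is a hypothetical helper, and the comparison assumes that data.fit minimizes the unweighted sum of squared residuals, which this page does not state.

```lua
-- Closed-form unweighted least-squares line y = intercept + slope * x
function linfit(tx, ty)
  local n, sx, sy, sxx, sxy = #tx, 0, 0, 0, 0
  for i = 1, n do
    sx = sx + tx[i]
    sy = sy + ty[i]
    sxx = sxx + tx[i] * tx[i]
    sxy = sxy + tx[i] * ty[i]
  end
  local slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
  local intercept = (sy - slope * sx) / n
  return intercept, slope
end

tx = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
ty = {0.1, 0.8, 2.2, 3.1, 3.8, 5.1, 5.9, 7.1, 8.0, 9.2}
a, b = linfit(tx, ty)
print(string.format("intercept = %g, slope = %g", a, b))
```

Under that assumption, pars[1] and pars[2] from a converged fit with both parameters varying should be close to the intercept and slope printed here.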

ft = data.fft(data, idir)

Calculates the FFT with:
data the table with data
idir 1 for forward FFT and 0 for inverse
The function data.fft returns the obtained FFT table ft.
NB: the FFT amplitude is scaled (divided) by the number of points.

Example:

Copy the following script into the editor and click Run (or press F12 on Windows and Linux)

-- FFT
cls()

Fo = 50 -- signal frequency (Hz)
To = 1/Fo -- signal period (seconds)
A = 5 -- signal amplitude
An = 1 -- noise amplitude
N = 256 -- number of points (power of 2)
Ts = 4 * To/N -- sampling period
Fs = 1/Ts -- sampling frequency
f = {}
t = {}
y = {}
for i = 1, N, 1 do
  f[i] = (i - 1) * Fs / N -- frequency of FFT bin i (bin spacing is Fs/N)
  t[i] = (i - 1) * Ts -- time
  y[i] = A*cos(2*pi*Fo*t[i]) + An*lmath.random()
end

tfd = data.fft(y, 1)
p = plot.new(800,600)
plot.add(p, f, tfd)
plot.update(p)
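
To illustrate the scaling by the number of points, the sketch below computes a naive discrete Fourier transform in plain Lua and divides the magnitude by N. It is a reference calculation only: dft_amplitude is a hypothetical helper, and whether data.fft returns magnitudes in exactly this form is an assumption. With this scaling, a cosine of amplitude A that spans a whole number of periods shows up as two peaks of height A/2, at bin k and bin N - k.

```lua
-- Naive O(N^2) DFT magnitude, divided by N (reference sketch only)
function dft_amplitude(y)
  local N, amp = #y, {}
  for k = 0, N - 1 do
    local re, im = 0, 0
    for n = 0, N - 1 do
      local a = 2 * math.pi * k * n / N
      re = re + y[n + 1] * math.cos(a)
      im = im - y[n + 1] * math.sin(a)
    end
    amp[k + 1] = math.sqrt(re * re + im * im) / N
  end
  return amp
end

-- 4 full periods of a cosine of amplitude 5 over 64 samples
N, A = 64, 5
y = {}
for i = 1, N do y[i] = A * math.cos(2 * math.pi * 4 * (i - 1) / N) end
amp = dft_amplitude(y)
print(amp[5], amp[N - 3])   -- bins 4 and N-4, each close to A/2
```

Bin k (stored at index k + 1) corresponds to the frequency k * Fs / N, where Fs is the sampling frequency.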

ac = data.acorr(data)

Calculate the autocorrelation with:
data the table with data
The function data.acorr returns the obtained autocorrelation table ac.
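
The following plain-Lua sketch shows one common definition of the autocorrelation. It makes two assumptions that this page does not confirm for data.acorr: the mean is subtracted first, and the result is normalized so that lag 0 equals 1. ref_acorr is a hypothetical helper, not part of the data namespace.

```lua
-- ASSUMPTIONS: mean removed first, result normalized so that lag 0 equals 1
function ref_acorr(x)
  local n, m = #x, 0
  for i = 1, n do m = m + x[i] end
  m = m / n
  local r = {}
  for lag = 0, n - 1 do
    local s = 0
    for i = 1, n - lag do
      s = s + (x[i] - m) * (x[i + lag] - m)
    end
    r[lag + 1] = s                -- index 1 holds lag 0
  end
  local r0 = r[1]
  for i = 1, n do r[i] = r[i] / r0 end
  return r
end

x = {1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2}
ac = ref_acorr(x)
print(ac[1], ac[7])   -- lag 0 and lag 6 (one full period of x)
```

For a periodic signal, the autocorrelation peaks again at lags equal to the period, which is what makes it useful for detecting repetition in noisy data. A constant input gives a zero lag-0 value, so this sketch should not be used on such data without a guard.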

ts = data.sort(t, asc)

Sorts table t in ascending or descending order (asc = 1 for ascending).
Returns the sorted table ts.
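
The behaviour described above can be reproduced in plain Lua with table.sort. In this sketch, sorted_copy is a hypothetical helper; it assumes that any value of asc other than 1 means descending order, and it sorts a copy because this page does not say whether data.sort modifies its argument.

```lua
-- Returns a sorted copy of t; asc == 1 sorts ascending, anything else descending
function sorted_copy(t, asc)
  local c = {}
  for i = 1, #t do c[i] = t[i] end
  if asc == 1 then
    table.sort(c)
  else
    table.sort(c, function(a, b) return a > b end)
  end
  return c
end

up = sorted_copy({3, 1, 2}, 1)
down = sorted_copy({3, 1, 2}, 0)
print(table.concat(up, " "), table.concat(down, " "))
```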

yf = data.filter(x, y, forder)

Filters (smooths) x-y data using the Savitzky-Golay method, given the filter order forder.
Returns the filtered data yf.
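
As an illustration of the Savitzky-Golay method, the sketch below applies the classic 5-point quadratic smoothing weights (-3, 12, 17, 12, -3)/35 in plain Lua. It is not the implementation behind data.filter: sg5 is a hypothetical helper, the data are assumed to be evenly spaced in x, the two points at each end are left unfiltered, and how forder maps to window size or polynomial degree is not stated on this page.

```lua
-- 5-point quadratic Savitzky-Golay smoothing with weights (-3, 12, 17, 12, -3) / 35
function sg5(y)
  local w = {-3, 12, 17, 12, -3}
  local n, out = #y, {}
  for i = 1, n do
    if i < 3 or i > n - 2 then
      out[i] = y[i]               -- ASSUMPTION: edge points are left unfiltered
    else
      local s = 0
      for j = -2, 2 do s = s + w[j + 3] * y[i + j] end
      out[i] = s / 35
    end
  end
  return out
end

spike = sg5({0, 0, 0, 35, 0, 0, 0})
print(spike[3], spike[4], spike[5])
```

Each output point is the value at the window centre of a least-squares parabola fitted to the five surrounding samples, so data that already follow a parabola pass through unchanged while isolated spikes are reduced.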

ASCII Data Files

c1,c2,... = data.load(filename, sep, skip, colcount, rowcount)

Load ASCII data with:
filename source file name
sep separator, usually tab or semicolon (optional)
skip number of rows to be skipped (optional)
colcount number of columns to load (optional)
rowcount number of rows to load (optional)
The function data.load returns tables containing numeric data c1, c2, ....

rowcount = data.save(filename, format, header, c1, c2, ...)

Save numeric data to ASCII file with:
filename destination file name
format line format (example: "%f\t%f") or separator (usually tab or semicolon).
header file header (comment, labels, ...)
c1, c2, ... tables to save
The function data.save returns the number of rows actually saved rowcount.

Example:

Copy the following script into the editor and click Run (or press F12 on Windows and Linux)

-- ASCII
fname = "C:\\Temp\\ascii.txt"
x = {1, 2, 3, 4, 5}
y = {1, 2, 3, 4, 5}
sep = "\t"
skip = 0
rc = data.save(fname, sep, "# HEADER\n", x, y)
print(rc,"\n")

xt,yt = data.load(fname, sep, skip)
print(xt,"\n")
print(yt)

datayc, status = data.baseline(datay, alambda, itermax, reltol, verbose)

Performs baseline correction using the algorithm developed by Zhang et al., Analyst, 135(5), 1138-1146:
datay the table with data
alambda correction parameter (default value: 100)
itermax maximum number of iterations (default value: 10)
reltol relative tolerance to achieve (default value: 0.001)
verbose true to print messages
The function data.baseline returns the corrected data table datayc and a status (true if succeeded).

Example:

-- baseline
cls()

fname = "data.txt"
datax, datay = data.load(fname, "\t")
datayc, status = data.baseline(datay, 50, 100, 1e-3)
fname = "data_corrected.txt"
data.save(fname, "\t", "# baseline-corrected data", datax, datayc)

p = plot.new()
plot.add(p, datax, datay)
plot.add(p, datax, datayc)
plot.update(p)

datayn = data.normalize(datay, anorm)

Normalize data to [0, anorm]:
datay the table with data
anorm maximal value to normalize to
The function data.normalize returns the normalized data table datayn.
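
One way to normalize data to [0, anorm] is min-max scaling, sketched below in plain Lua. This is an assumption about what the function does: data.normalize might instead only divide by the maximum, and this page does not say which. ref_normalize is a hypothetical helper, not part of the data namespace.

```lua
-- ASSUMPTION: min-max scaling, so min(datay) -> 0 and max(datay) -> anorm
function ref_normalize(datay, anorm)
  local lo, hi = datay[1], datay[1]
  for i = 2, #datay do
    lo = math.min(lo, datay[i])
    hi = math.max(hi, datay[i])
  end
  local out = {}
  for i = 1, #datay do
    -- constant data has no range, so it is mapped to 0
    out[i] = (hi > lo) and anorm * (datay[i] - lo) / (hi - lo) or 0
  end
  return out
end

yn = ref_normalize({2, 4, 6}, 1)
print(yn[1], yn[2], yn[3])
```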