using Plots
using Distributions
using StatsBase
pyplot()
plt = plot(legend=true)
α=0.5
β=0.1
N=5*10^6 #Number of sample blocks
X = Float64[] #This array holds all the samples that fulfill the constraints
tmp = Float64[] #temporary
# We sample for different lengths of the sequence and see how the resulting histogram
# approaches an exponential form. Note in particular how for n=1 the distribution is
# simply uniform but cut-off.
for n in [1, 2, 4, 8] # length of the sequence X = (X_1, ..., X_n)
    for k in 1:N
        tmp = rand(n) # draw
        # For n == 1 the variance of a single sample is undefined (NaN),
        # so we only apply the mean constraint in that case.
        if mean(tmp) >= α && (n == 1 || var(tmp) >= β)
            append!(X, tmp)
        end
    end
    # For convenience I used Julia's implementation to create a histogram.
    # Feel free to write your own!
    hist = fit(Histogram, X, nbins=20, closed=:left)
    # Plot the histograms for all n in one plot
    plot!(plt, midpoints(hist.edges[1]), hist.weights/length(X), alpha=0.6, lw=0.5, marker=:auto, lab="n=$n")
    empty!(X)
end
display(plt)
Let's try to find the parameters of the distribution by fitting a curve to the histogram.
Again for convenience, I use an algorithm from the LsqFit
package to perform a least-squares fit to the data. The package uses the Levenberg-Marquardt algorithm, which is a variant of Gauss-Newton. If you would like to know more, consult e.g. the corresponding Wikipedia page.
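For orientation, each Levenberg-Marquardt iteration solves a damped version of the Gauss-Newton normal equations,

$$(J^\top J + \lambda I)\,\delta = J^\top r,$$

where $J$ is the Jacobian of the residual vector $r$ and the damping parameter $\lambda$ interpolates between gradient descent (large $\lambda$) and Gauss-Newton (small $\lambda$).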
using LsqFit
Increase $n$ quite a bit to produce a result closer to the exponential form.
using Random # for rand!
α = 0.55
β = 0.1
N = 10^8 # Number of sample blocks
n = 200
tmp = Vector{Float64}(undef, n) # temporary
Y = Float64[] # This array holds all the samples that fulfill the constraints
@time for k in 1:N
    rand!(tmp) # draw in place, reusing the buffer
    if mean(tmp) >= α && var(tmp) >= β
        append!(Y, tmp)
    end
end
# Acceptance rate: fraction of drawn samples that fulfilled the constraints
length(Y)/(n*N)
Fit a rather dense histogram (nBins = 100).
nBins = 100
hist = fit(Histogram, Y, nbins=nBins,closed=:left)
# Convenience function to calculate the midpoints of the histogram bins.
midpts = midpoints(hist.edges[1])
Define the function we want to fit to the data. p is a vector of parameters to be determined by the fit.
# Function to fit; broadcasts over a vector of x values.
exp_func(x, p) = exp.(p[1] .+ p[2] .* x .+ p[3] .* x .^ 2)
# Normalize the counts to a density: on [0, 1] the bin width is 1/nBins.
expFit = curve_fit(exp_func, midpts, hist.weights / length(Y) * nBins, [1.0, -1.0, 1.0])
Here are the parameters $(\lambda_0,\lambda_1,\lambda_2)$ of the fit
expFit.param
Histogram and its fit.
Observe that even though the number of samples was $10^8$, the histogram is still quite noisy. There is a tradeoff between making $n$ large to get close to the theoretical prediction, and keeping the simulation feasible.
plot(midpts, hist.weights/length(Y)*nBins, markercolor=:green, alpha=0.5, lw=0.0, marker=:auto, lab="Histogram")
plot!(midpts, x -> exp_func(x, expFit.param), lab="Fit to histogram", lw=2.0)
Let's extract the parameters from the model directly, i.e. determine them such that the constraints are fulfilled numerically.
This is a system of non-linear equations, and various algorithms exist for solving such problems. If you would like to know more about solving and optimization in Julia, have a look at the documentation of the relevant packages.
Here we use the package NLsolve, which implements a trust-region algorithm.
using NLsolve
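As a minimal sketch of the NLsolve API on a hypothetical toy system (not part of the model above): solve x₁² = 2, x₂ = 1. The function mutates a residual vector, which NLsolve drives to zero.

```julia
using NLsolve

# Residuals of the toy system x₁² = 2, x₂ = 1.
function g!(F, x)
    F[1] = x[1]^2 - 2.0
    F[2] = x[2] - 1.0
end

# Start from [1.0, 0.0]; forward-mode automatic differentiation
# supplies the Jacobian.
res = nlsolve(g!, [1.0, 0.0], autodiff=:forward)
res.zero # ≈ [√2, 1]
```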
Define helper functions. quadgk
is a numerical integration routine (Gauss-Kronrod quadrature) from the QuadGK package.
using QuadGK
# normalization constant
Z(p) = quadgk(x -> exp(p[1]*x + p[2]*x^2), 0.0, 1.0, order=10)[1]
# and its inverse
iZ(p) = 1 / Z(p)
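A quick check of the quadgk call itself: it returns a tuple of the integral estimate and an error bound.

```julia
using QuadGK

# ∫₀^π sin(x) dx = 2
val, err = quadgk(sin, 0.0, pi)
```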
# k-th moment of the distribution
mom(k::Int, p; order=10) = quadgk(x -> x^k * exp(p[1]*x + p[2]*x^2), 0.0, 1.0, order=order)[1] * iZ(p)
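A sanity check on the moment integrals (restated here with local names so the snippet stands alone): for p = [0, 0] the density reduces to the uniform distribution on [0, 1], whose first two moments are 1/2 and 1/3.

```julia
using QuadGK

# Restated normalization and moment integrals.
Zc(p) = quadgk(x -> exp(p[1]*x + p[2]*x^2), 0.0, 1.0, order=10)[1]
momc(k, p) = quadgk(x -> x^k * exp(p[1]*x + p[2]*x^2), 0.0, 1.0, order=10)[1] / Zc(p)

m1 = momc(1, [0.0, 0.0]) # ≈ 1/2
m2 = momc(2, [0.0, 0.0]) # ≈ 1/3
```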
Target function. NLsolve expects the residual vector as the first argument.
function f!(store, p)
    store[1] = mom(1, p) - 0.55 # constraint on the mean
    store[2] = mom(2, p) - mom(1, p)^2 - 0.1 # constraint on the variance
end
Solve the system, arbitrarily starting at [0.0, 0.0].
opt = nlsolve(f!, [0.0, 0.0], autodiff=:forward, iterations=1000, show_trace=false, extended_trace=false)
The fit parameters agree reasonably well with this exact solution.
opt.zero, expFit.param
let p = opt.zero
    plot!(x -> exp(p[1]*x + p[2]*x^2) * iZ(p), 0, 1, lab="Opt", line=(2.0, :dash, :blue))
end