mud package#
Subpackages#
Submodules#
mud.base module#
- class mud.base.BayesProblem(X: ndarray | List, y: ndarray | List, domain: ndarray | List | None = None)[source]#
Bases:
object
Sets up Bayesian Inverse Problem for parameter identification
- Parameters:
X (ndarray) – 2D array containing parameter samples from an initial distribution. Rows represent each sample while columns represent parameter values.
y (ndarray) – array containing push-forward values of parameters samples through the forward model. These samples will form the data-likelihood distribution.
domain (array_like, optional) – 2D Array containing ranges of each parameter value in the parameter space. Note that the number of rows must equal the number of parameters, and the number of columns must always be two, for min/max range.
Examples
>>> from mud.base import BayesProblem >>> import numpy as np >>> from scipy.stats import distributions as ds >>> X = np.random.rand(100,1) >>> num_obs = 50 >>> Y = np.repeat(X, num_obs, 1) >>> y = np.ones(num_obs)*0.5 + np.random.randn(num_obs)*0.05 >>> B = BayesProblem(X, Y, np.array([[0,1]])) >>> B.set_likelihood(ds.norm(loc=y, scale=0.05)) >>> np.round(B.map_point()[0],1) 0.5
- property n_features#
- property n_params#
- property n_samples#
- plot_obs_space(obs_idx=0, ax=None, y_range=None, aff=1000, ll_opts={'color': 'r', 'label': 'Data-Likelihood', 'linestyle': '-', 'linewidth': 4}, pf_opts={'color': 'g', 'label': 'PF of Posterior', 'linestyle': ':', 'linewidth': 4})[source]#
Plot probability distributions defined over observable space.
- plot_param_space(param_idx=0, ax=None, x_range=None, aff=1000, pr_opts={'color': 'b', 'label': 'Prior', 'linestyle': '--', 'linewidth': 4}, ps_opts={'color': 'g', 'label': 'Posterior', 'linestyle': ':', 'linewidth': 4}, map_opts={'color': 'g', 'label': '$\\lambda^\\mathrm{MAP}$'}, true_opts={'color': 'r', 'label': '$\\lambda^{\\dagger}$', 'linestyle': '-.'}, true_val=None)[source]#
Plot probability distributions over parameter space
- class mud.base.DensityProblem(X: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], domain: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, weights: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, normalize: bool = False, pad_domain: float = 0.1)[source]#
Bases:
object
Sets up Data-Consistent Inverse Problem for parameter identification
Data-Consistent inversion is a way to infer most likely model parameters using observed data and predicted data from the model.
- X#
Array containing parameter samples from an initial distribution. Rows represent each sample while columns represent parameter values. If 1 dimensional input is passed, assumed that it represents repeated samples of a 1-dimensional parameter.
- Type:
ArrayLike
- y#
Array containing push-forward values of parameters samples through the forward model. These samples will form the predicted distribution.
- Type:
ArrayLike
- domain#
Array containing ranges of each parameter value in the parameter space. Note that the number of rows must equal the number of parameters, and the number of columns must always be two, for min/max range.
- Type:
ArrayLike
- weights#
Weights to apply to each parameter sample. Either a 1D array of the same length as number of samples or a 2D array if more than one set of weights is to be incorporated. If so the weights will be multiplied and normalized row-wise, so the number of columns must match the number of samples.
- Type:
ArrayLike, optional
Examples
Generate test 1-D parameter estimation problem. Model to produce predicted data is the identity map and observed signal comes from true value plus some random gaussian nose.
See
mud.examples.identity_uniform_1D_density_prob()
for more details>>> from mud.examples.simple import identity_1D_density_prob as I1D
First we set up a well-posed problem. Note the domain we are looking over contains our true value. We take 1000 samples, use 50 observations, assuming a true value of 0.5 populated with gaussian noise \(\mathcal{N}(0,0.5)\). Or initial uniform distribution is taken from a \([0,1]\) range.
>>> D = I1D(1000, 50, 0.5, 0.05, domain=[0,1])
Estimate mud_point -> Note since WME map used, observed implied to be the standard normal distribution and does not have to be set explicitly from observed data set.
>>> np.round(D.mud_point()[0],1) 0.5
Expectation value of r, ratio of observed and predicted distribution, should be near 1 if predictability assumption is satisfied.
>>> np.round(D.expected_ratio(),0) 1.0
Set up ill-posed problem -> Searching out of range of true value
>>> D = I1D(1000, 50, 0.5, 0.05, domain=[0.6,1])
Mud point will be close as we can get within the range we are searching for
>>> np.round(D.mud_point()[0],1) 0.6
Expectation of r is close to zero since predictability assumption violated.
>>> np.round(D.expected_ratio(),1) 0.0
- estimate()[source]#
Estimate
Returns the best estimate for most likely parameter values for the given model data using the data-consistent framework.
- Returns:
mud_point – Maximal Updated Density (MUD) point.
- Return type:
np.ndarray
- expected_ratio()[source]#
Expectation Value of R
Returns the expectation value of the R, the ratio of the observed to the predicted density values.
(1)#\[R = \frac{\pi_{ob}(\lambda)} {\pi_{pred}(\lambda)}\]If the predictability assumption for the data-consistent framework is satisfied, then \(E[R]\approx 1\).
- Returns:
expected_ratio – Value of the E(r). Should be close to 1.0.
- Return type:
- fit(**kwargs)[source]#
Update Initial Distribution
Constructs the updated distribution by fitting observed data to predicted data with:
(2)#\[\pi_{up}(\lambda) = \pi_{in}(\lambda) \frac{\pi_{ob}(Q(\lambda))}{\pi_{pred}(Q(\lambda))}\]Note that if initial, predicted, and observed distributions have not been set before running this method, they will be run with default values. To set specific predicted, observed, or initial distributions use the
set_
methods.- Parameters:
**kwargs (dict, optional) – If specified, optional arguments are passed to the
set_predicted()
call in the case that the predicted distribution has not been set yet.
- mud_point()[source]#
Maximal Updated Density (MUD) Point
Returns the Maximal Updated Density or MUD point as the parameter sample from the initial distribution with the highest update density value:
(3)#\[\lambda^{MUD} := \mathrm{argmax} \pi_{up}(\lambda)\]Note if the updated distribution has not been computed yet, this function will call
fit()
to compute it.- Returns:
mud_point – Maximal Updated Density (MUD) point.
- Return type:
np.ndarray
- property n_features#
- property n_params#
- property n_samples#
- plot_obs_space(obs_idx: int = 0, ax: Axes | None = None, y_range: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, aff=100, ob_opts={'color': 'r', 'label': '$\\pi_\\mathrm{obs}$', 'linestyle': '-'}, pr_opts={'color': 'b', 'label': '$\\pi_\\mathrm{pred}$', 'linestyle': '-'}, pf_opts={'color': 'k', 'label': '$\\pi_\\mathrm{pf-pr}$', 'linestyle': '-.'})[source]#
Plot probability distributions over parameter space
Observed distribution is plotted using the distribution function passed to
set_observed()
(or default). The predicted distribution is plotted using the stored predicted distribution function set inset_predicted()
. The push-forward of the updated distribution is computed as a gkde on the predicted samplesy
as well, but using the product of the update ratio (1) and the initial weights as weights.- Parameters:
obs_idx (int, default=0) – Index of observable value to plot.
ax (
matplotlib.axes.Axes
, optional) – Axes to plot distributions on. If non specified, a figure will be initialized to plot on.y_range (list or np.ndarray, optional) – Range over parameter value to plot over.
aff (int, default=100) – Number of points to plot within x_range, evenly spaced.
ob_opts (dict, optional) – Plotting option for observed distribution line. Defaults to
{'color':'r', 'linestyle':'-','linewidth':4, 'label':'Observed'}
. To suppress plotting, pass inNone
.pr_opts (dict, optional) – Plotting option for predicted distribution line. Defaults to
{'color':'b', 'linestyle':'--','linewidth':4, 'label':'PF of Initial'}
. To suppress plotting, pass inNone
.pf_opts (dict, optional) – Plotting option for push-forward of updated distribution line. Defaults to
{'color':'k', 'linestyle':'-.','linewidth':4, 'label':'PF of Updated'}
. To suppress plotting, pass inNone
.
- plot_param_space(param_idx: int = 0, true_val: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, ax: Axes | None = None, x_range: list | ndarray | None = None, ylim: float | None = None, pad_ratio: float = 0.05, aff: int = 100, in_opts={'color': 'b', 'label': '$\\pi_\\mathrm{init}$', 'linestyle': '-'}, up_opts={'color': 'k', 'label': '$\\pi_\\mathrm{update}$', 'linestyle': '-.'}, win_opts=None, mud_opts={'color': 'g', 'label': '$\\lambda^\\mathrm{MUD}$'}, true_opts={'color': 'r', 'label': '$\\lambda^{\\dagger}$'})[source]#
Plot probability distributions over parameter space
Initial distribution is plotted using the distribution function passed to
set_initial()
. The updated distribution is plotted using a weighted gaussian kernel density estimate (gkde) on the initial samples, using the product of the update ratio (1) value times the initial weights as weights for the gkde. The weighted initial is built using a weighted gkde on the initial samples, but only using the initial weights.- Parameters:
param_idx (int, default=0) – Index of parameter value to plot.
ax (
matplotlib.axes.Axes
, optional) – Axes to plot distributions on. If non specified, a figure will be initialized to plot on.x_range (list or np.ndarray, optional) – Range over parameter value to plot over.
aff (int, default=100) – Number of points to plot within x_range, evenly spaced.
in_opts (dict, optional) – Plotting option for initial distribution line. Defaults to
{'color':'b', 'linestyle':'--','linewidth':4, 'label':'Initial'}
. To suppress plotting, pass inNone
explicitly.up_opts (dict, optional) – Plotting option for updated distribution line. Defaults to
{'color':'k', 'linestyle':'-.','linewidth':4, 'label':'Updated'}
. To suppress plotting, pass inNone
explicitly.win_opts (dict, optional) – Plotting option for weighted initial distribution line. Defaults to
{'color':'g', 'linestyle':'--','linewidth':4, 'label':'Weighted Initial'}
. To suppress plotting, pass inNone
explicitly.
- plot_params_2d(x_1: int = 0, x_2: int = 1, y: int = 0, contours: bool = False, colorbar: bool = True, ax: Axes | None = None, label=True, **kwargs)[source]#
2D plot over two indices of param space, optionally contoured by a y value.
- Parameters:
x_1 (int, default=0) – Index of param value (column of X) to use as x value in plot.
x_2 (int, default=1) – Index of param value (column of X) to use as y value in plot.
y (int, optional) – Index of observable (column of y) to use as z value in contour plot.
ax (
matplotlib.axes.Axes
, optional) – Axes to plot distributions on. If non specified, a figure will be initialized to plot on.kwargs (dict, optional) – Additional keyword arguments will be passed to matplotlib.pyplot.scatter.
- plot_qoi(idx_x: int = 0, idx_y: int = 1, ax: Axes | None = None, **kwargs)[source]#
Plot 2D plot over two indices of y space.
- Parameters:
idx_x (int, default=0) – Index of observable value (column of y) to use as x value in plot.
idx_y (int, default=1) – Index of observable value (column of y) to use as y value in plot.
ax (
matplotlib.axes.Axes
, optional) – Axes to plot distributions on. If non specified, a figure will be initialized to plot on.kwargs (dict, optional) – Additional keyword arguments will be passed to matplotlib.pyplot.scatter.
- set_initial(distribution: rv_continuous | None = None)[source]#
Set initial probability distribution of model parameter values \(\pi_{in}(\lambda)\).
- Parameters:
distribution (scipy.stats.rv_continuous, optional) – scipy.stats continuous distribution object from where initial parameter samples were drawn from. If none provided, then a uniform distribution over domain of the density problem is assumed. If no domain is specified for density, then a standard normal distribution \(N(0,1)\) is assumed.
Warning
Setting initial distribution resets the predicted and updated distributions, so make sure to set the initial first.
- set_observed(distribution: ~scipy.stats._distn_infrastructure.rv_continuous = <scipy.stats._distn_infrastructure.rv_continuous_frozen object>)[source]#
Set distribution for the observed data.
The observed distribution is determined from assumptions on the collected data. In the case of using a weighted mean error map on sequential data from a single output, the distribution is stationary with respect to the number data points collected and will always be the standard normal d distribution $N(0,1)$.
- Parameters:
distribution (scipy.stats.rv_continuous, default=scipy.stats.norm()) – scipy.stats continuous distribution like object representing the likelihood of observed data. Defaults to a standard normal distribution N(0,1).
- set_predicted(distribution: rv_continuous | None = None, bw_method: str | Callable | generic | None = None, weights: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, **kwargs)[source]#
Set Predicted Distribution
The predicted distribution over the observable space is equal to the push-forward of the initial through the model \(\pi_{pr}(Q(\lambda)\). If no distribution is passed,
scipy.stats.gaussian_kde
is used over the predicted valuesy
to estimate the predicted distribution.- Parameters:
distribution (
scipy.stats.rv_continuous
, optional) – If specified, used as the predicted distribution instead of the default of using gaussian kernel density estimation on observed values y. This should be a frozen distribution if using scipy, and otherwise be a class containing a pdf() method return the probability density value for an array of values.bw_method (str, scalar, or Callable, optional) – Method to use to calculate estimator bandwidth. Only used if distribution is not specified, See documentation for
scipy.stats.gaussian_kde
for more information.weights (ArrayLike, optional) – Weights to use on predicted samples. Note that if specified,
set_weights()
will be run first to calculate new weights. Otherwise, whatever was previously set as the weights is used. Note this defaults to a weights vector of all 1s for every sample in the case that no weights were passed on upon initialization.**kwargs (dict, optional) – If specified, any extra keyword arguments will be passed along to the passed
distribution.pdf()
function for computing values of predicted samples.Note (distribution should be a frozen distribution if using scipy.) –
Warning
If passing a distribution argument, make sure that the initial distribution has been set first, either by having run
set_initial()
orfit()
first.
- set_weights(weights: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, normalize: bool = False)[source]#
Set Sample Weights
Sets the weights to use for each sample. Note weights can be one or two dimensional. If weights are two dimensional the weights are combined by multiplying them row wise and normalizing, to give one weight per sample. This combining of weights allows incorporating multiple sets of weights from different sources of prior belief.
- Parameters:
weights (np.ndarray, List) – Numpy array or list of same length as the n_samples or if two dimensional, number of columns should match n_samples
normalize (bool, default=False) – Whether to normalize the weights vector.
Warning
Resetting weights will delete the predicted and updated distribution values in the class, requiring a re-run of adequate set_ methods and/or fit() to reproduce with new weights.
- class mud.base.IterativeLinearProblem(A, b, y=None, mu_i=None, cov=None, data_cov=None, idx_order=None)[source]#
Bases:
LinearGaussianProblem
- plot_chain(ref_param, ax=None, color='k', s=100, **kwargs)[source]#
Plot chain of solutions and contours
- class mud.base.LinearGaussianProblem(A=array([[1], [1]]), b=None, y=None, mean_i=None, cov_i=None, cov_o=None, alpha=1.0)[source]#
Bases:
object
Sets up inverse problems with Linear/Affine Maps
Class provides solutions using MAP, MUD, and least squares solutions to the linear (or affine) problem from p parameters to d observables.
(4)#\[M(\mathbf{x}) = A\mathbf{x} + \mathbf{b}, A \in \mathbb{R}^{d\times p}, \mathbf{x}, \in \mathbb{R}^{p}, \mathbf{b}, \in \mathbb{R}^{d},\]- A#
2D Array defining linear transformation from model parameter space to model output space.
- Type:
ArrayLike
- y#
1D Array containing observed values of Q(lambda) Array containing push-forward values of parameters samples through the forward model. These samples will form the predicted distribution.
- Type:
ArrayLike
- domain#
Array containing ranges of each parameter value in the parameter space. Note that the number of rows must equal the number of parameters, and the number of columns must always be two, for min/max range.
- Type:
ArrayLike
- weights#
Weights to apply to each parameter sample. Either a 1D array of the same length as number of samples or a 2D array if more than one set of weights is to be incorporated. If so the weights will be multiplied and normalized row-wise, so the number of columns must match the number of samples.
- Type:
ArrayLike, optional
Examples
Problem set-up:
\[\begin{split}A = \begin{bmatrix} 1 & 1 \end{bmatrix}, b = 0, y = 1 \lambda_0 = \begin{bmatrix} 0.25 & 0.25 \end{bmatrix}^T, \Sigma_{init} = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 0.5 \end{bmatrix}, \Sigma_{obs} = \begin{bmatrix} 0.25 \end{bmatrix}\end{split}\]>>> from mud.base import LinearGaussianProblem as LGP >>> lg1 = LGP(A=np.array([[1, 1]]), ... b=np.array([[0]]), ... y=np.array([[1]]), ... mean_i=np.array([[0.25, 0.25]]).T, ... cov_i=np.array([[1, -0.25], [-0.25, 0.5]]), ... cov_o=np.array([[1]])) >>> lg1.solve('mud') array([[0.625], [0.375]])
- compute_functionals(X, terms='all')[source]#
For a given input and observed data, compute functionals or individual terms in functionals that are minimized to solve the linear gaussian problem.
- property n_features#
- property n_params#
- property n_samples#
- plot_contours(ref=None, subset=None, ax=None, annotate=False, note_loc=None, w=1, label='{i}', plot_opts={'color': 'k', 'fs': 20, 'ls': ':', 'lw': 1}, annotate_opts={'fontsize': 20})[source]#
Plot Linear Map Solution Contours
- plot_fun_contours(mesh=None, terms='dc', ax=None, N=250, r=1, **kwargs)[source]#
Plot contour map of functionals being minimized over input space
- plot_sol(point='mud', ax=None, label=None, note_loc=None, pt_opts={'color': 'k', 'marker': 'o', 's': 100}, ln_opts={'color': 'xkcd:blue', 'lw': 1, 'marker': 'd', 'zorder': 10}, annotate_opts={'backgroundcolor': 'w', 'fontsize': 14})[source]#
Plot solution points
- updated_cov(A=None, init_cov=None, data_cov=None)[source]#
We start with the posterior covariance from ridge regression Our matrix R = init_cov^(-1) - X.T @ pred_cov^(-1) @ X replaces the init_cov from the posterior covariance equation. Simplifying, this is given as the following, which is not used due to issues of numerical stability (a lot of inverse operations).
up_cov = (X.T @ np.linalg.inv(data_cov) @ X + R )^(-1) up_cov = np.linalg.inv( X.T@(np.linalg.inv(data_cov) - inv_pred_cov)@X + np.linalg.inv(init_cov) )
We return the updated covariance using a form of it derived which applies Hua’s identity in order to use Woodbury’s identity.
Check using alternate for updated_covariance.
>>> from mud.base import LinearGaussianProblem as LGP >>> lg2 = LGP(A=np.eye(2)) >>> lg2.updated_cov() array([[1., 0.], [0., 1.]]) >>> lg3 = LGP(A=np.eye(2)*2) >>> lg3.updated_cov() array([[0.25, 0. ], [0. , 0.25]]) >>> lg3 = LGP(A=np.eye(3)[:, :2]*2) >>> lg3.updated_cov() array([[0.25, 0. ], [0. , 0.25]]) >>> lg3 = LGP(A=np.eye(3)[:, :2]*2, cov_i=np.eye(2)) >>> lg3.updated_cov() array([[0.25, 0. ], [0. , 0.25]])
- class mud.base.LinearWMEProblem(operators, data, sigma, y=None, mean_i=None, cov_i=None, cov_o=None, alpha=1.0)[source]#
Bases:
LinearGaussianProblem
Linear Inverse Problems using the Weighted Mean Error Map
- class mud.base.SpatioTemporalProblem(df=None)[source]#
Bases:
object
Class for parameter estimation problems related to spatio-temporal problems. equation models of real world systems. Uses a QoI map of weighted residuals between simulated data and measurements to do inversion
- TODO#
- Type:
Finish
- TODO: Finish
- property data#
- property domain#
- get_closest_to_measurements(samples_mask=None, times_mask=None, sensors_mask=None)[source]#
Get closest simulated data point to measured data in $l^2$-norm.
- get_closest_to_true_vals()[source]#
Get closest simulated data point to noiseless true values in $l^2$-norm.
Note for now no sub-sampling implemented here.
- property lam#
- property lam_ref#
- load(df, lam='lam', data='data', **kwargs)[source]#
Load data from a file on disk for a PDE parameter estimation problem.
- property measurements#
- measurements_from_reference(ref=None, std_dev=None, seed=None)[source]#
Add noise to a reference solution.
- mud_problem(method='pca', data_weights=None, sample_weights=None, num_components=2, samples_mask=None, times_mask=None, sensors_mask=None)[source]#
Build QoI Map Using Data and Measurements
- plot_ts(ax=None, samples=None, times=None, sensor_idx=0, max_plot=100, meas_kwargs={}, samples_kwargs={}, alpha=0.1)[source]#
Plot time series data
- property sample_dist#
- sensor_contour_plot(idx=0, c_vals=None, ax=None, mask=None, fill=True, colorbar=True, **kwargs)[source]#
Plot locations of sensors in space
- sensor_scatter_plot(ax=None, mask=None, colorbar=None, **kwargs)[source]#
Plot locations of sensors in space
- property true_vals#
mud.cli module#
mud.funs module#
Python console script for mud, installed with pip install . or python setup.py install
- mud.funs.data_prob(lam, qoi, qoi_true=None, measurements=None, std_dev=None, sample_dist='uniform', domain=None, lam_ref=None, times=None, sensors=None, idxs=None, method='wme', init_dist=<scipy.stats._distn_infrastructure.rv_continuous_frozen object>)[source]#
Data-Constructed Map Solve
Wrapper around SpatioTemporalProblem class to create and solve a MUD problem by first aggregating observed and siumalated data in a data-constructed qoi map.
- mud.funs.iter_lin_solve(A, b, y=None, mean=None, cov=None, data_cov=None, method='mud', num_epochs=1, idx_order=None)[source]#
Iterative Linear Gaussian Problem Solver Entrypoint
- mud.funs.iterate(A, b, y, initial_mean, initial_cov, data_cov=None, num_epochs=1, idx=None)[source]#
- mud.funs.lin_prob(A, b, y=None, mean=None, cov=None, data_cov=None, alpha=None)[source]#
Linear Gaussian Problem Solver Entrypoint
- mud.funs.map_problem(lam, qoi, qoi_true, domain, sd=0.05, num_obs=None, log=False)[source]#
Wrapper around map problem, takes in raw qoi + synthetic data and instantiates solver object
- mud.funs.map_sol(A, b, y=None, mean=None, cov=None, data_cov=None, w=1)[source]#
MAP Linear Gaussian Problem Solve
- mud.funs.map_sol_with_cov(A, b, y=None, mean=None, cov=None, data_cov=None, w=1)[source]#
MAP Linear Gaussian Problem Solve
- mud.funs.mud_problem(lam, qoi, qoi_true, domain, sd=0.05, num_obs=None, split=None, weights=None)[source]#
Wrapper around mud problem, takes in raw qoi + synthetic data and performs WME transformation, instantiates solver object.
- mud.funs.mud_sol(A, b, y=None, mean=None, cov=None, data_cov=None)[source]#
For SWE problem, we are inverting N(0,1). This is the default value for data_cov.
- mud.funs.mud_sol_with_cov(A, b, y=None, mean=None, cov=None, data_cov=None)[source]#
Doesn’t use R directly, uses new equations. This presents the equation as a rank-k update to the error of the initial estimate.
- mud.funs.updated_cov(X, init_cov=None, data_cov=None)[source]#
We start with the posterior covariance from ridge regression Our matrix R = init_cov^(-1) - X.T @ pred_cov^(-1) @ X replaces the init_cov from the posterior covariance equation. Simplifying, this is given as the following, which is not used due to issues of numerical stability (a lot of inverse operations).
up_cov = (X.T @ np.linalg.inv(data_cov) @ X + R )^(-1) up_cov = np.linalg.inv( X.T@(np.linalg.inv(data_cov) - inv_pred_cov)@X + np.linalg.inv(init_cov) )
We return the updated covariance using a form of it derived which applies Hua’s identity in order to use Woodbury’s identity.
>>> updated_cov(np.eye(2)) array([[1., 0.], [0., 1.]]) >>> updated_cov(np.eye(2)*2) array([[0.25, 0. ], [0. , 0.25]]) >>> updated_cov(np.eye(3)[:, :2]*2, data_cov=np.eye(3)) array([[0.25, 0. ], [0. , 0.25]]) >>> updated_cov(np.eye(3)[:, :2]*2, init_cov=np.eye(2)) array([[0.25, 0. ], [0. , 0.25]])
- mud.funs.wme(predictions, data, sd=None)[source]#
Calculates Weighted Mean Error (WME) functional.
- Parameters:
predictions (numpy.ndarray of shape (n_samples, n_features)) – Predicted values against which data is compared.
data (list or numpy.ndarray of shape (n_features, 1)) – Collected (noisy) data
sd (float, optional) – Standard deviation
- Return type:
numpy.ndarray of shape (n_samples, 1)
mud.norm module#
- mud.norm.full_functional(operator, inputs, data, initial_mean, initial_cov, observed_mean=0, observed_cov=1)[source]#
- mud.norm.inner_product(X, mat)[source]#
Inner-product induced vector norm implementation.
Returns square of norm defined by the inner product
(x,x)_C := x^T C^-1 x
- Parameters:
X ((M, N) array_like) – Input array. N = number of samples, M = dimension
mat ((M, M) array_like) – Positive-definite operator which induces the inner product
- Returns:
Z – inner-product of each column in
X
with respect tomat
- Return type:
(N, 1) ndarray
mud.plot module#
MUD Plotting Module
Plotting utility functions for visualizing data-sets and distributions related to algorithm implemented within the MUD library.
Functions#
- mud.plot.build_nd_mesh_grid(domain, aff=100)[source]#
Build n-dimensional mesh grid
- Parameters:
- Returns:
(mesh, grid_points) – Tuple consistent of n-dimensional grid of points mesh and the list of coordinates $(x_1, x_2, …, x_n)$ for each grid point along.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
Examples
Building 1d ``mesh”
>>> mesh_1d, pts_1d = build_nd_mesh_grid([[0,1]], aff=5) >>> mesh_1d [array([0. , 0.25, 0.5 , 0.75])] >>> pts_1d array([[0. , 0.25, 0.5 , 0.75]])
- mud.plot.plot_1D_vecs(vecs, markers=None, ax=None, label=True, **kwargs)[source]#
Plot components of 1D vectors
- mud.plot.plot_contours(A, ref_param, subset=None, color='k', ls=':', lw=1, fs=20, w=1, s=100, **kwds)[source]#
- mud.plot.plot_dist(dist, domain, ax=None, idx=0, source='kde', aff=100, **kwargs)[source]#
Plot a probability distribution over a given domain.
- mud.plot.plot_vert_line(ax, x_loc, ylim=None, **kwargs)[source]#
Plot a vertical line on an existing axis at x_loc
mud.preprocessing module#
MUD Pre-Processing Module
All functions for pre-processing QoI data-sets before applying inversion algorithms can be found in this module.
Functions#
pca - Applly Principle Component Analysis transformation to QoI data.
- mud.preprocessing.pca(data: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], n_components: int = 2, **kwargs) Tuple[PCA, ndarray] [source]#
Apply Principal Component Analysis
Uses
sklearn.decomposition.PCA
class to perform a truncated PCA transformation on inputdata
using the firstn_components
principle components. Notesklearn.preprocessing.StandardScaler
transformation is applied to the data first.- Parameters:
ds (
numpy.typing.ArrayLike
) – Data to apply PCA transformation to. Must be 2 dimensional.n_components (int, default=2) – Number of principal components to use.
kwargs (dict, optional) – Additional keyword arguments will be passed to
sklearn.decomposition.PCA
class constructor. See sklearn’s documentation for more information on how PCA is performed.
- Returns:
pca_res – Tuple of
(pca, X_train)
wherepca
is thesklearn.decomposition.PCA
class with principle component vectors accessible atpca.components_
andX_train
being the transformed data-set, which should have same number of rows as originaldata
, but now onlyn_components
columns.- Return type:
Examples
For a simple example lets apply the PCA transformation to the identity matrix in 2 dimensions, using first 1 principle component.
>>> data = np.eye(2) >>> pca_1, X_train_1 = pca(data, n_components=1) >>> np.around(X_train_1, decimals=1) array([[-1.4], [ 1.4]]) >>> np.around(pca_1.components_, decimals=1) array([[-0.7, 0.7]])
Now lets try using two components
>>> pca_2, X_train_2 = pca(data, n_components=2) >>> np.around(X_train_2, decimals=1) array([[-1.4, 0. ], [ 1.4, 0. ]]) >>> np.abs(np.around(pca_2.components_, decimals=1)) array([[0.7, 0.7], [0.7, 0.7]])
Note that if we have three dimensional data we must flatten it before sending using
pca()
>>> data = np.random.rand(2,2,2) >>> pca, X_train = pca(data) Traceback (most recent call last): ... ValueError: Data is 3 dimensional. Must be 2D
Assuming the first dimension indicates each sample, and each sample contains 2D data within the 2nd and 3rd dimensions of the of the data set, then we can flatten this 2D data into a vector and then perform the PCA transformation.
>>> data = np.reshape(data, (2,-1)) >>> pca, X_train = pca(data) >>> X_train.shape (2, 2)
- mud.preprocessing.svd(data: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], **kwargs) Tuple[ndarray, ndarray, ndarray] [source]#
Apply Singular Value Decomposition
Uses
np.linalg.svd
class to perform an SVD transformation on inputdata
. Notesklearn.preprocessing.StandardScaler
transformation is applied to the data first.- Parameters:
ds (
numpy.typing.ArrayLike
) – Data to apply SVD transformation to. Must be 2 dimensional.kwargs (dict, optional) – Additional keyword arguments will be passed to
np.linalg.svd
method.
- Returns:
svd_res – Tuple of
(U, singular_values, singular_vectors)
corresponding to the X = Sigma UV^T decomposition elements.- Return type:
Tuple[
numpy.ndarray
,]
Examples
For a simple example lets apply the PCA transformation to the identity matrix in 2 dimensions, using first 1 principle component.
>>> data = np.eye(2) >>> U, S, V = svd(data) >>> np.around(U, decimals=1) array([[-0.7, 0.7], [ 0.7, 0.7]]) >>> np.around(S, decimals=1) array([2., 0.])
Note that if we have three dimensional data we must flatten it before sending using
pca()
>>> data = np.random.rand(2,2,2) >>> U, S, V = svd(data) Traceback (most recent call last): ... ValueError: Data is 3 dimensional. Must be 2D
Assuming the first dimension indicates each sample, and each sample contains 2D data within the 2nd and 3rd dimensions of the of the data set, then we can flatten this 2D data into a vector and then perform
svd()
.>>> data = np.reshape(data, (2,-1)) >>> U, S, V = svd(data) >>> U.shape (2, 2)
mud.util module#
- mud.util.add_noise(signal: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], sd: float = 0.05, seed: int | None = None)[source]#
Add Noise
Add noise to synthetic signal to model a real measurement device. Noise is assumed to be from a standard normal distribution std deviation sd:
$mathcal{N}(0,sigma)$
Paramaters#
- signalnumpy.typing.ArrayLike
Signal to add noise to.
- sdfloat, default = 0.05
Standard deviation of error to add.
- seedint, optional
Seed to use for numpy random number generator.
- returns:
noisy_signal (numpy.typing.ArrayLike) – Signal with noise added to it.
Example Usage
————-
Generate test signal, add noise, check average distance
>>> seed = 21
>>> test_signal = np.ones(5)
>>> noisy_signal = add_noise(test_signal, sd=0.05, seed=21)
>>> np.round(1000*np.mean(noisy_signal-test_signal))
4.0
- mud.util.fit_domain(x: ndarray | None = None, min_max_bounds: ndarray | None = None, pad_ratio: float = 0.1) ndarray [source]#
Fit domain bounding box to array x
- Parameters:
x (ArrayLike) – 2D array to calculate min, max values along columns.
pad_ratio (float, default=0.1) – What ratio of total range=max-min to subtract/add to min/max values to construct final domain range. Padding is done per x column dimension.
- Returns:
min_max_bounds – Domain fitted to values found in 2D array x, with padding added.
- Return type:
ArrayLike
Examples
Input must be 2D. Set pad_ratio = 0 to get explicit min/max bounds >>> fit_domain(np.array([[1, 10], [0, -10]]), pad_ratio=0.0) array([[ 0, 1],
[-10, 10]])
Can extend domain around the array values using the pad_ratio argument.
>>> fit_domain(np.array([[1, 10], [0, -10]]), pad_ratio=1) array([[ -1, 2], [-30, 30]])
- mud.util.make_2d_unit_mesh(N: int = 50, window: int = 1)[source]#
Make 2D Unit Mesh
Constructs mesh based on uniform distribution to discretize each axis.
- Parameters:
- Returns:
grid (tuple of np.ndarray) – Tuple of (X, Y, XX), the grid X and Y and 2D mesh XX
Example Usage
————-
>>> x, y, XX = make_2d_unit_mesh(3)
>>> print(XX)
[[0. 0. ] – [0.5 0. ] [1. 0. ] [0. 0.5] [0.5 0.5] [1. 0.5] [0. 1. ] [0.5 1. ] [1. 1. ]]
- mud.util.null_space(A, rcond=None)[source]#
Construct an orthonormal basis for the null space of A using SVD
Method is slight modification of
scipy.linalg
- Parameters:
A ((M, N) array_like) – Input array
rcond (float, optional) – Relative condition number. Singular values
s
smaller thanrcond * max(s)
are considered zero. Default: floating point eps * max(M,N).
- Returns:
Z – Orthonormal basis for the null space of A. K = dimension of effective null space, as determined by rcond
- Return type:
(N, K) ndarray
Examples
One-dimensional null space:
>>> import numpy as np >>> A = np.array([[1, 1], [1, 1]]) >>> ns = null_space(A) >>> ns * np.sign(ns[0,0]) # Remove the sign ambiguity of the vector array([[ 0.70710678], [-0.70710678]])
Two-dimensional null space:
>>> B = np.random.rand(3, 5) >>> Z = null_space(B) >>> Z.shape (5, 2) >>> np.allclose(B.dot(Z), 0) True
The basis vectors are orthonormal (up to rounding error):
>>> np.allclose(Z.T.dot(Z), np.eye(2)) True
- mud.util.rank_decomposition(A: _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]) List[ndarray] [source]#
Build list of rank k updates of A
- mud.util.set_shape(array: ndarray, shape: List | Tuple = (1, -1)) ndarray [source]#
Resizes inputs if they are one-dimensional.
- mud.util.std_from_equipment(tolerance=0.1, probability=0.95)[source]#
Converts tolerance tolerance for precision of measurement equipment to a standard deviation, scaling so that (100`probability`) percent of measurements are within tolerance. A mean of zero is assumed. erfinv is imported from scipy.special
- mud.util.transform_linear_map(operator, data, std)[source]#
Takes a linear map operator of size (len(data), dim_input) or (1, dim_input) for repeated observations, along with a vector data representing observations. It is assumed that data is formed with M@truth + sigma where sigma ~ N(0, std)
This then transforms it to the MWE form expected by the DCI framework. It returns a matrix A of shape (1, dim_input) and np.float b and transforms it to the MWE form expected by the DCI framework.
>>> X = np.ones((10, 2)) >>> x = np.array([0.5, 0.5]).reshape(-1, 1) >>> std = 1 >>> d = X @ x >>> A, b = transform_linear_map(X, d, std) >>> np.linalg.norm(A @ x + b) 0.0 >>> A, b = transform_linear_map(X, d, [std]*10) >>> np.linalg.norm(A @ x + b) 0.0 >>> A, b = transform_linear_map(np.array([[1, 1]]), d, std) >>> np.linalg.norm(A @ x + b) 0.0 >>> A, b = transform_linear_map(np.array([[1, 1]]), d, [std]*10) Traceback (most recent call last): ... ValueError: For repeated measurements, pass a float for std