*universal approximators*. Universal approximators are families of simple nonlinear feature transformations whose members can be combined to create arbitrarily complex nonlinearities,
like any we would ever expect to find in a supervised or unsupervised learning dataset. Here we will also introduce the three standard types of universal approximators employed in practice today - kernels, neural networks, and trees - of which we will have much more to say in future chapters.


Complicated mathematical objects - i.e., traditional mathematical functions (curves, manifolds, step functions, etc., in general referred to as *piecewise-continuous functions*) - can be similarly broken down into combinations of simpler elements, i.e., into combinations of simpler mathematical functions. Stated more formally, any (piecewise) continuous mathematical function $h\left(\mathbf{x}\right)$ can be broken down into, or approximated by, a linear combination of $B$ 'simple' functions $f_1,\,f_2,\,\ldots,\,f_B$ in general as

$h\left(\mathbf{x}\right) \approx w_0 + f_1\left(\mathbf{x}\right){w}_{1} + f_2\left(\mathbf{x}\right){w}_{2} + \cdots + f_B\left(\mathbf{x}\right)w_B$

where each of the simpler nonlinear functions on the right hand side can have internal parameters. This is actually a rather old mathematical fact (with roots in the 1700s) which we call *universal approximation* in the machine learning / deep learning world, and these 'simple' functions are called *universal approximators*. Moreover, if we use enough of them, and tune their parameters correctly, we can approximate any function we want.

With enough universal approximators, whose parameters are tuned correctly, we can approximate any piecewise continuous mathematical function $h\left(\mathbf{x}\right)$ as closely as we wish as $h\left(\mathbf{x}\right) = w_0 + f_1\left(\mathbf{x}\right){w}_{1} + f_2\left(\mathbf{x}\right){w}_{2} + \cdots + f_B\left(\mathbf{x}\right)w_B$.
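The principle above can be illustrated with a minimal sketch in NumPy. Here we use $B$ hyperbolic tangent units $f_b\left(x\right) = \tanh\left(a_b x + c_b\right)$ as our 'simple' nonlinear functions - the particular choice of $\tanh$, the random internal parameters $a_b,\,c_b$, and the target function $\sin\left(2x\right)$ are all hypothetical choices made purely for illustration. The internal parameters are fixed at random, and only the linear weights $w_0,\,w_1,\,\ldots,\,w_B$ are tuned, via least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, a, c):
    # Stack a constant column (for w_0) with the B nonlinear units
    # f_b(x) = tanh(a_b * x + c_b), giving a len(x)-by-(B+1) matrix.
    units = [np.tanh(a_b * x + c_b) for a_b, c_b in zip(a, c)]
    return np.column_stack([np.ones_like(x)] + units)

B = 20
a = rng.uniform(-5, 5, B)  # internal slope parameters (fixed, not tuned)
c = rng.uniform(-5, 5, B)  # internal shift parameters (fixed, not tuned)

x = np.linspace(-3, 3, 200)
h = np.sin(2 * x)          # target function to approximate (illustrative choice)

F = features(x, a, c)                        # feature matrix
w, *_ = np.linalg.lstsq(F, h, rcond=None)    # tune the linear weights w_0, ..., w_B
approx = F @ w                               # w_0 + f_1(x) w_1 + ... + f_B(x) w_B

print(f"mean squared error with B = {B}: {np.mean((approx - h) ** 2):.4f}")
```

Increasing $B$ (and tuning the internal parameters as well, rather than fixing them at random) generally drives the approximation error down further - this is precisely the knob that kernels, neural networks, and trees each turn in their own way.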

