\n",
"Details: Deriving the MAP weights
\n",
"\n",
"\n",
"First we apply Bayes' rule\n",
"\n",
"$$\\begin{align}\n",
"p(\\mathbf{w} | \\{x_n,y_n\\}_{n=1}^N,\\sigma_y^2,\\sigma_{\\mathbf{w}}^2) & = \\frac{ p(\\mathbf{w}| \\sigma_{\\mathbf{w}}^2) p(\\mathbf{y}\\mid\\mathbf{X}, \\mathbf{w}, \\sigma_y^2) }{p(\\mathbf{y}| \\sigma_y^2,\\sigma_{\\mathbf{w}}^2)} \\propto p(\\mathbf{w}| \\sigma_{\\mathbf{w}}^2) p(\\mathbf{y}\\mid\\mathbf{X}, \\mathbf{w}, \\sigma_y^2).\n",
"\\end{align}$$\n",
"\n",
"Next we substitute in for the likelihood and prior,\n",
"\n",
"$$\\begin{align}\n",
"p(\\mathbf{w} | \\{x_n,y_n\\}_{n=1}^N,\\sigma_y^2,\\sigma_{\\mathbf{w}}^2) = \\frac{1}{(2\\pi \\sigma_\\mathbf{w}^2)}\\text{exp}\\big(-\\frac{1}{2\\sigma_\\mathbf{w}^2}\\mathbf{w}^\\top \\mathbf{w} \\big) \\times \\frac{1}{(2\\pi \\sigma_y^2)^{N/2}}\\text{exp}\\big(-\\frac{1}{2\\sigma_y^2}(\\mathbf{y} - \\mathbf{X}\\mathbf{w})^\\top (\\mathbf{y} - \\mathbf{X}\\mathbf{w})\\big).\n",
" \\end{align}$$\n",
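"\n",
"As a quick numerical check (a minimal sketch: the toy data, the variable names and the use of `scipy.stats` are our own illustrative assumptions), the product of the prior and likelihood densities evaluated with `scipy.stats` matches the expression above; the prior normaliser $\\frac{1}{2\\pi \\sigma_\\mathbf{w}^2}$ corresponds to a two-dimensional weight vector such as an intercept and a slope.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import norm, multivariate_normal\n",
"\n",
"# Toy data; all specific values here are illustrative assumptions.\n",
"rng = np.random.default_rng(0)\n",
"N, D = 10, 2\n",
"X = np.column_stack([np.ones(N), rng.normal(size=N)])   # design matrix with columns [1, x_n]\n",
"sigma_y, sigma_w = 0.3, 1.0\n",
"y = X @ np.array([0.5, -1.0]) + sigma_y * rng.normal(size=N)\n",
"w = np.array([0.2, -0.8])                                # an arbitrary weight vector to evaluate at\n",
"\n",
"# Prior and likelihood via scipy.stats\n",
"prior = multivariate_normal(mean=np.zeros(D), cov=sigma_w**2 * np.eye(D)).pdf(w)\n",
"lik = np.prod(norm(loc=X @ w, scale=sigma_y).pdf(y))\n",
"\n",
"# The same quantities written out as in the equation above (with D = 2 weights)\n",
"resid = y - X @ w\n",
"prior_manual = (2 * np.pi * sigma_w**2) ** (-D / 2) * np.exp(-w @ w / (2 * sigma_w**2))\n",
"lik_manual = (2 * np.pi * sigma_y**2) ** (-N / 2) * np.exp(-resid @ resid / (2 * sigma_y**2))\n",
"\n",
"print(np.isclose(prior * lik, prior_manual * lik_manual))   # expect: True\n",
"```\n",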
"\n",
"Now pulling the prior and likelihood terms into a single exponential we have,\n",
"\n",
"$$\\begin{align}\n",
"p(\\mathbf{w} | \\{x_n,y_n\\}_{n=1}^N,\\sigma_y^2,\\sigma_{\\mathbf{w}}^2) & = \\frac{1}{(2\\pi \\sigma_\\mathbf{w}^2) (2\\pi \\sigma_y^2)^{N/2}}\\text{exp}\\big(-\\frac{1}{2\\sigma_\\mathbf{w}^2}\\mathbf{w}^\\top \\mathbf{w} -\\frac{1}{2\\sigma_y^2}(\\mathbf{y} - \\mathbf{X}\\mathbf{w})^\\top (\\mathbf{y} - \\mathbf{X}\\mathbf{w})\\big)\n",
"\\end{align}$$\n",
"\n",
"Taking logs and combining terms that do not depend on $\\mathbf{w}$ into a constant,\n",
"\n",
"$$\\begin{align}\n",
"\\log p(\\mathbf{w} | \\{x_n,y_n\\}_{n=1}^N,\\sigma_y^2,\\sigma_{\\mathbf{w}}^2) & = -\\frac{1}{2\\sigma_\\mathbf{w}^2}\\mathbf{w}^\\top \\mathbf{w} -\\frac{1}{2\\sigma_y^2}(\\mathbf{y} - \\mathbf{X}\\mathbf{w})^\\top (\\mathbf{y} - \\mathbf{X}\\mathbf{w}) + \\text{const.}.\n",
"\\end{align}$$\n",
"\n",
"Now we see that maximising $p(\\mathbf{w} | \\{x_n,y_n\\}_{n=1}^N,\\sigma_y^2,\\sigma_{\\mathbf{w}}^2)$ is the same as minimising $\\frac{\\sigma_y^2}{\\sigma_\\mathbf{w}^2}\\mathbf{w}^\\top \\mathbf{w} +(\\mathbf{y} - \\mathbf{X}\\mathbf{w})^\\top (\\mathbf{y} - \\mathbf{X}\\mathbf{w})$.\n",
"\n",
"
\n",
" \n",
"