{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Foundations for statistical inference - Confidence intervals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### If you have access to data on an entire population, say the size of every house in Ames, Iowa, it's straightforward to answer questions like, \"How big is the typical house in Ames?\" and \"How much variation is there in the sizes of houses?\". If you have access to only a sample of the population, as is often the case, the task becomes more complicated. What is your best guess for the typical size if you only know the sizes of several dozen houses? This sort of situation requires that you use your sample to make inferences about what your population looks like."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In the previous lab, \"Sampling Distributions\", we looked at the population data of houses from Ames, Iowa. Let's start by loading that data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import io\n",
    "import requests\n",
    "\n",
    "df_url = 'https://raw.githubusercontent.com/akmand/datasets/master/openintro/ames.csv'\n",
    "url_content = requests.get(df_url, verify=False).content\n",
    "ames = pd.read_csv(io.StringIO(url_content.decode('utf-8')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order</th>\n",
       "      <th>PID</th>\n",
       "      <th>MS.SubClass</th>\n",
       "      <th>MS.Zoning</th>\n",
       "      <th>Lot.Frontage</th>\n",
       "      <th>Lot.Area</th>\n",
       "      <th>Street</th>\n",
       "      <th>Alley</th>\n",
       "      <th>Lot.Shape</th>\n",
       "      <th>Land.Contour</th>\n",
       "      <th>...</th>\n",
       "      <th>Pool.Area</th>\n",
       "      <th>Pool.QC</th>\n",
       "      <th>Fence</th>\n",
       "      <th>Misc.Feature</th>\n",
       "      <th>Misc.Val</th>\n",
       "      <th>Mo.Sold</th>\n",
       "      <th>Yr.Sold</th>\n",
       "      <th>Sale.Type</th>\n",
       "      <th>Sale.Condition</th>\n",
       "      <th>SalePrice</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>526301100</td>\n",
       "      <td>20</td>\n",
       "      <td>RL</td>\n",
       "      <td>141.0</td>\n",
       "      <td>31770</td>\n",
       "      <td>Pave</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IR1</td>\n",
       "      <td>Lvl</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>2010</td>\n",
       "      <td>WD</td>\n",
       "      <td>Normal</td>\n",
       "      <td>215000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>526350040</td>\n",
       "      <td>20</td>\n",
       "      <td>RH</td>\n",
       "      <td>80.0</td>\n",
       "      <td>11622</td>\n",
       "      <td>Pave</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Reg</td>\n",
       "      <td>Lvl</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>MnPrv</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>2010</td>\n",
       "      <td>WD</td>\n",
       "      <td>Normal</td>\n",
       "      <td>105000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>526351010</td>\n",
       "      <td>20</td>\n",
       "      <td>RL</td>\n",
       "      <td>81.0</td>\n",
       "      <td>14267</td>\n",
       "      <td>Pave</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IR1</td>\n",
       "      <td>Lvl</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Gar2</td>\n",
       "      <td>12500</td>\n",
       "      <td>6</td>\n",
       "      <td>2010</td>\n",
       "      <td>WD</td>\n",
       "      <td>Normal</td>\n",
       "      <td>172000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>526353030</td>\n",
       "      <td>20</td>\n",
       "      <td>RL</td>\n",
       "      <td>93.0</td>\n",
       "      <td>11160</td>\n",
       "      <td>Pave</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Reg</td>\n",
       "      <td>Lvl</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>2010</td>\n",
       "      <td>WD</td>\n",
       "      <td>Normal</td>\n",
       "      <td>244000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>527105010</td>\n",
       "      <td>60</td>\n",
       "      <td>RL</td>\n",
       "      <td>74.0</td>\n",
       "      <td>13830</td>\n",
       "      <td>Pave</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IR1</td>\n",
       "      <td>Lvl</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>MnPrv</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>2010</td>\n",
       "      <td>WD</td>\n",
       "      <td>Normal</td>\n",
       "      <td>189900</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 82 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Order        PID  MS.SubClass MS.Zoning  Lot.Frontage  Lot.Area Street  \\\n",
       "0      1  526301100           20        RL         141.0     31770   Pave   \n",
       "1      2  526350040           20        RH          80.0     11622   Pave   \n",
       "2      3  526351010           20        RL          81.0     14267   Pave   \n",
       "3      4  526353030           20        RL          93.0     11160   Pave   \n",
       "4      5  527105010           60        RL          74.0     13830   Pave   \n",
       "\n",
       "  Alley Lot.Shape Land.Contour  ... Pool.Area Pool.QC  Fence Misc.Feature  \\\n",
       "0   NaN       IR1          Lvl  ...         0     NaN    NaN          NaN   \n",
       "1   NaN       Reg          Lvl  ...         0     NaN  MnPrv          NaN   \n",
       "2   NaN       IR1          Lvl  ...         0     NaN    NaN         Gar2   \n",
       "3   NaN       Reg          Lvl  ...         0     NaN    NaN          NaN   \n",
       "4   NaN       IR1          Lvl  ...         0     NaN  MnPrv          NaN   \n",
       "\n",
       "  Misc.Val Mo.Sold Yr.Sold Sale.Type  Sale.Condition  SalePrice  \n",
       "0        0       5    2010       WD           Normal     215000  \n",
       "1        0       6    2010       WD           Normal     105000  \n",
       "2    12500       6    2010       WD           Normal     172000  \n",
       "3        0       4    2010       WD           Normal     244000  \n",
       "4        0       3    2010       WD           Normal     189900  \n",
       "\n",
       "[5 rows x 82 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In this lab we'll start with a simple random sample of size 60 from the population."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "population = ames['Gr.Liv.Area']\n",
    "samp = population.sample(60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    2930.000000\n",
       "mean     1499.690444\n",
       "std       505.508887\n",
       "min       334.000000\n",
       "25%      1126.000000\n",
       "50%      1442.000000\n",
       "75%      1742.750000\n",
       "max      5642.000000\n",
       "Name: Gr.Liv.Area, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "population.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1494.9833333333333"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "samp.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Based only on this single sample, the best estimate of the average living area of houses sold in Ames would be the sample mean, usually denoted as $\\bar{x}$ (here computed with `samp.mean()`). That serves as a good point estimate, but it would be useful to also communicate how uncertain we are of that estimate. This can be captured by using a confidence interval.\n",
    "\n",
    "### We can calculate a 95% confidence interval for the population mean by taking the point estimate (the sample mean) and adding and subtracting 1.96 standard errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1376.238157814993 1613.7285088516737\n"
     ]
    }
   ],
   "source": [
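    "# standard error of the mean: standard deviation of the sample over sqrt(sample size)\n",
    "# note: np.std uses ddof=0 by default; samp.std() would use ddof=1\n",
    "se = np.std(samp) / np.sqrt(len(samp))\n",
    "lower = samp.mean() - (1.96 * se)\n",
    "upper = samp.mean() + (1.96 * se)\n",
    "print(lower, upper)"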
   ]
  },
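  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The multiplier 1.96 is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail. As a minimal sketch, assuming SciPy is installed (it is not imported elsewhere in this lab), the critical value for any confidence level can be looked up with `scipy.stats.norm.ppf`. The names `conf_level` and `z_star` are introduced here for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import norm\n",
    "\n",
    "conf_level = 0.95  # illustrative name, not used elsewhere in this lab\n",
    "# two-sided interval: put (1 - conf_level) / 2 in each tail\n",
    "z_star = norm.ppf(1 - (1 - conf_level) / 2)\n",
    "print(round(z_star, 2))  # 1.96 for a 95% level"
   ]
  },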
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The distribution of the sample mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAEvCAYAAACHYI+LAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAATLElEQVR4nO3df4zk933X8dfbe8Q0W3dZ2+fYtS3fuboGHCTa3GFSqlalLrIpKGd+GB2i5YSCDJWT1ohSxVSqIyFLEYLiIpEiNzEcIsS9hAifoIUGQ0FIKO4lcUhsx/Lh84+LvfY2t0yrsXTO7X34Y79pxmbPt/F+9mZ2/XhIq5n5zndm3+OPxnrezHdnqrUWAAA275JpDwAAsFMIKwCAToQVAEAnwgoAoBNhBQDQibACAOhk17QHSJIrr7yy7dmzZ9pjAABc0Be+8IXfba3tXu+6mQirPXv25Pjx49MeAwDggqrqufNd561AAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCcz8ZU2wGwZjUYZj8fTHmPT5ufns7CwMO0xgLcRYQW8zmg0yg17b8xo5fS0R9m0hcXL89zJZ8QVcNEIK+B1xuNxRiunc/Xh+zM3vzjtcd6y1fFKlo7cnfF4LKyAi0ZYAeuam1/MrsuumPYYANuKg9cBADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCcbCquq+rtV9XhVfbWqPlVVf7iqLq+qz1XV08Pp4sT+91TViap6qqpu3brxAQBmxwXDqqquTfKzSQ601v54krkkh5J8OMkjrbV9SR4ZLqeqbhquf0+S25J8rKrmtmZ8AIDZsdG3Ancl+a6q2pXknUleTHIwyZHh+iNJbh/OH0zyUGvtTGvtZJITSW7uNjEAwIy6YFi11r6e5B8neT7JS0lGrbXfSvKu1tpLwz4vJblquMm1SV6YuItTwzYAgB1tI28FLmbtVai9Sb43yXxV/dSb3WSdbW2d+72zqo5X1fHl5eWNzgsAMLM28lbgTyQ52Vpbbq19M8lnk/zpJC9X1TVJMpy+Mux/Ksn1E7e/LmtvHb5Oa+2B1tqB1tqB3bt3b+YxAADMhI2E1fNJ3ldV76yqSnJLkieTHEtyeNjncJKHh/PHkhyqqkuram+SfUke7Ts2AMDs2XWhHVprn6+qzyT5YpKzSb6U5IEk353kaFV9IGvxdcew/+NVdTTJE8P+d7XWVrdofgCAmXHBsEqS1tq9Se59w+YzWXv1ar3970ty3+ZGAwDYXnzyOgBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOhFWAACdCCsAgE6EFQBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOhFWAACdCCsAgE6EFQBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOhFWAACdCCsAgE6EFQBAJ7umPQDsFKPRKOPxeNpjbNrS0tK0RwDYtoQVdDAajXLD3hszWjk97VEAmCJhBR2Mx+OMVk7n6sP3Z25+cdrjbMpry89m+dP3TnsMgG1JWEFHc/OL2XXZFdMeY1NWxyvTHgFg2xJWwI62U44Zm5+fz8LCwrTHAC5AWAE70rkzryaXzGX//v3THqWLhcXL89zJZ8QVzDhhBexI7eyZ5NzqjjjubXW8kqUjd2c8HgsrmHHCCtjRdsJxb8D24QNCAQA6EVYAAJ0IKwCAToQVAEAnwgoAoBNhBQDQibACAOhEWAEAdCKsAAA6EVYAAJ0IKwCAToQVAEAnwgoAoBNhBQDQibACAOhEWAEAdCKsAAA6EVYAAJ1sKKyq6o9U1Weq6mtV9WRV/VBVXV5Vn6uqp4fTxYn976mqE1X1VFXdunXjAwDMjo2+YvUrSf5Ta+2PJvkTSZ5M8uEkj7TW9iV5ZLicqropyaEk70lyW5KPVdVc
78EBAGbNBcOqqr4nyY8m+USStNZea6393yQHkxwZdjuS5Pbh/MEkD7XWzrTWTiY5keTmvmMDAMyejbxidWOS5ST/sqq+VFUfr6r5JO9qrb2UJMPpVcP+1yZ5YeL2p4ZtAAA72kbCaleS9yb51dbaDyYZZ3jb7zxqnW3t/9up6s6qOl5Vx5eXlzc0LADALNtIWJ1Kcqq19vnh8meyFlovV9U1STKcvjKx//UTt78uyYtvvNPW2gOttQOttQO7d+9+q/MDAMyMC4ZVa20pyQtV9e5h0y1JnkhyLMnhYdvhJA8P548lOVRVl1bV3iT7kjzadWoAgBm0a4P7fSjJJ6vqHUmeSfI3sxZlR6vqA0meT3JHkrTWHq+qo1mLr7NJ7mqtrXafHABgxmworFprjyU5sM5Vt5xn//uS3PfWxwIA2H588joAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANDJhsOqquaq6ktV9R+Gy5dX1eeq6unhdHFi33uq6kRVPVVVt27F4AAAs+Y7ecXq55I8OXH5w0keaa3tS/LIcDlVdVOSQ0nek+S2JB+rqrk+4wIAzK4NhVVVXZfkzyf5+MTmg0mODOePJLl9YvtDrbUzrbWTSU4kubnLtAAAM2yjr1jdn+QXkpyb2Pau1tpLSTKcXjVsvzbJCxP7nRq2AQDsaBcMq6r6C0leaa19YYP3Wetsa+vc751Vdbyqji8vL2/wrgEAZtdGXrH64STvr6pnkzyU5Mer6t8kebmqrkmS4fSVYf9TSa6fuP11SV5845221h5orR1orR3YvXv3Jh4CAMBsuGBYtdbuaa1d11rbk7WD0v9ra+2nkhxLcnjY7XCSh4fzx5IcqqpLq2pvkn1JHu0+OQDAjNm1idt+NMnRqvpAkueT3JEkrbXHq+pokieSnE1yV2ttddOTAgDMuO8orFprv53kt4fz30hyy3n2uy/JfZucDQBgW/HJ6wAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADoRVgAAnQgrAIBOhBUAQCfCCgCgE2EFANCJsAIA6ERYAQB0IqwAADrZNe0BANiYpaWlaY/Qxfz8fBYWFqY9BmwJYQUw486deTW5ZC779++f9ihdLCxenudOPiOu2JGEFcCMa2fPJOdWc/Xh+zM3vzjtcTZldbySpSN3ZzweCyt2JGEFsE3MzS9m12VXTHsM4E04eB0AoBNhBQDQibACAOhEWAEAdCKsAAA6EVYAAJ0IKwCATnyOFVM1Go0yHo+nPcam7ZSvGgFgc4QVUzMajXLD3hszWjk97VEAoAthxdSMx+OMVk7viK/peG352Sx/+t5pjwHAlAkrpm4nfE3H6nhl2iMAMAMcvA4A0ImwAgDoRFgBAHQirAAAOhFWAACdCCsAgE6EFQBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOrlgWFXV9VX136rqyap6vKp+bth+eVV9rqqeHk4XJ25zT1WdqKqnqurWrXwAAACzYiOvWJ1N8vdaa38syfuS3FVVNyX5cJJHWmv7kjwyXM5w3aEk70lyW5KPVdXcVgwPADBLLhhWrbWXWmtfHM7/fpInk1yb5GCSI8NuR5LcPpw/mOSh1tqZ1trJJCeS3Nx5bgCAmfMdHWNVVXuS
/GCSzyd5V2vtpWQtvpJcNex2bZIXJm52atgGALCjbTisquq7k/y7JHe31n7vzXZdZ1tb5/7urKrjVXV8eXl5o2MAAMysDYVVVf2hrEXVJ1trnx02v1xV1wzXX5PklWH7qSTXT9z8uiQvvvE+W2sPtNYOtNYO7N69+63ODwAwMzbyV4GV5BNJnmyt/fLEVceSHB7OH07y8MT2Q1V1aVXtTbIvyaP9RgYAmE27NrDPDyf56SRfqarHhm3/IMlHkxytqg8keT7JHUnSWnu8qo4meSJrf1F4V2tttffgAACz5oJh1Vr7n1n/uKkkueU8t7kvyX2bmAsAYNvxyesAAJ0IKwCAToQVAEAnwgoAoBNhBQDQibACAOhEWAEAdCKsAAA6EVYAAJ0IKwCAToQVAEAnwgoAoBNhBQDQibACAOhEWAEAdCKsAAA6EVYAAJ0IKwCAToQVAEAnwgoAoBNhBQDQya5pDwDA28/S0tK0R+hifn4+CwsL0x6DGSKsALhozp15NblkLvv375/2KF0sLF6e504+I674A8IKgIumnT2TnFvN1Yfvz9z84rTH2ZTV8UqWjtyd8XgsrPgDwgqAi25ufjG7Lrti2mNAdw5eBwDoRFgBAHQirAAAOhFWAACdCCsAgE6EFQBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOhFWAACd+BLmbWg0GmU8Hk97jE1bWlqa9ggA0JWw2mZGo1Fu2HtjRiunpz0KAPAGwmqbGY/HGa2cztWH78/c/OK0x9mU15afzfKn7532GADQjbDapubmF7PrsiumPcamrI5Xpj0CAHTl4HUAgE6EFQBAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOvE5VgCwCTvh67nm5+ezsLAw7TF2BGEFAG/BuTOvJpfMZf/+/dMeZdMWFi/PcyefEVcdCCsAeAva2TPJudVt/xVjq+OVLB25O+PxWFh1sGVhVVW3JfmVJHNJPt5a++hW/S4AmJad8BVj9LMlYVVVc0n+eZI/m+RUkt+pqmOttSe24vdt1Gg0yng8nuYIm7YT3ssHgJ1qq16xujnJidbaM0lSVQ8lOZhkamE1Go1yw94bM1o5Pa0RAGBm7ZR/uE/7QPytCqtrk7wwcflUkj+1Rb9rQ8bjcUYrp3PVX/lILnnn9n0P+ZvfeCHf+I+/nNXxyrRH2bTVV0drpx7LTNkpj2WnPI7EY5lVO+WxfHPlxR1zEH4y/QPxq7XW/06r7khya2vtbw2XfzrJza21D03sc2eSO4eL707yVPdBZs+VSX532kPwOtZk9liT2WRdZo81mZ4bWmu717tiq16xOpXk+onL1yV5cXKH1toDSR7Yot8/k6rqeGvtwLTn4NusyeyxJrPJusweazKbtuqT138nyb6q2ltV70hyKMmxLfpdAAAzYUtesWqtna2qDyb5z1n7uIUHW2uPb8XvAgCYFVv2OVattd9I8htbdf/b1Nvqrc9twprMHmsym6zL7LEmM2hLDl4HAHg72qpjrAAA3naE1SZU1YNV9UpVfXVi2z+sqv9dVY9V1W9V1fdOXHdPVZ2oqqeq6taJ7fur6ivDdf+squpiP5adYr01mbju56uqVdWVE9usyUVwnufKR6rq68Nz5bGq+smJ66zLFjvfc6WqPjT8d3+8qv7RxHZrssXO8zz59YnnyLNV9djEddZkFrXW/LzFnyQ/muS9Sb46se17Js7/bJJ/MZy/KcmXk1yaZG+S/5Nkbrju0SQ/lKSS/GaSPzftx7Zdf9Zbk2H79Vn7Y4rnklxpTaa/Lkk+kuTn19nXukxvTf5Mkv+S5NLh8lXWZLpr8obr/0mSX7Ims/3jFatNaK39jySn37Dt9yYuzif51kFsB5M81Fo701o7meREkpur6pqsxdj/amvPiH+d5PYtH36HWm9NBv80yS/k2+uRWJOL5k3WZT3W5SI4z5r8TJKPttbODPu8Mmy3JhfBmz1Phled/mqSTw2brMmMElZboKru
q6oXkvz1JL80bF7va36uHX5OrbOdTqrq/Um+3lr78huusibT98HhrfMHq2px2GZdpuf7k/xIVX2+qv57Vf3JYbs1mb4fSfJya+3p4bI1mVHCagu01n6xtXZ9kk8m+eCweb33uNubbKeDqnpnkl/MtwP3dVevs82aXDy/muT7kvxAkpey9jZHYl2maVeSxSTvS/L3kxwdXimxJtP31/LtV6sSazKzhNXW+rdJ/vJw/nxf83NqOP/G7fTxfVk7/uDLVfVs1v77frGqro41marW2suttdXW2rkkv5bk5uEq6zI9p5J8tq15NMm5rH0fnTWZoqraleQvJfn1ic3WZEYJq86qat/Exfcn+dpw/liSQ1V1aVXtTbIvyaOttZeS/H5VvW/4l+HfSPLwRR16B2utfaW1dlVrbU9rbU/W/qfz3tbaUqzJVA3HgnzLX0zyrb+Esi7T8++T/HiSVNX3J3lH1r7k15pM108k+VprbfItPmsyo7bsk9ffDqrqU0l+LMmVVXUqyb1JfrKq3p21f+k9l+TvJElr7fGqOprkiSRnk9zVWlsd7upnkvyrJN+Vtb/g+M2L+DB2lPXWpLX2ifX2tSYXz3meKz9WVT+Qtbcpnk3ytxPrcrGcZ00eTPLg8Of+ryU5PBwAbU0ugjf5/9ehvP5tQM+TGeaT1wEAOvFWIABAJ8IKAKATYQUA0ImwAgDoRFgBAHQirAAAOhFWAACdCCsAgE7+HxZxUzI1j8mfAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 720x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# means of 3000 simple random samples, each of size 60\n",
    "sample_means60 = [population.sample(60).mean() for _ in range(3000)]\n",
    "\n",
    "plt.rcParams['figure.figsize'] = (10, 5)\n",
    "plt.hist(sample_means60, edgecolor='black', linewidth=1.2)\n",
    "plt.show()"
   ]
  },
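  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### For samples of size $n$, the sample means should be centered near the population mean, with a spread roughly equal to $\\sigma / \\sqrt{n}$. The sketch below checks this empirically. It assumes `population` from the earlier cell is still in memory; the names `sim_means` and `theoretical_se` are introduced here for illustration. Because the samples are drawn without replacement from a finite population, expect the two spreads to be close rather than identical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# simulate the sampling distribution of the mean for samples of size 60\n",
    "sim_means = np.array([population.sample(60).mean() for _ in range(3000)])\n",
    "\n",
    "# spread predicted by theory: population sd divided by sqrt(sample size)\n",
    "theoretical_se = population.std(ddof=0) / np.sqrt(60)\n",
    "\n",
    "print(sim_means.mean(), population.mean())  # centers should be close\n",
    "print(sim_means.std(), theoretical_se)  # spreads should be similar"
   ]
  },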
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In this case we have the luxury of knowing the true population mean since we have data on the entire population."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Does your confidence interval capture the true average size of houses in Ames?"
   ]
  },
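  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### One way to check, sketched under the assumption that `lower` and `upper` from the earlier cell and `population` are still in memory (the name `true_mean` is introduced here for illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "true_mean = population.mean()\n",
    "# True if the interval built from the single sample contains the population mean\n",
    "print(lower <= true_mean <= upper)"
   ]
  },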
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now we're going to draw many new samples to learn more about how sample means and confidence intervals vary from one sample to another."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "samp_mean = np.empty(50)\n",
    "samp_sd = np.empty(50)\n",
    "n = 60"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(50):\n",
    "    samp = population.sample(n) # obtain a sample of size n = 60 from the population\n",
    "    samp_mean[i] = samp.mean() # save sample mean in ith element of samp_mean\n",
    "    samp_sd[i] = np.std(samp) # save sample sd in ith element of samp_sd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "se_array = samp_sd/np.sqrt(n)\n",
    "lower_array = samp_mean - (1.96 * se_array)\n",
    "upper_array = samp_mean + (1.96 * se_array)"
   ]
  },
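  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A plot makes the variation from sample to sample easier to see. The sketch below draws each interval as a horizontal bar, with the population mean as a dashed vertical line. It assumes `samp_mean`, `lower_array`, `upper_array`, and `population` from the cells above are in memory. Using Matplotlib's `errorbar` is a choice made for this sketch, not part of the original lab, and `idx` is an illustrative name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "idx = np.arange(len(samp_mean))  # one row per sample\n",
    "# horizontal error bars reaching from the lower to the upper bound of each interval\n",
    "plt.errorbar(samp_mean, idx, xerr=[samp_mean - lower_array, upper_array - samp_mean], fmt='o', markersize=3)\n",
    "plt.axvline(population.mean(), linestyle='--', color='black')  # true population mean\n",
    "plt.xlabel('Gr.Liv.Area')\n",
    "plt.ylabel('sample index')\n",
    "plt.show()"
   ]
  },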
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: What proportion of your confidence intervals include the true population mean? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Pick a confidence level of your choosing, provided it is not 95%, and (again with sample size 60) calculate 50 confidence intervals at that level. What proportion of your confidence intervals include the true population mean?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3: Repeat Exercise 2 at the same confidence level but with sample size 100. What do you notice?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
