pandas - Python: Implement mean of means 95% Confidence Interval?


How can this solution be implemented using pandas/Python? The question concerns how to find the 95% CI around a mean of means, following a solution from stats.stackexchange.

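    import pandas as pd
    from IPython.display import display
    import scipy
    import scipy.stats as st
    import scikits.bootstrap as bootstraps

    # One column per experiment, transposed so that each experiment is a row.
    data = pd.DataFrame({
        "exp1": [34, 41, 39],
        "exp2": [45, 51, 52],
        "exp3": [29, 31, 35],
    }).T

    data.loc[:, "row_mean"] = data.mean(axis=1)
    # Note: row_mean is already a column at this point, so it is included
    # in this std. row_std is only displayed, never used for the interval.
    data.loc[:, "row_std"] = data.std(axis=1)
    display(data)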

              0   1   2   row_mean   row_std
    exp1     34  41  39  38.000000  2.943920
    exp2     45  51  52  49.333333  3.091206
    exp3     29  31  35  31.666667  2.494438

    mean_of_means = data.row_mean.mean()
    std_of_means = data.row_mean.std()
    confidence = 0.95
    print("mean(means): {}\nstd(means): {}".format(mean_of_means, std_of_means))
  • mean(means): 39.66666666666667
  • std(means): 8.950481054731702

1st incorrect attempt (zscore):

    zscore = st.norm.ppf(1 - (1 - confidence) / 2)
    lower_bound = mean_of_means - (zscore * std_of_means)
    upper_bound = mean_of_means + (zscore * std_of_means)
    print("95% ci = [{},{}]".format(lower_bound, upper_bound))
  • 95% ci = [22.1,57.2] (incorrect solution)

2nd incorrect attempt (tscore):

    tscore = st.t.ppf(1 - 0.05, data.shape[0])
    lower_bound = mean_of_means - (tscore * std_of_means)
    upper_bound = mean_of_means + (tscore * std_of_means)
    print("95% ci = [{},{}]".format(lower_bound, upper_bound))
  • 95% ci = [18.60,60.73] (incorrect solution)

3rd incorrect attempt (bootstrap):

cis = bootstraps.ci(data=data.row_mean, statfunction=scipy.mean,alpha=0.05) 
  • 95% ci = [31.67, 49.33] (incorrect solution)
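One way to see why the bootstrap attempt returns exactly the smallest and largest row means is to run a plain percentile bootstrap by hand. The sketch below uses NumPy only, not `scikits.bootstrap`, so it is a stand-in for the call above rather than a reproduction of it; the seed and the resample count are arbitrary assumptions. With only three values, about 1/27 (roughly 3.7%) of resamples contain the smallest value three times, which is more than the 2.5% tail, so the lower percentile lands on the minimum, and by symmetry the upper one lands on the maximum.

```python
import numpy as np

# Row means of the three experiments from the question.
row_means = np.array([
    np.mean([34, 41, 39]),
    np.mean([45, 51, 52]),
    np.mean([29, 31, 35]),
])

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n_resamples = 10_000            # assumed resample count

# Draw resamples of size 3 with replacement and take the mean of each one.
resamples = rng.choice(row_means, size=(n_resamples, row_means.size), replace=True)
boot_means = resamples.mean(axis=1)

# Plain percentile interval: the 2.5th and 97.5th percentiles of the resampled means.
lower_bound, upper_bound = np.percentile(boot_means, [2.5, 97.5])
print("bootstrap 95% ci = [{:.2f}, {:.2f}]".format(lower_bound, upper_bound))
print("min(means) = {:.2f}, max(means) = {:.2f}".format(row_means.min(), row_means.max()))
```

A percentile interval built this way can never extend beyond the observed values, which is why it cannot reach the much wider interval expected here with three data points.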

How can this be implemented using pandas/Python so that it produces the correct solution below?

  • 95% ci = [17.4 61.9] (correct solution)

Thanks to Jon Bates.

    import pandas as pd
    import scipy
    import scipy.stats as st

    data = pd.DataFrame({
        "exp1": [34, 41, 39],
        "exp2": [45, 51, 52],
        "exp3": [29, 31, 35],
    }).T

    data.loc[:, "row_mean"] = data.mean(axis=1)
    data.loc[:, "row_std"] = data.std(axis=1)

    mean_of_means = data.row_mean.mean()
    std_of_means = data.row_mean.std()

    # Two-sided 95% critical value with n - 1 degrees of freedom.
    tscore = st.t.ppf(1 - 0.025, data.shape[0] - 1)

    print("mean(means): {}\nstd(means): {}\ntscore: {}".format(mean_of_means, std_of_means, tscore))

    # Divide the std of the means by sqrt(n) to get the standard error.
    lower_bound = mean_of_means - (tscore * std_of_means / (data.shape[0] ** 0.5))
    upper_bound = mean_of_means + (tscore * std_of_means / (data.shape[0] ** 0.5))

    print("95% ci = [{},{}]".format(lower_bound, upper_bound))

mean(means): 39.66666666666667
std(means): 8.950481054731702
tscore: 4.302652729911275
95% ci = [17.432439139464606,61.90089419386874]
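The same interval can probably be obtained more compactly with `scipy.stats.t.interval` and `scipy.stats.sem`. This is a minimal sketch under the same assumptions as the code above (three row means, two-sided 95% level, n - 1 degrees of freedom); the variable names `means` and `sem` are only illustrative.

```python
import numpy as np
import scipy.stats as st

# Row means of the three experiments from the question.
means = np.array([
    np.mean([34, 41, 39]),
    np.mean([45, 51, 52]),
    np.mean([29, 31, 35]),
])

n = len(means)

# Standard error of the mean of means: sample std (ddof=1) divided by sqrt(n).
sem = st.sem(means)

# Two-sided 95% t interval with n - 1 degrees of freedom.
# The confidence level is passed positionally because the name of that
# keyword differs between SciPy versions.
lower_bound, upper_bound = st.t.interval(0.95, n - 1, loc=means.mean(), scale=sem)

print("95% ci = [{},{}]".format(lower_bound, upper_bound))
```

Compared with the first two attempts, the differences are the division by sqrt(n), the n - 1 degrees of freedom, and the two-sided 0.975 quantile.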

