Of Coward Flaws & Power Laws

import os
import sys

import numpy as np
import pandas as pd
from tqdm import tqdm

# Make the notebook's helper modules in ../python importable
sys.path.append("../python")
import general
import visualizations

from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

params = {'OUTPUT' : {'path' : os.path.join('output_html', 'power_law'),
                      'name' : 'power_law_20200917'},
          
          # Simulating heights of American Women
          'GAUSSIAN': {'mean' : 65, 
                       'stdev' : 3.5,
                       'n' : 10000},
          
          # Simulating batting averages (baseball)
          'BINOMIAL' : {'at_bats_per_game' : 4,
                        'batting_average' : 0.3,
                        'number_of_games' : 100000},
          
          # Value of founded company
          'POWERLAW' : {'shape' : 0.8,
                        'n' : 50000000,
                        'iteration_size' : 100000} # Recalculate the mean and median every 100,000 samples
         }

Statistics, in a nutshell, is a tool for comprehending
The known and the uncertain, via samples never-ending
Ideally, with sufficient size, we soon achieve convergence,
From which, debates are settled via wisdom’s swift emergence

# Draw n simulated heights from a normal distribution
random_sample = np.random.normal(params['GAUSSIAN']['mean'],
                                 params['GAUSSIAN']['stdev'],
                                 params['GAUSSIAN']['n'])

visualizations.gaussianHistogram(random_sample)

# Running mean of the first i samples, for i = 1..n
gaussian_means = []
for i in tqdm(range(1, params['GAUSSIAN']['n'] + 1)):
    gaussian_means.append(np.mean(random_sample[:i]))

100%|██████████| 10000/10000 [00:00<00:00, 35367.48it/s]

visualizations.gaussianConvergenceLine(gaussian_means, params['GAUSSIAN']['mean'])
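
The running-mean loop recomputes `np.mean` over an ever longer prefix, so its cost grows quadratically with the sample size. A minimal sketch of an equivalent vectorized version, assuming only NumPy, divides the cumulative sum by the number of samples seen so far. The name `running_mean` and the seed are hypothetical choices for illustration, not part of the notebook's `general` or `visualizations` helpers.

```python
import numpy as np

def running_mean(samples):
    """Return the mean of samples[:i] for every i from 1 to len(samples)."""
    samples = np.asarray(samples, dtype=float)
    counts = np.arange(1, samples.size + 1)
    return np.cumsum(samples) / counts

# Illustrative draw with the same parameters as the Gaussian example (mean 65, stdev 3.5)
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
heights = rng.normal(65, 3.5, 10_000)
means = running_mean(heights)
print(means[-1])  # final running mean, expected to sit close to 65
```

The result is a NumPy array rather than a list, so any plotting helper that strictly requires a list would need `means.tolist()`.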

Canonical examples, which a lecturer deploys
Discuss the dull dimensions of two samples - girls and boys
And thus clichés perpetuate and students doze from boredom
“Deliver better content!” thus the teacher’s class implored him…

# Hits per game: one binomial draw per simulated game
random_sample = np.random.binomial(params['BINOMIAL']['at_bats_per_game'],
                                   params['BINOMIAL']['batting_average'],
                                   params['BINOMIAL']['number_of_games'])

visualizations.binomialHistogram(random_sample)

# Running batting average: mean hits per game so far, divided by at-bats per game
binomial_means = []
for i in tqdm(range(1, params['BINOMIAL']['number_of_games'] + 1)):
    binomial_means.append(np.mean(random_sample[:i]) / params['BINOMIAL']['at_bats_per_game'])

100%|██████████| 100000/100000 [00:11<00:00, 9048.29it/s]

And though, for some, athletic stats are somewhat more compelling,
The fallacies of gamesmen oversimplify foretelling.
Because, alas, such cases of predictable behavior,
Mislead the intuition that large samples serve as savior.

visualizations.binomialConvergenceLine(binomial_means, params['BINOMIAL']['batting_average'])
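
The predictable behavior of the batting example can be put in numbers. As a rough sketch, assuming every at-bat is an independent trial with the same hit probability (the same assumption the binomial simulation makes), the standard error of the estimated average shrinks with the square root of the total at-bats. The function name `batting_average_std_error` is hypothetical.

```python
import math

def batting_average_std_error(p, at_bats_per_game, games):
    """Standard error of the estimated batting average after `games` games."""
    total_at_bats = at_bats_per_game * games
    return math.sqrt(p * (1 - p) / total_at_bats)

# Same batting average and at-bats per game as the BINOMIAL parameters
for games in (100, 10_000, 100_000):
    print(games, batting_average_std_error(0.3, 4, games))
```

Multiplying the number of games by 100 divides the standard error by 10, which is exactly the intuition that a heavy-tailed sample later breaks.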

# np.random.pareto draws from the Lomax (Pareto II) distribution, which starts at 0
random_sample = np.random.pareto(params['POWERLAW']['shape'], params['POWERLAW']['n'])

visualizations.powerLawHistogram(random_sample)

# Recompute the mean and median over the first i * iteration_size samples
iteration_size = params['POWERLAW']['iteration_size']
power_law_means, power_law_medians, n_samples = [], [], []
for i in tqdm(range(1, params['POWERLAW']['n'] // iteration_size + 1)):
    prefix = random_sample[:i * iteration_size]
    power_law_means.append(np.mean(prefix))
    power_law_medians.append(np.median(prefix))
    n_samples.append(i * iteration_size)

100%|██████████| 500/500 [03:44<00:00,  2.23it/s]

For losses may be well-defined,
and gains may be obscene.
A model may hold miracles,
and still may lack a mean!
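
Why a model can lack a mean can be shown in a few lines. This is a brief derivation, assuming the samples follow the Lomax (Pareto II) density that `np.random.pareto` draws from, with unit scale and shape $a$ (0.8 in `params`).

```latex
f(x) = \frac{a}{(1+x)^{a+1}}, \qquad x \ge 0

\mathbb{E}[X] = \int_0^\infty \frac{a\,x}{(1+x)^{a+1}}\,dx
= \begin{cases}
\dfrac{1}{a-1}, & a > 1 \\[4pt]
\infty, & a \le 1
\end{cases}
```

For large $x$ the integrand behaves like $a\,x^{-a}$, which is integrable at infinity only when $a > 1$. With $a = 0.8$ the expected value is infinite, and the variance, which requires $a > 2$, is undefined as well.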

visualizations.powerLawMedianLine(power_law_medians, n_samples)

The median may still exist, remarkably consistent,
But mean and even variance may still be non-existent!
For though most efforts end in loss as torturous examples,
We find expected values truly rise with larger samples!
And even after sample sizes climb into the millions
Results defy the certainty so prized by mere civilians
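
The contrast between a stable median and an unstable mean can be checked directly. The sketch below assumes the same Lomax distribution that `np.random.pareto` samples, whose median follows from setting the CDF $1 - (1+x)^{-a}$ equal to one half. The helper `lomax_median`, the seed, and the sample size are hypothetical choices for illustration.

```python
import numpy as np

def lomax_median(shape):
    """Closed-form median of the Lomax distribution with unit scale."""
    return 2.0 ** (1.0 / shape) - 1.0

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
shape = 0.8                     # same shape as the POWERLAW parameters
sample = rng.pareto(shape, 1_000_000)

print(lomax_median(shape))  # theoretical median
print(np.median(sample))    # sample median, expected to land near the value above
print(np.mean(sample))      # sample mean, which has no finite target when shape <= 1
```

Rerunning with different seeds should leave the sample median nearly unchanged while the sample mean moves around, since a handful of extreme draws dominate it.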

visualizations.powerLawConvergenceLine(power_law_means, n_samples)

So which world is inhabited when forced to make decisions?
Convergence and simplicity or magnitude revisions?
For wisdom is available if first one ascertains
If losses fit a power law…or if the tail yields gains

# Create the output directory if it does not exist yet, then publish
os.makedirs(params['OUTPUT']['path'], exist_ok=True)
general.publish('powerLaw', params['OUTPUT']['path'], params['OUTPUT']['name'])
