Of Bigger Blocks & Paradox

import pandas as pd
import numpy as np
import os
import sys
sys.path.append("../python")
import general
import visualizations
import simpson
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
params = {'OUTPUT' : {'path' : os.path.join('output_html', 'simpsons_paradox'),
                      'name' : 'simpsons_paradox_20200913'}}

While weaving some tales of regional sales,
We observe a deceptive emergence…
Is closing a lead simply meeting their need,
And if so, what explains the divergence?

So typically A spends their time every day
Converting the north and the east.
While B keeps abreast of the south and the west
Where some palms likely need to be greased…

Person A

display(simpson.SALESPERSON_A)
  region  leads  met_their_needs
0  North   2000             1800
1   East    200              140
2  South     20               12
3   West      2                1

Person B

display(simpson.SALESPERSON_B)
  region  leads  met_their_needs
0  North      2                2
1   East     20               16
2  South    200              140
3   West   2000             1200

conversion_A = np.sum(simpson.SALESPERSON_A.met_their_needs)/np.sum(simpson.SALESPERSON_A.leads)
conversion_B = np.sum(simpson.SALESPERSON_B.met_their_needs)/np.sum(simpson.SALESPERSON_B.leads)
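The pooled rates can be reproduced without the `simpson` module. The sketch below is only an illustration: it rebuilds the two tables from the values displayed above (the real `simpson.SALESPERSON_A` and `simpson.SALESPERSON_B` definitions are not shown here), and `overall_conversion` is a hypothetical helper name.

```python
import pandas as pd

# Assumption: these frames mirror the displayed tables; the actual
# simpson.SALESPERSON_A / simpson.SALESPERSON_B objects are not shown.
regions = ["North", "East", "South", "West"]
salesperson_a = pd.DataFrame({
    "region": regions,
    "leads": [2000, 200, 20, 2],
    "met_their_needs": [1800, 140, 12, 1],
})
salesperson_b = pd.DataFrame({
    "region": regions,
    "leads": [2, 20, 200, 2000],
    "met_their_needs": [2, 16, 140, 1200],
})


def overall_conversion(df):
    """Closed leads divided by total leads, pooled over every region."""
    return df["met_their_needs"].sum() / df["leads"].sum()


conversion_a = overall_conversion(salesperson_a)
conversion_b = overall_conversion(salesperson_b)
print(f"A: {conversion_a:.3f}  B: {conversion_b:.3f}")
```

Both people handle 2222 leads in total, so the pooled gap (about 0.88 for A against about 0.61 for B) comes from where those leads sit, not from how many there are.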

We can skip the debate of the overall rate,
As we see A is clearly exceeding.
And yet, we would bet that B must be upset,
As the figure is truly misleading.

visualizations.simpsonOverallConversion(conversion_A, conversion_B)

Though we’d hate to insult A’s impressive result,
The assignments are clearly unequal.
Though we cheer A’s premier, the disparity’s clear,
And we’d still see the same in the sequel.

sales_df = simpson.getSalesDF()

For when we start to tease apart,
Their sales across the nation,
We must attest that B is best
And warrants adulation!

visualizations.simpsonByRegion(sales_df)
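The regional comparison behind the chart can be checked by hand. This is a minimal sketch, assuming the counts from the tables above; it stands in for whatever `simpson.getSalesDF()` actually returns, which is not shown, and the column names `rate_a`, `rate_b` and `b_ahead` are made up for illustration.

```python
import pandas as pd

# Assumption: counts copied from the Person A and Person B tables above.
regions = ["North", "East", "South", "West"]
person_a = pd.DataFrame({
    "region": regions,
    "leads": [2000, 200, 20, 2],
    "met_their_needs": [1800, 140, 12, 1],
})
person_b = pd.DataFrame({
    "region": regions,
    "leads": [2, 20, 200, 2000],
    "met_their_needs": [2, 16, 140, 1200],
})

# Line the two people up region by region, then compare conversion rates.
by_region = person_a.merge(person_b, on="region", suffixes=("_a", "_b"))
by_region["rate_a"] = by_region["met_their_needs_a"] / by_region["leads_a"]
by_region["rate_b"] = by_region["met_their_needs_b"] / by_region["leads_b"]
by_region["b_ahead"] = by_region["rate_b"] > by_region["rate_a"]

print(by_region[["region", "rate_a", "rate_b", "b_ahead"]])
```

With these numbers B converts ten percentage points more than A in every region (1.0 vs 0.9 in the North, down to 0.6 vs 0.5 in the West), even though A leads on the pooled rate.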

The fact remains that A retains
The regions where succeeding
Requires less, so reassess
Just who we ‘knew’ was leading…

When fixing holes and setting goals,
Please set a proper baseline,
Which must be checked, and circumspect,
If some domains are goldmines…

We call this “Simpson’s paradox”
When samples are uneven.
And only the unorthodox
Find figures to believe in!
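One way to set a fairer baseline is to score both people against the same mix of leads. The sketch below uses direct standardization, a technique the notebook itself does not use, with the counts from the tables above; `standardized_rate` and `shared_mix` are hypothetical names.

```python
# Assumption: counts copied from the Person A and Person B tables above.
leads_a = {"North": 2000, "East": 200, "South": 20, "West": 2}
closed_a = {"North": 1800, "East": 140, "South": 12, "West": 1}
leads_b = {"North": 2, "East": 20, "South": 200, "West": 2000}
closed_b = {"North": 2, "East": 16, "South": 140, "West": 1200}


def standardized_rate(leads, closed, reference_mix):
    """Apply one person's regional rates to a shared mix of leads."""
    total = sum(reference_mix.values())
    weighted = sum(
        closed[region] / leads[region] * reference_mix[region]
        for region in reference_mix
    )
    return weighted / total


# Shared baseline: everyone's leads combined, region by region.
shared_mix = {region: leads_a[region] + leads_b[region] for region in leads_a}

adjusted_a = standardized_rate(leads_a, closed_a, shared_mix)
adjusted_b = standardized_rate(leads_b, closed_b, shared_mix)
print(f"A: {adjusted_a:.3f}  B: {adjusted_b:.3f}")
```

On the shared mix B comes out ahead (roughly 0.80 against 0.70), matching the region-by-region picture. The choice of reference mix is itself a judgment call, and tiny cells such as A's 2 leads in the West make the adjusted figure noisy.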

# Create the output directory if it does not already exist.
os.makedirs(params['OUTPUT']['path'], exist_ok=True)
general.publish('simpsonsParadox', params['OUTPUT']['path'], params['OUTPUT']['name'])