probly.dev


How it works

Probly is a Python-like mini-language for probabilistic estimation. It's based on Starlark and implemented in Go.

Probly syntax

You may use any Starlark syntax. Probly differs from Starlark only in the following ways:

- A variable may follow a probability distribution, in addition to the usual types such as numbers or dictionaries.
- The Starlark `math` module is imported by default, so you can use it directly, e.g. `math.sqrt(2)`.
- Probly has a built-in `sum` function, which Starlark does not have.

This page shows the probability distribution of every numeric (scalar or distribution) global variable in your program, except those whose names start with an underscore. Values are taken at the end of the program's execution.
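This page doesn't describe Probly's internals. However, the 3,000 samples mentioned under Speed and the 3,000-row simulation data suggest a Monte Carlo representation: each distribution-valued variable holds a list of random draws, and arithmetic is applied draw by draw. The Python sketch below models that assumption with a hypothetical `Samples` class and `normal` helper, neither of which is part of Probly. It also shows why a built-in `sum` over distributions is natural: it reduces to repeated draw-wise addition.

```python
import random
import statistics

N = 3000  # assumed sample count, matching the Speed section
rng = random.Random(42)  # fixed seed so reruns give identical draws


class Samples:
    """Hypothetical stand-in for a Probly distribution: a list of N random draws."""

    def __init__(self, draws):
        self.draws = list(draws)

    def _combine(self, other, op):
        # Two distributions are combined draw by draw; a scalar applies to every draw
        if isinstance(other, Samples):
            return Samples(op(a, b) for a, b in zip(self.draws, other.draws))
        return Samples(op(a, other) for a in self.draws)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    # Addition and multiplication commute, so the reflected versions are the same
    __radd__ = __add__
    __rmul__ = __mul__


def normal(mean, sd):
    """Hypothetical helper: N independent draws from a Normal distribution."""
    return Samples(rng.gauss(mean, sd) for _ in range(N))


x = normal(1, 2)
y = 3 * x + 1         # each draw of x is scaled and shifted
z = x + normal(0, 1)  # independent distributions are added draw by draw
total = sum([x, y])   # works because 0 + x falls back to x.__radd__(0)

print(round(statistics.mean(y.draws), 1))  # close to 3 * 1 + 1 = 4
```

Because `y` is computed from the same draws as `x`, the two stay perfectly correlated, which is what you want when one quantity is derived from another.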

Probability distributions

| Name | Parameters | Quantiles | Notes |
|---|---|---|---|
| `Normal` | `mean`, `sd` | 2 | |
| `LogNormal` | `mu`, `sigma` | 2 | Alternatively: `mean`, `sd` |
| `Beta` | `alpha`, `beta` | | |
| `PERT` | `min`, `mode`, `max`, `[lambd]` | | Like the triangular distribution, but smoother (see Wikipedia) |
| `Uniform` | `a`, `b` | 2 | `a` need not be less than `b` |
| `LogUniform` | `a`, `b` | 2 | `a` need not be less than `b` |
| `Bernoulli` | `p` | | |
| `Binomial` | `n`, `p` | | |
| `Discrete` | `x_1`, `p_1`, `x_2`, `p_2`, ... | | Generic discrete distribution over any finite set of values |

A number in the Quantiles column means the distribution can alternatively be specified by that many quantiles, as in the examples below.

`Normal` examples:
- Parameters: `Normal(1, 2)`, `Normal(mean=1, sd=2)`
- Quantiles: `Normal(p12=-1, p34=0)`, `Normal(quantiles={0.123:-1, 0.456:0})`
- `to` operator (10th to 90th percentile): `Normal(-1 to 0)`
- `pm` operator: `Normal(-1 pm 1)`. `m pm x` makes `m` the median and `m-x` and `m+x` the 10th and 90th percentiles, respectively. This works because the Normal distribution is symmetric around its median.

`LogNormal` examples:
- Parametrization 1: `LogNormal(1, 2)`, `LogNormal(mu=1, sigma=2)`
- Parametrization 2: `LogNormal(mean=3, sd=4)`
- Quantiles: `LogNormal(p12=1, p34=2)`, `LogNormal(quantiles={0.123:1, 0.456:2})`
- `to` operator (10th to 90th percentile): `LogNormal(1 to 2)`
- `td` operator: `LogNormal(1 td 2)`. `m td x` makes `m` the median and `m/x` and `m*x` the 10th and 90th percentiles, respectively. This works because the LogNormal distribution is symmetric in multiplicative space around its median.

`Beta` examples: `Beta(1, 2)`, `Beta(alpha=1, beta=2)`

`PERT`: the optional parameter `lambd` (default: 4) controls the weight given to the mode. Values of `lambd < 4` flatten the density curve. Examples:
- Parametrization 1: `PERT(0, 1, 3)`, `PERT(min=0, mode=1, max=3)`
- Parametrization 2: `PERT(min=0, mode=1, max=3, lambd=4)`

`Uniform` examples:
- Parameters: `Uniform(1, 2)`, `Uniform(a=1, b=2)`
- Quantiles: `Uniform(p12=-1, p34=0)`, `Uniform(quantiles={0.123:-1, 0.456:0})`
- `to` operator (10th to 90th percentile): `Uniform(-1 to 0)`
- `pm` operator: `Uniform(-1 pm 1)`. `m pm x` makes `m` the median and `m-x` and `m+x` the 10th and 90th percentiles, respectively. This works because the Uniform distribution is symmetric around its median.

`LogUniform` examples:
- Parameters: `LogUniform(1, 2)`, `LogUniform(a=1, b=2)`
- Quantiles: `LogUniform(p12=1, p34=2)`, `LogUniform(quantiles={0.123:1, 0.456:2})`
- `to` operator (10th to 90th percentile): `LogUniform(1 to 2)`
- `td` operator: `LogUniform(1 td 2)`. `m td x` makes `m` the median and `m/x` and `m*x` the 10th and 90th percentiles, respectively. This works because the LogUniform distribution is symmetric in multiplicative space around its median.

`Bernoulli` examples: `Bernoulli(0.5)`, `Bernoulli(p=0.5)`

`Binomial` examples: `Binomial(1, 0.5)`, `Binomial(n=1, p=0.5)`

`Discrete` examples: `Discrete(1, 0.4, 2, 0.6)`, `Discrete(x_1=1, p_1=0.4, x_2=2, p_2=0.6)`
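The `to`, `pm`, and `td` operators fix a distribution's 10th and 90th percentiles, and that determines its parameters. The Python sketch below derives those parameters with the standard library's `statistics.NormalDist`. The function names are hypothetical, and Probly may compute the parameters differently internally. One consequence worth knowing, if `to` behaves as described: `Uniform(-1 to 0)` is wider than `Uniform(-1, 0)`, because -1 and 0 are only its 10th and 90th percentiles.

```python
from math import log
from statistics import NormalDist

# 90th percentile of the standard normal distribution (about 1.2816)
Z90 = NormalDist().inv_cdf(0.9)


def normal_from_to(lo, hi):
    """Mean and sd of the Normal whose 10th/90th percentiles are lo and hi."""
    mean = (lo + hi) / 2
    sd = (hi - lo) / (2 * Z90)
    return mean, sd


def normal_from_pm(m, x):
    """`m pm x` is equivalent to `(m - x) to (m + x)`."""
    return normal_from_to(m - x, m + x)


def lognormal_from_to(lo, hi):
    """mu and sigma of the LogNormal whose 10th/90th percentiles are lo and hi."""
    # log(X) is Normal, and taking logs (a monotone transform) preserves percentiles
    return normal_from_to(log(lo), log(hi))


def lognormal_from_td(m, x):
    """`m td x` is equivalent to `(m / x) to (m * x)`."""
    return lognormal_from_to(m / x, m * x)


def uniform_from_to(lo, hi):
    """Bounds a, b of the Uniform whose 10th/90th percentiles are lo and hi."""
    # lo..hi covers the middle 80% of [a, b], so each 10% tail is (hi - lo) / 8 wide
    tail = (hi - lo) / 8
    return lo - tail, hi + tail


print(normal_from_to(-1, 0))    # mean -0.5, sd about 0.39
print(lognormal_from_td(1, 2))  # mu about 0, sigma about 0.54
print(uniform_from_to(-1, 0))   # (-1.125, 0.125)
```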

math

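These mathematical functions and constants are available in the `math` module:

- `pow(x, y)` - Returns x raised to the power of y
- `exp(x)` - Returns e raised to the power of x
- `sqrt(x)` - Returns the square root of x
- `log(x, [base])` - Returns the logarithm of x; natural logarithm if `base` is not specified
- `e` - Euler's number
- `pi` - The ratio of a circle's circumference to its diameter
- Ceil, floor, and sign manipulation:
  - `ceil(x)` - Returns the smallest integer greater than or equal to x
  - `floor(x)` - Returns the largest integer less than or equal to x
  - `fabs(x)` - Returns the absolute value of x as a float
  - `copysign(x, y)` - Returns a value with the magnitude of x and the sign of y
- `mod(x, y)` - Returns the floating-point remainder of x / y; the result has the same sign as x
- `remainder(x, y)` - Returns the IEEE 754 floating-point remainder of x / y
- `round(x)` - Returns the nearest integer, rounding half away from zero
- Trigonometric and hyperbolic functions (angles in radians unless otherwise specified):
  - `acos(x)`
  - `asin(x)`
  - `atan(x)`
  - `atan2(y, x)` - Returns atan(y / x), using the signs of both arguments to determine the quadrant. The result is between -pi and pi
  - `cos(x)`
  - `sin(x)`
  - `tan(x)`
  - `degrees(x)` - Converts angle x from radians to degrees
  - `radians(x)` - Converts angle x from degrees to radians
  - `acosh(x)`
  - `asinh(x)`
  - `atanh(x)`
  - `cosh(x)`
  - `sinh(x)`
  - `tanh(x)`
- `hypot(x, y)` - Returns the Euclidean norm, sqrt(x^2 + y^2), i.e. the distance from the origin to (x, y)
- `gamma(x)` - Returns the Gamma function at x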
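A few of these functions behave differently from similarly named Python built-ins. The Python sketch below assumes that Probly's `math.mod`, `math.remainder`, and `math.round` follow the definitions listed above. Those correspond to Python's `math.fmod`, `math.remainder`, and a hand-written `round_half_away` helper (hypothetical, for illustration only). Python's built-in `round`, by contrast, rounds ties to the nearest even integer.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, sending ties away from zero (as math.round does above)."""
    whole = math.floor(abs(x))
    # abs(x) - whole is computed exactly, so there is no spurious rounding near 0.5
    if abs(x) - whole >= 0.5:
        whole += 1
    return math.copysign(whole, x)


# Python's built-in round() sends ties to the nearest even integer instead
print(round(2.5), round_half_away(2.5))  # 2 3.0
print(round_half_away(-2.5))             # -3.0

# mod follows C's fmod: the result takes the sign of x
print(math.fmod(7, 4), math.fmod(-7, 3))  # 3.0 -1.0

# remainder is the IEEE 754 remainder: x - n*y, with n = x/y rounded to the nearest integer
print(math.remainder(7, 4))               # -1.0
```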

Starlark syntax

This code provides an example of the syntax of Starlark:

```python
# Define a number
number = 18

# Define a list
numbers = [1, 2, 3, 4, 5]

# List comprehension
halves = [n / 2 for n in numbers]

# Define a function
def is_even(n):
    """Return True if n is even."""
    return n % 2 == 0

# Define a dictionary
people = {
    "Alice": 22,
    "Bob": 40,
    "Charlie": 55,
    "Dave": 14,
}

names = ", ".join(people.keys())  # Alice, Bob, Charlie, Dave

# Modify a variable in a loop
sum_even_ages = 0
for age in people.values():
    if is_even(age):
        sum_even_ages += age

# Append to a list in a loop
over_30_names = []
for name, age in people.items():
    if age > 30:
        over_30_names.append(name)
```

If you've ever used Python, this should look very familiar. In fact, the code above is also valid Python code. Short as it is, this example shows most of the language: Starlark is a very small language that implements a limited subset of Python.

For our purposes, one notable difference from Python is that the exponentiation operator `**` is not supported; use `math.pow` instead.

You can also look at the Starlark language specification.

Speed

Though not designed for speed, Probly is fast enough for practical purposes: for most examples on this page, it runs in around 10 milliseconds with 3,000 samples. This is thanks to its Go implementation.

The time taken to return results on this page is spent overwhelmingly in web application code, not in Probly evaluation.

Probly is nonetheless slower than Python code built entirely on NumPy array operations, which are very well optimised. This should only begin to matter at very large scales or when latency is critical.

Limitations

It's not currently possible to obtain and manipulate properties of a distribution within a Probly program, like so:

```
x = Normal(1 to 10)
y = x.std()  # Not possible
```

Supporting this would require some fundamental changes to the implementation of Probly, which is currently very simplistic.

Prior work

The `to` binary operator was inspired by Squiggle.

Content

```
_cohort = 10 * 1000  # Arbitrary

# Iron-deficiency anemia prevalence, children in India (any severity)
# Global Burden of Disease 2016, see 'Supplementary data'
p_anemia = Beta(19, 22)  # p15=0.38, p85=0.54; beta by quantiles not yet supported

def anemia_reduction():
    """
    In YLDs per cohort
    """
    # Of children with anemia, proportion with mild, moderate, and severe anemia
    p_severity = {
        "mild": 48 / 100,
        "moderate": 48 / 100,
        "severe": 4.1 / 100
    }

    # Risk ratio anemia, iron supplementation
    risk_ratio = PERT(0.3, 0.5, 1)

    p_cured = {}
    for severity in p_severity:
        p_cured[severity] = p_anemia * p_severity[severity] * risk_ratio

    yld = {
        "mild": 0.4 / 100,
        "moderate": 5.2 / 100,
        "severe": 15 / 100
    }

    yld_sum = 0
    for severity in yld:
        yld_sum += p_cured[severity] * yld[severity]

    return _cohort * yld_sum

def cognitive_benefits():
    """
    In (present value of) units of increase in ln(income) per cohort.

    These drive most of the cost-effectiveness.
    """
    # The intervention targets all children in a school, but we model
    # cognitive benefits as accruing only to children between ages 3-12.
    # Proportion of targeted children between ages 3-12
    p_children = 65 / 100

    # Change in IQ points from short-term supplementation for anemic individuals
    iq_change_sd = LogNormal(0.33 td 2)
    iq_change = iq_change_sd * 15  # Convert to IQ points: 1 SD = 15 IQ points

    p_wages_per_point = 1 / 100  # Percentage increase in wages/consumption for every 1 point increase in IQ

    # Percentage increase in wages/consumption from increase in IQ from supplementation
    p_wages = iq_change * p_wages_per_point

    # Speculative adjustment to assess the long-term benefit of many years of iron
    long_term_adjustment = 10 / 100
    p_wages = p_wages * long_term_adjustment

    # Increase in annual ln(consumption) for beneficiaries
    change_ln_cons = math.log(1 + p_wages) - math.log(1)

    delay_y = 12  # Average number of years between 'entering fortification program' and beginning of long term benefits
    duration_y = LogUniform(15 td 3)  # Duration of long term benefits of fortification (in years)

    # Present Value Income Benefits
    discount_rate = Uniform(2/100, 7/100)
    pv_income = pv(discount_rate, delay_y + duration_y, -change_ln_cons) - pv(
        discount_rate, delay_y, -change_ln_cons
    )  # Unit is still changes in ln(income)

    # Multiplier for Resource Sharing in Households
    multiplier_resource_sharing = 2
    pv_income = pv_income * multiplier_resource_sharing

    # Lifetime Income Benefits for Full Cohort (including non-beneficiaries)
    return _cohort * p_children * p_anemia * pv_income

def altered_malaria_risk():
    """
    In deaths per cohort
    """
    baseline_deaths_100k = 5.14  # Baseline deaths due to malaria per 100,000 individuals 0-19 in India
    # Convert to per cohort
    baseline_deaths = baseline_deaths_100k * (_cohort / (100 * 1000))

    rr = PERT(1, 1.16, 1.5)  # Relative risk of malaria mortality with IFA supplementation

    direct = baseline_deaths * (rr - 1)
    indirect_per_direct = 50 / 100
    return direct * (1 + indirect_per_direct)

def other_adverse_effects():
    """
    In YLDs per cohort
    """

    p_condition = {
        "gastro": (8 + 1 + 33 + 2 + 8) / (23 + 12 + 128 + 2 + 8),
        "loose_stools": (1 + 2 + 6 + 2 + 1 + 3) / (36 + 8 + 128 + 6 + 44 + 71),
        "hard_stools": (1 + 5 + 0 + 0 + 11 + 0 + 6 + 3) / (31 + 36 + 8 + 182 + 128 + 6 + 44 + 71),
        "abdominal_pain": (0 + 4 + 2 + 15 + 1 + 3 + 4) / (120 + 36 + 182 + 128 + 6 + 44 + 71),
    }

    yld = {
        "gastro": 7.4 / 100,
        "loose_stools": 7.4 / 100,
        "hard_stools": 7.4 / 100,
        "abdominal_pain": 1.1 / 100,
    }

    duration = 2 / 365

    yld_sum = 0
    for condition in yld:
        yld_sum += p_condition[condition] * yld[condition] * duration

    return _cohort * yld_sum

def yld_to_val(yld):
    """
    YLDs to units of value. Arbitrary normalization.
    """
    return 1 * yld

def ln_cons_to_val(ln_cons):
    """
    Units of increase in ln(consumption) to units of value
    """
    value_double = Uniform(0.2, 0.8)  # Rule of thumb that 1 DALY = 2.5x GDP per capita would give 0.4
    value_1_ln = value_double / math.log(2)
    return value_1_ln * ln_cons

def death_to_val(death):
    """
    Deaths to units of value
    """
    return 30 * death

def units_value():
    """
    In "units of value" per cohort (normalized to 1 unit of value = 1 YLD).
    """

    anemia_ylds_averted = anemia_reduction()
    units_increase_ln_income = cognitive_benefits()
    malaria_deaths = altered_malaria_risk()
    side_effect_ylds = other_adverse_effects()

    u_value = {
        "anemia": yld_to_val(anemia_ylds_averted),
        "income": ln_cons_to_val(units_increase_ln_income),
        "malaria": -death_to_val(malaria_deaths),
        "side_effect": -yld_to_val(side_effect_ylds),
    }

    return sum(u_value.values())

def cost_effectiveness():
    """
    Units of value per $10,000
    """
    # Supplementation cost per person treated
    cost_p_p = 2
    cost_cohort = _cohort * cost_p_p
    val_cohort = units_value()
    return val_cohort * (10 * 1000 / cost_cohort)

def ratio_givedirectly():
    """
    Cost-effectiveness ratio vs GiveDirectly
    """
    # GiveDirectly units of increase in ln(consumption) for each $10,000 donated
    gd_ln_cons = 25
    return cost_effectiveness() / ln_cons_to_val(gd_ln_cons)

def pv(rate, nper, pmt):
    temp = math.pow((1 + rate), nper)
    fact = (temp - 1) / rate
    return -(pmt * fact) / temp

value_per_10_000_usd = cost_effectiveness()
multiples_of_cash = ratio_givedirectly()
```

Example GiveWell iron and folic acid CEA

This example reproduces a 2018 GiveWell cost-effectiveness analysis for iron and folic acid (IFA) supplementation of school-age children.

The model considers the direct benefits of reducing anemia, as well as benefits via cognitive improvements and their effects on long-term wages. These benefits are weighed against potential risks like increased malaria mortality and side effects.

The final output is the humanitarian value per $10,000 spent, and a comparison of this value to the impact of donating the same amount to GiveDirectly.

In this model, the cost-effectiveness is largely driven by the long-term cognitive benefits of IFA supplementation.
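The long-term income benefit is discounted with the `pv` helper defined at the end of the program. `pv(rate, nper, pmt)` is the present value of receiving `-pmt` at the end of each of `nper` years, so subtracting `pv(rate, delay_y, ...)` from `pv(rate, delay_y + duration_y, ...)` values a benefit that starts after the delay and lasts `duration_y` years. The Python sketch below checks this against explicit year-by-year discounting. The 5% rate, 12-year delay, and 15-year duration are illustrative values within the model's ranges, and `pv_by_summation` is a hypothetical helper.

```python
import math


def pv(rate, nper, pmt):
    """Present value of nper end-of-year payments of -pmt (same formula as the model's pv)."""
    temp = math.pow(1 + rate, nper)
    fact = (temp - 1) / rate
    return -(pmt * fact) / temp


def pv_by_summation(rate, first_year, last_year, benefit):
    """Discount a benefit received at the end of each year from first_year to last_year."""
    return sum(benefit / math.pow(1 + rate, t) for t in range(first_year, last_year + 1))


# Illustrative inputs: 5% discount rate, 12-year delay, 15 years of benefits,
# 1 unit of ln(income) per year
rate, delay, duration, benefit = 0.05, 12, 15, 1.0

# The model's pattern: value of years 1..(delay + duration) minus years 1..delay
closed_form = pv(rate, delay + duration, -benefit) - pv(rate, delay, -benefit)
direct = pv_by_summation(rate, delay + 1, delay + duration, benefit)

print(math.isclose(closed_form, direct))  # True
```

Unlike the summation, the closed form also accepts non-integer durations. This matters because `duration_y` is a `LogUniform` distribution.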

Additional background information

GiveWell is a global health funder. It conducts in-depth research to estimate the cost-effectiveness of a given program, in terms of humanitarian benefit (e.g. lives saved) per dollar.

GiveWell developed this cost-effectiveness analysis in the context of this grant investigation. GiveWell also has a report about iron supplementation generally.

Distribution details

multiples_of_cash


Mean	7.96
Std. dev.	6.87
Variance	47.2

Quantile
0.05	2.19
0.25	3.76
0.50	5.87
0.75	9.57
0.95	21.0

value_per_10_000_usd


Mean	129
Std. dev.	100
Variance	10 100

Quantile
0.05	44.8
0.25	70.4
0.50	100
0.75	154
0.95	305

p_anemia


Mean	0.466
Std. dev.	0.0762
Variance	0.00580

Quantile
0.05	0.342
0.25	0.414
0.50	0.464
0.75	0.518
0.95	0.591

Simulation data

CSV

Preview

	_cohort	p_anemia	value_per_10_000_usd	multiples_of_cash
0	10 000	0.404	85.0	9.32
1	10 000	0.563	275	4.08
2	10 000	0.591	130	7.69
...	...	...	...	...
2997	10 000	0.300	49.4	4.56
2998	10 000	0.473	143	6.25
2999	10 000	0.470	120	10.6
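Because properties such as a standard deviation can't be computed inside a Probly program, one option is to compute them from the simulation data instead. The Python sketch below uses only the standard `csv` and `statistics` modules. It assumes the CSV has a header row with the column names shown in the preview and one row per sample. The inline data is just the six preview rows, so its results won't match the 3,000-sample statistics above.

```python
import csv
import io
import statistics

# Stand-in for the downloaded file: an assumed header row plus the six
# preview rows above (a real download would have one row per sample)
CSV_TEXT = """_cohort,p_anemia,value_per_10_000_usd,multiples_of_cash
10000,0.404,85.0,9.32
10000,0.563,275,4.08
10000,0.591,130,7.69
10000,0.300,49.4,4.56
10000,0.473,143,6.25
10000,0.470,120,10.6
"""


def summarize(rows, column):
    """Mean, standard deviation, and median of one column of the simulation data."""
    values = [float(row[column]) for row in rows]
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
    }


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(summarize(rows, "multiples_of_cash"))
```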

API

Get the simulation data (and more) in a machine-readable format: /api/sim/XDsKzzKaFEQuxjzQGqTpvr/

Tip: you can share your results with anyone using the current URL: http://probly.dev/sim/XDsKzzKaFEQuxjzQGqTpvr/?is_example=ifa (no guarantees; if your results are important, please also save them another way).