2024-01-22 Weekly Links

Tags: in-progress
File Name: 2024-01-22-Weekly-Links.html

We spend our entire lives trying to tell stories about ourselves—they’re the essence of memory. It is how we make living in this unfeeling, accidental universe tolerable. That we call such a tendency “the narrative fallacy” doesn’t mean it doesn’t also touch upon some aspect of the truth. (The Paper Menagerie by Ken Liu)

1. Math And Herd Immunity

Measles Comeback: What To Know About The Highly Contagious And Deadly Virus As Vaccination Drops - Link - DB

Aside from playing Plague, Inc. on my phone, I don’t truck much with epidemics. But a recent headline caught my attention. Measles - yes, measles - is making a comeback due to declining vaccination rates. A disease eliminated in the U.S. in 2000, held in check by a two-dose childhood vaccine series estimated to have saved 56 million lives from 2000 to 2021, is on the rise.

Setting aside the psychological question of why people choose not to get vaccinated, the mathematics of herd immunity is both interesting and a little terrifying.

Herd immunity occurs when enough people in a population are immune to a disease (either from catching it and developing natural immunity or from getting vaccinated) that the disease dies out.

The goal of herd immunity is to turn a disease that spreads exponentially into one that spreads, at worst, linearly.

A disease’s “basic reproduction number” (R0, or R_naught) reflects how many people each infected person is expected to infect, assuming a fully susceptible population (no one vaccinated or previously infected).

For example, an R0 = 2 means that each infected person will, on average, infect 2 additional people they come into contact with.

Theoretically, after 30 rounds of infection, someone infected with a disease with an R0 = 2 will have infected over 2 billion people, directly or indirectly. Yikes.
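
To check that figure: with R0 = 2, round k produces 2^k new infections (starting from a single case), so the running total after 30 rounds is

2^1 + 2^2 + \dots + 2^{30} = 2^{31} - 2 \approx 2.15 \times 10^9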

Linear Rate vs Exponential Rate from Safety In Numbers Worksheet - Link - DB.

If enough people are vaccinated against a disease, its effective reproduction number (the number of people each case actually infects, given existing immunity) can be pushed down to 1, ensuring that it will only spread at a linear rate. In other words, “herd immunity” is not immunity of individuals from becoming infected, but immunity of the population from the disease spreading at an exponential rate. See How Math (and Vaccines) Keep You Safe From the Flu | Quanta Magazine - Link - DB.

The percentage of vaccinated individuals required in a given population to achieve herd immunity is referred to as the “herd immunity threshold” (HIT).

The HIT can be derived by asking when the effective reproduction number falls to 1. If V out of N people in the population are vaccinated, each infected person infects, on average,

R_0 \left( 1 - \frac{V}{N} \right)

others. Setting this equal to 1 and solving for the vaccinated fraction gives the threshold:

\text{HIT} = \frac{V}{N} = 1 - \frac{1}{R_0}

R0 values are well established for most diseases (less so for newer ones such as COVID-19). The percentage of the population that needs to be vaccinated to reach herd immunity (the HIT) for various diseases is shown in the table below.

Disease      R0     1 - 1/R0     HIT
Measles      12     1 - 1/12     91.7%
Smallpox     5      1 - 1/5      80%
Mumps        4      1 - 1/4      75%
Ebola        2.5    1 - 1/2.5    60%
Influenza    2      1 - 1/2      50%
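
A one-line check in R reproduces the HIT column from the R0 values in the table:

# HIT = 1 - 1/R0 for the R0 values in the table
round(1 - 1 / c(12, 5, 4, 2.5, 2), 3)
# [1] 0.917 0.800 0.750 0.600 0.500   (measles, smallpox, mumps, ebola, influenza)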

The combination of a relatively high HIT and record-low vaccination rates partially explains the re-emergence of measles. From the article: “[b]ecause measles is so contagious — it is one of the most, if not the most, contagious infectious diseases in the world — vaccination rates need to be as high as 95% to protect communities from outbreaks, a target that the U.S., which eliminated measles in 2000, has not met in several years.” (Published R0 estimates for measles are often higher than 12, commonly quoted in the 12 to 18 range, which is partly why the public-health target of 95% sits above the 91.7% computed in the table.)

A quick dip into R shows how exponential growth and higher R0 values affect the number infected:

library(tidyverse)

# exponential infection growth function
# R_n    = the disease's R0 (R_naught) value
# rounds = number of rounds (generations) of infection
fn.infection <- function(R_n, rounds = 10){
    # new infections in round k = R_n^k, starting from a single case
    a <- R_n ^ (1:rounds)
    tibble(
        round = 1:rounds,
        R_naught = rep(R_n, rounds),
        no_infected = a
    )
}

# data frame of infections over 20 rounds for several R_naught values
df.infected <- 
    map(.x = c(2, 2.5, 4, 5, 12), ~fn.infection(.x, rounds = 20)) %>% 
    bind_rows()

# plot on log scale
ggplot(df.infected, aes(x = round, y = no_infected, colour = R_naught, group = R_naught)) +   
    geom_line() +   
    theme_minimal() +  
    ggtitle("Infected Population Growth After 20 Rounds (Log Scale)") +  
    scale_y_continuous(trans='log10') +  
    scale_x_continuous(trans='log10')

Herd immunity is much more complex than outlined above; a range of other factors affect the calculation. Still, the math behind the HIT is clear: the more people who are vaccinated, the closer you get to the threshold, no matter how communicable the disease is.

Additional References

2. Paper Bad, Digital Good

Paper to Paperless: A Guide to Digitalizing Your Journals with a Scanner App | Mark Koester - Link - DB

Excellent guide on going from paper to digital.

Additional References

3. Lighter Than Air, Heavier Than Money

The U.S. just sold its helium stockpile. Here’s why the medical world is worried. - Link - DB

Helium is critical for many processes (like cooling MRI machines), but it is relatively rare on Earth and difficult to store and transport. It forms underground through radioactive decay and is recovered as a byproduct of natural gas production. The U.S. government held a stockpile of the noble gas but passed a bill in 2013 to sell it off to private companies.

Additional References

4. Spirited Images

Capturing art left behind in a whiskey glass - CBS News - Link - DB

Whisky fan drinks, notices the residue in the unwashed glass forms a cool design, and makes art!

Additional References

5. Love Me Some expand.grid()

The Essence of R’s expand.grid() Function | by Dror Berel | Towards Data Science - Link - DB

Nothing gets me as excited as 700 words on a base R function like expand.grid(). The tidyverse has an expand_grid() function (note the “_”) that produces similar but “tidier” output.

Data science is really a hypothesis and an experiment. As the article notes, these are the what and the how: what do you want to know, and how are you going to test it?

A lot of the time, you limit the how to a single argument. In reality, you can have any number of arguments, and each argument can take a vector of values.

R custom functions take the following structure:

FunctionName = function(Argument(s)) {Statement(s)} 

The above is from MODULE 4.9 Custom Functions in R - Link - DB.
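
For instance, a minimal (and purely hypothetical) custom function with two arguments, either of which could later be swept over a vector of candidate values:

# hypothetical example: compound growth after a number of periods
fn.growth <- function(rate, periods = 10) {
  (1 + rate) ^ periods
}
fn.growth(0.05, periods = 20)   # ~2.65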

Assume you want to generate a trading signal based on two technical indicators: (1) a Simple Moving Average (“SMA”) and (2) the Relative Strength Index (“RSI”). The strategy will enter a position when the price is above the SMA and the RSI is at or below a chosen entry level, and exit when the price falls back below the SMA.

Note: you can download the entire R script here.

First, simulate some stock data, download actual stock data, and define the SMA and RSI functions.

library(tidyverse)
library(tidyquant)

# simulation parameters: periods, daily return mean/sd, starting price
p.periods <- 1000
p.mean <- 0.0002952773
p.sd <- 0.01125696
p.start <- 416

# simulated stock data: random daily returns compounded into a price series
returns <- 1 + rnorm(p.periods, p.mean/252, p.sd)
prices <- cumprod(c(p.start, returns))

# actual stock data: daily SPY closes over the same number of calendar days
df.stock <- tq_get("SPY", from = Sys.Date() - p.periods, to = Sys.Date())
stock.returns <- df.stock$close / lag(df.stock$close, n = 1) - 1
stock.prices <- df.stock$close

# simple moving average: mean of the last n prices (NA until n observations exist)
fn.sma <- function (price, n){
  sma <- c()
  sma[1:(n-1)] <- NA
  for (i in n:length(price)){
    sma[i] <- mean(price[(i-n+1):i])
  }
  return(sma)
}

# simplified RSI over a look-back of n periods: the share of up (or flat) days
# among the last n periods, scaled to 0-100
fn.rsi <- function (price, n){
  N <- length(price)
  U <- rep(0, N)                     # indicator: price moved up or was flat
  D <- rep(0, N)                     # indicator: price moved down
  rsi <- rep(NA, N)
  Lprice <- c(NA, price[1:(N-1)])    # lagged price
  for (i in 2:N){
    if (price[i] >= Lprice[i]){
      U[i] <- 1
    } else {
      D[i] <- 1
    }
    if (i > n){
      AvgUp <- mean(U[(i-n+1):i])
      AvgDn <- mean(D[(i-n+1):i])
      rsi[i] <- AvgUp / (AvgUp + AvgDn) * 100
      }
    }
  return(rsi)
}
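
A quick sanity check of the SMA helper on a toy price series (illustrative values only):

# SMA(3) of a 5-point toy series: NA until three observations exist
fn.sma(c(10, 11, 12, 13, 14), 3)
# [1] NA NA 11 12 13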

df.sim <- data.frame(
    "price" = prices, 
    "sma" = fn.sma(prices, 20),   
    "rsi" = fn.rsi(prices, 14)
)

df.act <- data.frame(
    "price" = stock.prices,  
    "sma" = fn.sma(stock.prices, 20),  
    "rsi" = fn.rsi(stock.prices, 14)
)

plot(prices, type='l', lwd = 3, ylab="Simulated Prices", xlab="Periods")
lines(fn.sma(prices, 20), type = "l", lwd = 3, col = "blue")

plot(stock.prices, type='l', lwd = 3, ylab="Actual Prices", xlab="Periods")
lines(fn.sma(stock.prices, 20), type = "l", lwd = 3, col = "blue")

We can plot the simulated stock data with SMA(20).

Here is the actual stock data for $SPY with SMA(20).

Most traders will assume a set look-back period (“n”) for the SMA and RSI. But that is not in the “spirit” of the scientific pursuit: choosing a single value ignores other, potentially more profitable, systems. A more rigorous approach is to give each argument a range of candidate values and test them all.

The beautiful thing about expand.grid() is that you can take each range of values and combine them into every possible combination (the Cartesian product). Then you can use lapply() or purrr::pmap() to amp up your custom functions.
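
As a quick illustration of the base function itself (a minimal sketch with hypothetical parameter vectors, separate from the strategy code below):

# every combination of two small hypothetical parameter vectors
expand.grid(n_sma = c(10, 20), rsi_entry = c(40, 50, 60))
#   n_sma rsi_entry
# 1    10        40
# 2    20        40
# 3    10        50
# 4    20        50
# 5    10        60
# 6    20        60

Note that tidyr::expand_grid(), used below, varies the last column fastest instead, which is why the head(p.args) output below counts up the rsi_entry column first.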

Set the range of values for the arguments in each function and combine with expand_grid().

# range and combination of values

p.sma.range <- seq(10,100,10)
p.rsi.range <- seq(10,100,10)
p.rsi.lev.entry <- seq(40,100,10)

p.args <- expand_grid(p.sma.range, p.rsi.range, p.rsi.lev.entry)
names(p.args) <- c("sma", "rsi", "rsi_entry")

head(p.args)

# sma   rsi rsi_entry  
# <dbl> <dbl>     <dbl>
# 1    10    10        40
# 2    10    10        50
# 3    10    10        60
# 4    10    10        70
# 5    10    10        80
# 6    10    10        90

Now we have a data frame of all possible combinations of values for our entry and exit rules. We can now use pmap() to apply each combination to the data, generate signals, and measure performance; the result is a list of tibbles, one per row of p.args. In this case, I used total return percentage as the sole performance measure.

################################################################################
# ACTUAL DATA

# add entry and exit signals for every parameter combination
df.act.sig <- pmap(
    .l = list(
        p.args$sma,        # a = SMA look-back
        p.args$rsi,        # b = RSI look-back
        p.args$rsi_entry   # c = RSI entry level
    ),
    .f = function(a, b, c)(
        tibble(
            prices = stock.prices,
            returns = stock.returns,
            lead_returns = lead(stock.returns, n = 1),
            sma = fn.sma(stock.prices, a),
            rsi = fn.rsi(stock.prices, b),
            # enter when price is above the SMA and RSI is at or below the entry level
            entry = ifelse(stock.prices > fn.sma(stock.prices, a) & fn.rsi(stock.prices, b) <= c, 1, 0),
            # exit when price falls below the SMA
            exit = ifelse(stock.prices < fn.sma(stock.prices, a), 1, 0)
        )
    ),
    .progress = TRUE
)

# performance measure
df.act.per <-  
    map(    
        .x = df.act.sig,    
        ~.x %>% 
            filter(entry == 1) %>% 
            summarize(tot_ret = sum(lead_returns, na.rm = T))
    )

df.act.per.all <-   
    p.args %>% 
    mutate(tot_ret = unlist(df.act.per, use.names = F))

We can look at the top 10 performing argument combinations on the actual data:

df.act.per.all %>%   
    arrange(desc(tot_ret)) %>%  
    slice_head(n = 10)

     sma   rsi rsi_entry tot_ret
   <dbl> <dbl>     <dbl>   <dbl>
 1    40   100        60   0.172
 2    40    40        50   0.171
 3    30    30        40   0.171
 4    40    90        60   0.161
 5    40    30        70   0.161
 6    40    80        60   0.158
 7    40    20        70   0.158
 8    40    20        80   0.157
 9    40    40        70   0.156
10    40    10        80   0.155

Similarly, the bottom 10 performers:

df.act.per.all %>%   
    arrange(desc(tot_ret)) %>%  
    slice_tail(n = 10)

     sma   rsi rsi_entry tot_ret
   <dbl> <dbl>     <dbl>   <dbl>
 1    70    60        50  -0.130
 2   100    60        50  -0.131
 3    50   100        50  -0.133
 4    50    90        50  -0.140
 5    60   100        50  -0.142
 6    60    80        50  -0.142
 7    60    90        50  -0.148
 8    50    60        50  -0.154
 9    60    70        50  -0.156
10    60    60        50  -0.164

Based on our simple strategy, you can judge the best value for each argument (note: the total return for buy and hold was 20.2% vs. our top performer at 17.2%, so don’t go trading this willy-nilly). Regardless, using expand.grid() to organize and systematically apply values for each argument is a step forward.
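
For reference, a buy-and-hold benchmark could be approximated from the same data (a sketch reusing the vectors defined above; not necessarily how the 20.2% figure was computed):

# buy and hold over the same window, two rough ways
sum(stock.returns, na.rm = TRUE)                    # additive daily returns, comparable to tot_ret
tail(stock.prices, 1) / head(stock.prices, 1) - 1   # simple price-based total return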

Additional References