There are no models without assumptions

Assumptions will be necessary to:

  • Sample or gather data
    • How much data are you sampling?
    • When are you sampling?
      • Can you mitigate survivorship bias?
    • Are you sampling randomly from the full distribution?
      • Or if that’s not possible, what kind of constraints do you have?
      • Can you control for all the potential confounders with the sampling?
    • (If it’s a time series): what is the sampling frequency?
  • Build models
    • What kind of relationship do you want to capture?
    • How are you guarding against “overfitting” or “p-hacking”?
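
To make the survivorship-bias question above concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not real data: the funds are simulated, their true expected return is zero, and the hypothetical rule "a fund closes if its cumulative return ends below -20%" stands in for whatever actually removes entities from a dataset before you sample it.

```python
import random
import statistics


def simulate_total_returns(n_funds=2000, n_years=10, seed=0):
    """Simulate cumulative returns for hypothetical funds.

    Each yearly return is drawn from a normal distribution with mean 0,
    so by construction no fund has any real edge.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 0.1) for _ in range(n_years))
        for _ in range(n_funds)
    ]


def survivors_only(total_returns, closure_threshold=-0.2):
    """Keep only funds whose cumulative return stayed above the threshold.

    This mimics sampling "today": closed funds are no longer observable.
    The threshold is an assumed, simplified closure rule.
    """
    return [r for r in total_returns if r > closure_threshold]


all_funds = simulate_total_returns()
survivors = survivors_only(all_funds)

mean_all = statistics.mean(all_funds)
mean_survivors = statistics.mean(survivors)

print(f"funds simulated:      {len(all_funds)}")
print(f"funds still observed: {len(survivors)}")
print(f"mean return, all funds:      {mean_all:.3f}")
print(f"mean return, survivors only: {mean_survivors:.3f}")
```

The survivors' average is higher than the average over all funds even though every fund was generated by the same zero-mean process. The gap comes entirely from when and what was sampled, which is exactly the kind of assumption the list above asks you to state.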

And, the mother of all assumptions:

So the important thing is not to pretend that the model is objective, and not to confuse it with reality.