Assumptions will be necessary to:
- Sample or gather data
    - How much data are you sampling?
    - When are you sampling?
    - Can you mitigate survivorship bias?
    - Are you sampling randomly from the full distribution?
        - Or, if that’s not possible, what kind of constraints do you have?
    - Can your sampling control for all the potential confounders?
    - (If it’s a time series) how often do you sample, i.e., what is the sampling period?
- Build models
    - What kind of relationship do you want to capture?
    - How are you guarding against “overfitting” or “p-value hacking”?
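To make “p-value hacking” concrete, here is a minimal sketch using only Python’s standard library. Every feature is pure noise and unrelated to the target, yet testing many of them at alpha = 0.05 still turns up “significant” correlations. The function names, the sample sizes and the use of Fisher’s z approximation for the p-values are illustrative assumptions, not anything prescribed by the text. The Bonferroni correction shown is just one (deliberately conservative) remedy.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def count_discoveries(n_obs=100, n_features=200, alpha=0.05, seed=1):
    """Test many pure-noise features against a pure-noise target.

    Returns (naive, corrected): how many features look "significant"
    at level alpha, without and with a Bonferroni correction.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    p_values = []
    for _ in range(n_features):
        # Each feature is independent of y, so every "effect" is a false positive.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        p_values.append(approx_p_value(pearson_r(x, y), n_obs))
    naive = sum(p < alpha for p in p_values)
    corrected = sum(p < alpha / n_features for p in p_values)
    return naive, corrected


naive, corrected = count_discoveries()
print(f"'significant' without correction: {naive}, with Bonferroni: {corrected}")
```

With 200 noise features and alpha = 0.05, about 200 × 0.05 = 10 false positives are expected by chance alone; reporting only those would be p-value hacking. Held-out data and pre-registered hypotheses are other common defences.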
And the mother of all assumptions: is the world we want to model stationary at all?
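To make the stationarity question concrete, here is a minimal sketch in standard-library Python. The function names and parameters are hypothetical, chosen only for illustration. It simulates many independent series of two kinds. The first is i.i.d. Gaussian noise, which is stationary. The second is a random walk (the running sum of that noise), which is not. For a random walk, the variance of the value at time t grows roughly linearly with t, so a model fitted on early data describes a different distribution than the one it will face later.

```python
import random
import statistics


def simulate_paths(n_paths, n_steps, random_walk, seed=0):
    """Simulate n_paths series of length n_steps.

    random_walk=False: i.i.d. standard Gaussian noise (stationary).
    random_walk=True: cumulative sum of that noise (non-stationary).
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x = 0.0
        path = []
        for _ in range(n_steps):
            eps = rng.gauss(0.0, 1.0)
            x = x + eps if random_walk else eps
            path.append(x)
        paths.append(path)
    return paths


def cross_sectional_variance(paths, t):
    """Variance, across all simulated paths, of the value at time index t."""
    return statistics.pvariance([p[t] for p in paths])


for label, rw in [("white noise", False), ("random walk", True)]:
    paths = simulate_paths(1000, 400, random_walk=rw)
    early = cross_sectional_variance(paths, 9)    # after 10 steps
    late = cross_sectional_variance(paths, 399)   # after 400 steps
    print(f"{label}: variance after 10 steps {early:.1f}, after 400 steps {late:.1f}")
```

This only illustrates the idea by brute-force simulation. For a single observed series, a common check is a formal unit-root test such as the augmented Dickey-Fuller test (`adfuller` in statsmodels), which rests on assumptions of its own.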
So the important thing is not to pretend that the model is objective, and not to confuse it with reality.