Tuesday, May 21, 2019

Testing vs. Developing




My favorite probability mistake is mistaking developing a hypothesis for testing a hypothesis*. This one is easy to make, and it is made by lots of people: smart people and dumb people, in a variety of fields like finance, self-help**, and history. The mistake has two parts. First, look at some data and make a hypothesis: a trend, a connection, a pattern. Second, claim the same data "proves" the hypothesis.

Finding something interesting in a data set is good! The next step should be to get more data and see if it has the same interesting thing. You can't test a hypothesis with the same data set that you used to develop the hypothesis!

You could get some data and fit a model to it. The hypothesis would be, "I think the model predicts future data". Maybe the model predicts future data, but maybe not! The only way to know is to get more data (this means you have to collect or find new data) and then test your model with it. If the model fits the new data as well as the old data, maybe you have something.
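Here is a minimal sketch of that idea in Python. Everything in it is an assumption made for illustration: the numbers are invented, the "model" is just a straight line fit by least squares, and the helper names `fit_line` and `mean_squared_error` are hypothetical. The point is only the shape of the process: develop the model on one data set, then score it on data it has never seen.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared gap between the line's predictions and the data."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Invented numbers, for illustration only.
old_x = [1.0, 2.0, 3.0, 4.0, 5.0]   # data used to DEVELOP the hypothesis
old_y = [2.1, 3.9, 6.2, 7.8, 10.1]
new_x = [6.0, 7.0, 8.0]             # data collected afterwards to TEST it
new_y = [12.2, 13.8, 16.1]

# Step 1: develop the hypothesis using only the old data.
slope, intercept = fit_line(old_x, old_y)

# Step 2: test it on the new data, which played no part in the fit.
old_error = mean_squared_error(slope, intercept, old_x, old_y)
new_error = mean_squared_error(slope, intercept, new_x, new_y)

print(f"error on old data: {old_error:.3f}")
print(f"error on new data: {new_error:.3f}")
```

The error on the old data tells you very little, because the line was chosen to make that number small. The error on the new data is the actual test. If it is much worse than the old error, the pattern probably lived in the old data set and nowhere else.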

*   A hypothesis is a guess you plan to test.
** This is super common in self-help books. Every self-help book says, “I talked to 20 rich guys and they did this thing, so you should do it too!”

