But the textbook examples always work, right?
Ever had a crazy idea? What if you could use this technology to predict the stock market?
So let’s see, what do I want?
I want to be able to sleep at night!
Because really, do you want to risk even $10K on a black-box algorithm that you cooked up one late evening after taking an introductory course in data science? Not quite.
So, I want to be in the market only when the market is open. Ideally, I’d like to be able to predict whether a stock is going to finish higher or lower than its open.
And let’s not get too carried away. A prediction of up/down will do for now. I’ll be happy with a classifier that tells me whether the stock is going to finish in the black, or in the red, at the end of the day.
Ooooh, I could even use 4X margin! So even if the stock only moves 0.05%, on a margin account that would still translate to a 0.2% profit. Let’s assume there are roughly 250 trading days in one year and that the gains compound… I’m betting on a minimum return of roughly 65%. On my 10K principal, that’s 6.5K to take the kids to Disney at least twice, I figure.
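For the record, here is that back-of-the-envelope math as a quick sketch. It assumes the most optimistic case imaginable: the prediction is right every single day, the gains compound daily, and there are no commissions or margin interest. The variable names are mine, purely for illustration.

```python
# Best-case arithmetic only: assumes a winning trade every day,
# daily compounding, and zero trading or margin costs.
daily_move = 0.0005    # the stock moves 0.05% in my favour
margin = 4             # 4X margin
trading_days = 250     # rough number of sessions per year
principal = 10_000

daily_return = daily_move * margin                       # 0.2% per day
annual_return = (1 + daily_return) ** trading_days - 1   # compounded over the year
profit = principal * annual_return

print(f"annual return: {annual_return:.1%}, profit: ${profit:,.0f}")
```

That lands at roughly 65%, which is where the Disney budget came from.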
So far the hypotheticals. And boy was I wrong about those hypotheticals.
But let’s go through the experiment anyway.
I started by getting the historical daily prices of SPY, the S&P 500 tracker ETF. From there, I computed the daily intraday return (close – open) / open.
Because I’m only interested in a binary up or down prediction, though, I added another column, JWhite, that’s true when the intraday return was up, and false when the intraday return was down.
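As a minimal sketch of those two columns: the prices below are made up (they are not real SPY quotes), and treating a perfectly flat day as "not up" is my own assumption, since a flat session is neither up nor down.

```python
# Hypothetical open/close prices, for illustration only.
sessions = [
    {"date": "day 1", "open": 100.0, "close": 101.0},
    {"date": "day 2", "open": 101.5, "close": 100.5},
    {"date": "day 3", "open": 100.0, "close": 100.0},
]

for s in sessions:
    # Intraday return: (close - open) / open
    s["intraday_return"] = (s["close"] - s["open"]) / s["open"]
    # JWhite is true on an up day; a flat day counts as false (assumption).
    s["JWhite"] = s["intraday_return"] > 0

for s in sessions:
    print(s["date"], f"{s['intraday_return']:+.4f}", s["JWhite"])
```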
Add a couple of technical indicators of the previous days leading up to the session, and we’re good to go.
Oh, wait, about those indicators… I remember Bollinger writing something about it being better to rank indicator values rather than work with their absolute values. So I converted all absolute technical indicators to their 50-day ranked values.
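One way such a ranking could look is sketched below. The exact definition is an assumption on my part: here the rank is the fraction of values in the trailing window (today included) that are less than or equal to today's value. The function name `rolling_rank` is hypothetical.

```python
def rolling_rank(values, window=50):
    """Rank each value within its trailing window, scaled to (0, 1].

    Assumed definition: the share of the last `window` values (including
    the current one) that are <= the current value. Positions without a
    full window of history get None.
    """
    ranks = []
    for i, v in enumerate(values):
        if i + 1 < window:
            ranks.append(None)  # not enough history yet
            continue
        recent = values[i + 1 - window : i + 1]
        ranks.append(sum(x <= v for x in recent) / window)
    return ranks


# Tiny example with a 3-day window instead of 50 days.
example = rolling_rank([1, 2, 3, 2], window=3)
print(example)
```

The nice part of a rank is that it only depends on where today's value sits relative to recent history, not on the indicator's scale.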
Eventually, my dataset looks like this:
So now I can set up my experiment to train and evaluate a model that predicts JWhite, and what do you expect to see?
Ehm, yeah, right… I got a perfect prediction from the first attempt?
Can anybody tell me what I did wrong?
The problem is actually that I have the perfect predictor of JWhite within my dataset. I included both the intraday return and the JWhite variable, and JWhite is nothing more than the sign of that intraday return.
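To see how little "learning" is needed when the answer is sitting in the feature set, here is a small sketch on synthetic data. The returns are random draws, not market data, and they are independent from day to day by construction, so the second number says nothing about real markets.

```python
import random

random.seed(0)

# Synthetic intraday returns: independent random draws, not real data.
returns = [random.gauss(0, 0.01) for _ in range(500)]
labels = [r > 0 for r in returns]  # JWhite: up day or not

# "Model" that is allowed to see the same-day intraday return (the leak).
leaky_predictions = [r > 0 for r in returns]
leaky_accuracy = sum(p == y for p, y in zip(leaky_predictions, labels)) / len(labels)

# Same rule, but fed only yesterday's return, which is all you would
# actually know before the open.
lagged_predictions = [r > 0 for r in returns[:-1]]
lagged_accuracy = sum(
    p == y for p, y in zip(lagged_predictions, labels[1:])
) / (len(labels) - 1)

print(f"with the leaked column: {leaky_accuracy:.2f}")
print(f"with yesterday's return only: {lagged_accuracy:.2f}")
```

With the leaked column the rule is right every time, simply because it recomputes the label.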
So let’s add a feature selector to the experiment that gets rid of the intraday return column:
What do we get now?
Hmm, a diagonal this time. Looks like I’m about as good as a coin toss. Pretty bad actually. This is truly about the worst outcome one can have, as a curve below the diagonal at least could be reversed to get some results.
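The remark about reversing a below-the-diagonal curve can be made concrete, assuming the plot in question is a ROC curve. The sketch below computes the area under it (AUC) by brute force as the share of (up day, down day) pairs where the up day received the higher score. The labels and scores are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example of a model that is wrong more often than right.
labels = [True, True, False, False, True, False]
scores = [0.2, 0.3, 0.8, 0.7, 0.6, 0.1]

bad_auc = auc(labels, scores)                           # below 0.5
flipped_auc = auc(labels, [1 - s for s in scores])      # predictions reversed

print(f"original: {bad_auc:.3f}, reversed: {flipped_auc:.3f}")
```

A model that is reliably wrong turns into a useful one once its scores are flipped. An AUC of 0.5 gives you nothing to flip, which is why the diagonal is the worst place to end up.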
Well, it looks like this technology isn’t going to make me rich, but the exercise taught me a few more things about data science at least.
So did you ever try anything like this? What are your most hilarious failures in data science? Let’s get a conversation started in the comments below!