First time asker so let me know if I can clear anything up.
I am trying to predict the winning probabilities of horses in horse racing, using the mlogit package. My current method of testing is as follows: I have ~50 engineered features, I select 5 at random, and I train the model on those. I then compare each horse's predicted winning probability to the market odds. If the market odds are higher than what my model predicts, I use the Kelly criterion to size a bet on that horse. For a certain set of 5 features I can get a profit of ~15% against the test data (50/50 split), but when I run the model on the training data that was used to fit the mlogit model, it gives -20% profit.
Is this expected? Why would it perform well on test data but not on training data?
Cheers
library(mlogit)

# Raceno identifies each race; the test set also needs the choice column
# ("Wincol") if actual winners are to be scored against predictions later
y <- mlogit.data(dsTest, choice = "Wincol", shape = "long", id.var = "Raceno")
x <- mlogit.data(dsTrain_after, choice = "Wincol", shape = "long", id.var = "Raceno")

mymod <- mlogit(Wincol ~ Start_Price_Standardized + Top3_All_From_3_Races_Standardized +
                  Number_Standardized + length_model_no_price_pred + length_model_price_pred +
                  place_model_price_pred - 1, data = x)
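For reference, here is a minimal sketch of the betting step I described above (Kelly staking when the model's probability beats the market's implied probability). The data frame `bets` and its columns `pred_prob` and `dec_odds` are illustrative names, not from my actual code:

# Hypothetical inputs: pred_prob = model win probability per horse,
# dec_odds = market decimal odds (total payout per unit staked)
b <- bets$dec_odds - 1                            # net odds received on a win
p <- bets$pred_prob
edge <- p * bets$dec_odds - 1                     # > 0 when model prob exceeds implied prob (1/dec_odds)
f <- ifelse(edge > 0, (b * p - (1 - p)) / b, 0)   # Kelly fraction f* = (bp - q)/b; stake 0 otherwise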
I think this only shows that you have a misalignment between what your model does and how you are evaluating its success. Your model predicts the probability of winning a race, and you haven't shared its train/test performance in those terms. Your model wasn't developed to turn a profit, yet you've shown that using it this way leads to poor, or at least erratic, performance in the profit domain.
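One way to do that check is to compare the model in probability space on both sets before any betting enters the picture. A minimal sketch, assuming `mymod`, `x`, and `y` from your code and that `y` was built with the same `choice = "Wincol"` column:

# In-sample: probability the model assigned to each race's actual winner
p_train <- fitted(mymod, outcome = TRUE)
mean(log(p_train))                        # mean train log-likelihood per race

# Out-of-sample: matrix of win probabilities, one row per test race
p_test <- predict(mymod, newdata = y)
# Extract each test race's winner from p_test and compare mean(log(...)) across
# the two sets: similar values suggest the probabilities generalize, and the
# profit gap then points at the betting/evaluation step, not the model itself.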