The challenge here is to write a very fast and correct implementation of a tertiary random forest.

To get started, fork our open questions repository on GitHub. The folder specific to this competition is algoques.

Motivation:
Many data science applications demand speed of computation as well as accuracy. There is anecdotal evidence that non-linear decision tree methods perform well across a variety of data science applications, and one of the challenges of using them in high-frequency trading (HFT) is implementing them efficiently. After most events there are "knee-jerk" reactions whose response function is simple, so the short-term effect of the event is easy to compute. Other responses are more complex, and modelling them needs more sophisticated, possibly non-linear models that take longer to evaluate. Using a sophisticated model for an elementary prediction risks making that prediction too slow to be useful. This pattern is common in finance: some relationships hold over very short horizons, such as the ten seconds after an event, while others do not hold consistently over short horizons but show up more reliably over longer periods such as months or years.

Input:
You can download a collection of data files, each of which contains indicator values snapped at regular intervals. The structure of a data file is explained further in README.txt. We have written a wrapper, process_data.cpp, that reads the data and calls OnInputChange on the TertiaryRandomForest class. The function takes two arguments: the index of the input variable that has changed and its new value. For instance, if indicator 5 changes to -1, process_data.cpp calls OnInputChange(5, -1) on the TertiaryRandomForest.

Output:
To measure correctness, the predicted value is printed after every 'samplingrate' function calls. We will compare the output of our benchmark solution with yours, and as long as every prediction is within 1% of the benchmark, we will consider the values correct. The margin of error accounts for floating-point error in computation and also permits optimizations based on approximate computation. In this domain a very small difference in predicted price should not affect the outcome, and if the margin allows one to reduce latency, the benefit often outweighs the cost. You can also download some sample outputs.

Deliverables:
1) Code: Submit an implementation of the TertiaryRandomForest class in tertiary_random_forest.hpp. process_data.cpp demonstrates one way to use TertiaryRandomForest, but the submitted implementation should be generic so that it can be used as an API. SubscribeOutputChange is not exercised in process_data.cpp, but it must still be implemented so that subscribers receive every output change from TertiaryRandomForest.
2) Documentation: It should cover:
    i)    an overview of the logic used to implement the forest
    ii)   any critical design decisions
    iii)  any testing done to check the correctness of the logic.


FAQ:
Q: When is the deadline for submission?
A: It is a continuously running competition and the best solutions will be rewarded on a rolling basis.

Q: What does each file in the GitHub folder "algoques" contain?
A: Please see README.txt.

Q: How do I get sample data to run process_data on?
A: You can download an extensive collection of data files to test on.

Q: Will you provide examples of the forest info file?
A: The file sample_random_forest.txt is a valid example. To create more test cases, refer to RANDOM_FOREST_DESCRIPTION.txt. The file Sample_Random_Forest_Spec.pdf gives a graphical representation of a sample random forest.

Q: Will you provide an output file to check my output against?
A: Yes. You can download sample outputs; see README.txt for details.

Q: Would I be a good fit for the company if I do well in this competition?
A: I am sure a person who excels in this competition will be a great fit for the company. That is part of our motivation.