lambda-ml.decision-tree

Decision tree learning using the Classification and Regression Trees (CART) algorithm.

Example usage:

(def data [[0 0 0] [0 1 1] [1 0 1] [1 1 0]])
(def fit
  (let [min-split 2
        min-leaf 1
        max-features 2]
    (-> (make-classification-tree gini-impurity min-split min-leaf max-features)
        (decision-tree-fit data))))
(decision-tree-predict fit (map butlast data))
;;=> (0 1 1 0)

best-splitter

(best-splitter model x y)

Returns the splitter for the given data that minimizes a weighted cost function, or nil if no splitter exists.
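
For example, a sketch of searching for the best split of the toy dataset above; treating x as the feature rows and y as the labels follows the layout of that example and is an assumption here:

(def model (make-classification-tree gini-impurity 2 1 2))
(best-splitter model (map butlast data) (map last data))
;; returns a splitter for the split that minimizes the weighted cost,
;; or nil if no splitter exists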

categorical-partitions

(categorical-partitions vals)

Given a seq of k distinct values, returns the 2^(k-1) - 1 possible binary partitions of the values into two sets.
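
For example, three distinct values admit 2^(3-1) - 1 = 3 binary partitions. The exact representation of each partition (shown below as a pair of sets) is an assumption:

(categorical-partitions [:a :b :c])
;;=> e.g. ([#{:a} #{:b :c}] [#{:b} #{:a :c}] [#{:c} #{:a :b}])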

classification-weighted-cost

(classification-weighted-cost y1 y2 f g)

decision-tree-fit

(decision-tree-fit model data)
(decision-tree-fit model x y)

Fits a decision tree to the given training data.
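
Given the model and data defined in the earlier examples, and assuming the single-collection arity expects each row to carry its label in the last position (as in the example at the top), the two arities are presumably equivalent:

;; labels in the last column of data, as in the example at the top
(decision-tree-fit model data)
(decision-tree-fit model (map butlast data) (map last data))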

decision-tree-predict

(decision-tree-predict model x)

Predicts the values of example data using a decision tree.

gini-impurity

(gini-impurity labels)

Returns the Gini impurity of a seq of labels.
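
Assuming the standard definition, Gini impurity is 1 - sum(p_i^2), where p_i is the proportion of labels in class i:

(gini-impurity [0 0 1 1])
;;=> 0.5    ; 1 - (0.5^2 + 0.5^2), assuming the standard formula
(gini-impurity [1 1 1 1])
;;=> 0.0    ; a pure node has zero impurity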

make-classification-tree

(make-classification-tree cost min-split min-leaf max-features)

Returns a classification decision tree model using the given cost function.

make-regression-tree

(make-regression-tree cost min-split min-leaf max-features)

Returns a regression decision tree model using the given cost function.
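
A sketch of fitting a regression tree, mirroring the classification example at the top; passing mean-squared-error as the cost function is an assumption, and the exact predictions depend on the fitted splits:

(def reg-data [[0 0 1.0] [0 1 2.0] [1 0 2.0] [1 1 3.0]])
(def reg-fit
  (let [min-split 2
        min-leaf 1
        max-features 2]
    (-> (make-regression-tree mean-squared-error min-split min-leaf max-features)
        (decision-tree-fit reg-data))))
(decision-tree-predict reg-fit (map butlast reg-data))
;;=> e.g. (1.0 2.0 2.0 3.0)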

mean-squared-error

(mean-squared-error labels predictions)

Returns the mean squared error of a seq of predictions with respect to the corresponding labels.
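
Assuming the usual definition, the result is the mean of the squared differences between each prediction and its corresponding label:

(mean-squared-error [1 2 3] [1 2 4])
;;=> 1/3    ; ((1-1)^2 + (2-2)^2 + (3-4)^2) / 3, as a ratio or its decimal equivalent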

numeric-partitions

(numeric-partitions vals)

Given a seq of k distinct numeric values, returns k-1 possible binary partitions of the values by taking the average of consecutive elements in the sorted seq of values.
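
For example, three distinct values yield two candidate split points. Returning the midpoints themselves, as shown below, is an assumption about the return shape:

(numeric-partitions [3 1 2])
;;=> e.g. (1.5 2.5)    ; midpoints of the sorted seq (1 2 3)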

print-decision-tree

(print-decision-tree model)

Prints information about a given decision tree.

regression-weighted-cost

(regression-weighted-cost y1 y2 f g)

splitters

(splitters x i)

Returns a seq of all possible splitters for feature i. A splitter is a predicate function that evaluates to true if an example belongs in the left subtree and false if it belongs in the right subtree, based on the splitting criterion.
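
A sketch of inspecting the splitters for a single numeric feature; that x is the seq of examples and that a splitter can be applied directly to an example are assumptions based on the description above:

(def xs [[2.0] [5.0] [8.0]])
(def ss (splitters xs 0))   ; candidate splitters for feature 0
((first ss) [3.0])          ; true if [3.0] falls in the left subtree, else false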