knnpredict
Classify new data into categories using the kNN algorithm.
label = knnpredict (X, Y, XC)
returns the matrix of labels predicted for the corresponding instances
in XC, using the predictor data in X and the corresponding
categorical data in Y. X is used to train the kNN model, and the
observations in XC are classified into the classes found in Y.
X must be a numeric matrix of input data where rows
correspond to observations and columns correspond to features or variables.
X will be used to train the kNN model.
Y is a matrix or cell array containing the class labels of the
corresponding predictor data in X. Y can contain any type of
categorical data and must have the same number of rows as X.
XC must be a numeric matrix of query/new points that
are to be classified into the labels, and it must have the same
number of columns as X.
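A minimal usage sketch of the basic call (the data and labels below are made up for illustration):

```octave
## Classify two query points with the default 1-NN search.
X  = [1 2; 2 3; 3 4; 8 9; 9 10];   # training observations (rows = samples)
Y  = {"a"; "a"; "a"; "b"; "b"};    # class label for each row of X
XC = [2.5 3.5; 8.5 9.5];           # query points, same number of columns as X
label = knnpredict (X, Y, XC);     # nearest neighbors suggest "a" and "b"
```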
[label, score, cost] = knnpredict (…) also
returns score, which contains the predicted class scores or posterior
probabilities for each instance for the corresponding unique classes in Y,
and cost, which is a matrix containing the expected cost of the
classifications: each row in cost holds the expected cost of
classifying the corresponding observation in XC into each of the
unique classes in Y.
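A sketch of the three-output form; the comments describe the expected shapes rather than verified output:

```octave
## Request posterior scores and expected classification costs as well.
X  = [1 1; 1 2; 6 6; 6 7];
Y  = [1; 1; 2; 2];
XC = [1 1.5; 6 6.5];
[label, score, cost] = knnpredict (X, Y, XC, "K", 3);
## score: one column per unique class in Y (2 columns here);
## cost:  cost(i,j) is the expected cost of assigning XC(i,:) to class j.
```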
label = knnpredict (…, Name, Value) returns a
matrix label containing the predicted labels, with additional parameters
specified by the Name-Value pair arguments listed below.
| Name | Value |
|---|---|
| "K" | The number of nearest neighbors to find in the kNN search. Must be a positive integer. Default: 1. |
| "weights" | A nonnegative numeric vector of observation weights; each element corresponds to a row in Y and indicates the relative importance of that observation when finding nearest neighbors. Negative values are removed before the calculation if weights are specified. Default: weights = ones(rows(Y),1). |
| "P" | The Minkowski distance exponent; must be a positive scalar. This argument is only valid when the selected distance metric is "minkowski". Default: 2. |
| "scale" | The scale parameter for the standardized Euclidean distance; must be a nonnegative numeric vector whose length equals the number of columns in X. This argument is only valid when the selected distance metric is "seuclidean", in which case each coordinate of X is scaled by the corresponding element of "scale", as is each query point in XC. By default, the scale parameter is the standard deviation of each coordinate in X. |
| "cov" | The covariance matrix used to compute the Mahalanobis distance; must be a positive definite matrix whose size matches the number of columns in X. This argument is only valid when the selected distance metric is "mahalanobis". |
| "cost" | A numeric matrix of misclassification costs for the corresponding instances in X, with R columns, where R is the number of unique classes in Y. If an instance is correctly classified into its class, the cost is taken to be 1; if not, 0. Default: cost = ones(rows(X),numel(unique(Y))). |
| "BucketSize" | The maximum number of data points in a leaf node of the Kd-tree; must be a positive integer. This argument is only valid when the selected search method is "kdtree". |
| "Distance" | The distance metric used by knnsearch, one of the following: |
| "euclidean" | Euclidean distance. |
| "seuclidean" | Standardized Euclidean distance. Each coordinate difference between the rows in X and the query matrix XC is scaled by dividing by the corresponding element of the standard deviation computed from X. To specify a different scaling, use the "scale" name-value argument. |
| "cityblock" | City block distance. |
| "chebychev" | Chebychev distance (maximum coordinate difference). |
| "minkowski" | Minkowski distance. The default exponent is 2. To specify a different exponent, use the "P" name-value argument. |
| "mahalanobis" | Mahalanobis distance, computed using a positive definite covariance matrix. To change the covariance matrix, use the "cov" name-value argument. |
| "cosine" | Cosine distance. |
| "correlation" | One minus the sample linear correlation between observations (treated as sequences of values). |
| "spearman" | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
| "hamming" | Hamming distance, the percentage of coordinates that differ. |
| "jaccard" | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ. |
| "NSMethod" | The nearest neighbor search method used by knnsearch, one of the following: |
| "kdtree" | Creates and uses a Kd-tree to find nearest neighbors. "kdtree" is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is "euclidean", "cityblock", "chebychev", or "minkowski"; otherwise, the default is "exhaustive". This argument is only valid with one of those four distance metrics. |
| "exhaustive" | Uses the exhaustive search algorithm, computing the distances from every point in X to each query point in XC. |
| "standardize" | A flag indicating whether the predictor data in X should be standardized before the kNN calculation. |
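Putting several of the Name-Value pairs above together, a sketch using random data for illustration only:

```octave
## Combine several options: 5 neighbors, Minkowski distance with
## exponent 3, and a Kd-tree nearest-neighbor search.
X  = rand (100, 4);                   # 100 observations, 4 features
Y  = [ones(50, 1); 2 * ones(50, 1)];  # two numeric classes
XC = rand (5, 4);                     # 5 query points
label = knnpredict (X, Y, XC, "K", 5, "Distance", "minkowski", ...
                    "P", 3, "NSMethod", "kdtree");
```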
Source Code: knnpredict