module search_rank.search_engine_vectors
Short summary#
module mlinsights.search_rank.search_engine_vectors
Implements a way to get close examples based on the output of a machine learned model.
Classes#
| class | truncated documentation |
|---|---|
| SearchEngineVectors | Implements a kind of local search engine which looks for similar results assuming they are vectors. The class is … |
Static Methods#
| staticmethod | truncated documentation |
|---|---|
| read_zip | Restores the features and the metadata to a SearchEngineVectors. |
Methods#
| method | truncated documentation |
|---|---|
| __repr__ | Returns the usual string representation. |
| _first_pass | Finds the closest n_neighbors. |
| _fit_knn | Fits the nearest neighbors. |
| _is_iterable | Tells if an object is an iterator or not. |
| _prepare_fit | Stores data in the class itself. |
| _second_pass | Reorders the closest n_neighbors. |
| fit | Every vector comes with a list of metadata. |
| kneighbors | Searches for neighbors close to X. |
| to_zip | Saves the features and the metadata into a zipfile. The function does not save the k-nn. |
Documentation#
Implements a way to get close examples based on the output of a machine learned model.
- class mlinsights.search_rank.search_engine_vectors.SearchEngineVectors(**pknn)#
Bases:
object
Implements a kind of local search engine which looks for similar results assuming they are vectors. The class uses sklearn.neighbors.NearestNeighbors to find the nearest neighbors of a vector and follows the same API. The class populates the following members:
features_: vectors used to compute the neighbors
knn_: parameters for the sklearn.neighbors.NearestNeighbors
metadata_: metadata, can be None
- Parameters:
pknn – keyword parameters, see sklearn.neighbors.NearestNeighbors
- __init__(**pknn)#
- Parameters:
pknn – keyword parameters, see sklearn.neighbors.NearestNeighbors
- __repr__()#
Returns the usual string representation.
- _first_pass(X, n_neighbors=None)#
Finds the closest n_neighbors.
- Parameters:
X – features
n_neighbors – number of neighbors to get (default is the value passed to the constructor)
- Returns:
dist, ind
dist is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix.
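The contract of this first pass (distances and indices of the closest points, closest first) can be illustrated with a brute-force search. This is a minimal pure-Python sketch and not the class's implementation, which relies on sklearn.neighbors.NearestNeighbors. The helper name first_pass_sketch and the Euclidean metric are assumptions made for the illustration.

```python
import math


def first_pass_sketch(population, x, n_neighbors):
    """Brute-force stand-in for a k-nn query (hypothetical helper).

    Returns (dist, ind): the distances to the n_neighbors closest rows
    of population and the indices of those rows, closest first.
    """
    # Euclidean distance from x to every row of the population matrix.
    scored = [(math.dist(row, x), i) for i, row in enumerate(population)]
    scored.sort()  # sort by distance, ties broken by index
    closest = scored[:n_neighbors]
    dist = [d for d, _ in closest]
    ind = [i for _, i in closest]
    return dist, ind


population = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
dist, ind = first_pass_sketch(population, [0.0, 1.0], n_neighbors=2)
```

Here ind holds the positions of the two closest rows in population and dist the matching distances.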
- _fit_knn()#
Fits the nearest neighbors.
- _is_iterable(data)#
Tells if an object is an iterator or not.
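The method name mentions an iterable while its description mentions an iterator, and the two notions differ in Python. The sketch below only illustrates that difference. How the method itself performs the check is not stated here, and is_iterable_sketch is a hypothetical helper.

```python
from collections.abc import Iterator


def is_iterable_sketch(data):
    """Returns True if iter(data) succeeds (hypothetical helper)."""
    try:
        iter(data)
        return True
    except TypeError:
        return False


values = [1, 2, 3]
# A list is iterable but is not itself an iterator.
list_is_iterable = is_iterable_sketch(values)
list_is_iterator = isinstance(values, Iterator)
# iter() builds an iterator, which is also iterable.
it = iter(values)
iterator_is_iterator = isinstance(it, Iterator)
iterator_is_iterable = is_iterable_sketch(it)
# A plain number is neither.
number_is_iterable = is_iterable_sketch(42)
```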
- _prepare_fit(data=None, features=None, metadata=None, transform=None)#
Stores data in the class itself.
- Parameters:
data – a dataframe or None if the features and the metadata are specified with an array and a dictionary
features – features columns or an array
metadata – the metadata attached to each vector
transform – transform each vector before using it
transform is a function with the following signature:
    def transform(vec, many):
        # many tells whether the function receives many vectors
        # or just one (many=False).
Function transform is applied only if data is not None.
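As an illustration of that signature, the sketch below normalizes vectors to unit Euclidean norm. The choice of normalization and the use of plain Python lists are assumptions. The class may hand the function other containers, such as arrays.

```python
import math


def transform(vec, many):
    """Scales vectors to unit Euclidean norm (illustrative choice)."""

    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        # Leave null vectors untouched to avoid dividing by zero.
        return list(v) if norm == 0 else [c / norm for c in v]

    if many:
        # vec is a collection of vectors: transform each of them.
        return [normalize(v) for v in vec]
    # vec is a single vector.
    return normalize(vec)


one = transform([3.0, 4.0], many=False)
several = transform([[3.0, 4.0], [0.0, 0.0]], many=True)
```

The many flag lets a single function serve both the fitting step, which sees the whole population, and a query on one vector.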
- _second_pass(X, dist, ind)#
Reorders the closest n_neighbors.
- Parameters:
X – features
dist – array representing the lengths to points
ind – indices of the nearest points in the population matrix
- Returns:
score, ind
score is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix.
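The score used to reorder the candidates is not specified here. The sketch below assumes, purely for illustration, a re-scoring by cosine distance applied to the candidates returned by a Euclidean first pass. The helper second_pass_sketch and its extra population argument are hypothetical.

```python
import math


def second_pass_sketch(population, x, dist, ind):
    """Re-scores first-pass candidates and sorts them (hypothetical helper).

    dist mirrors the documented signature but is not used by this
    particular re-scoring.
    """

    def cosine_distance(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        na = math.sqrt(sum(p * p for p in a))
        nb = math.sqrt(sum(q * q for q in b))
        return 1.0 - dot / (na * nb)

    # Only the candidates are re-scored, not the whole population.
    rescored = sorted((cosine_distance(population[i], x), i) for i in ind)
    score = [s for s, _ in rescored]
    new_ind = [i for _, i in rescored]
    return score, new_ind


population = [[2.0, 0.0], [0.0, 1.0], [6.0, 3.0]]
x = [2.0, 1.0]
first_ind = [0, 1, 2]  # Euclidean order from x, closest first
first_dist = [math.dist(population[i], x) for i in first_ind]
score, ind = second_pass_sketch(population, x, first_dist, first_ind)
```

With these values the third point is the farthest in Euclidean distance but has the same direction as x, so the second pass moves it to the front.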
- fit(data=None, features=None, metadata=None)#
Every vector comes with a list of metadata.
- Parameters:
data – a dataframe or None if the features and the metadata are specified with an array and a dictionary
features – features columns or an array
metadata – the metadata attached to each vector
- kneighbors(X, n_neighbors=None)#
Searches for neighbors close to X.
- Parameters:
X – features
- Returns:
score, ind, meta
score is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix, meta is the metadata
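The relation between the three returned values can be pictured with a toy stand-in: meta is the metadata found at the positions listed in ind. The class TinyVectorSearch below is hypothetical and written in pure Python. It mimics the documented members but is not the library's code.

```python
import math


class TinyVectorSearch:
    """Toy stand-in mimicking the documented members (hypothetical class)."""

    def __init__(self, n_neighbors=2):
        self.n_neighbors = n_neighbors

    def fit(self, features, metadata=None):
        self.features_ = [list(v) for v in features]
        self.metadata_ = metadata  # can be None
        return self

    def kneighbors(self, x, n_neighbors=None):
        k = self.n_neighbors if n_neighbors is None else n_neighbors
        ranked = sorted(
            (math.dist(v, x), i) for i, v in enumerate(self.features_)
        )[:k]
        score = [d for d, _ in ranked]
        ind = [i for _, i in ranked]
        # Metadata rows are looked up with the same indices.
        meta = None
        if self.metadata_ is not None:
            meta = [self.metadata_[i] for i in ind]
        return score, ind, meta


engine = TinyVectorSearch(n_neighbors=2).fit(
    features=[[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]],
    metadata=[{"name": "a"}, {"name": "b"}, {"name": "c"}],
)
score, ind, meta = engine.kneighbors([0.9, 0.9])
```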
- static read_zip(zipfilename, **kwargs)#
Restores the features and the metadata to a SearchEngineVectors.
- Parameters:
zipfilename – a zipfile.ZipFile or a filename
zname – a filename in the zipfile
kwargs – parameters for pandas.read_csv
- Returns:
SearchEngineVectors
It only works for Python 3.6+.
- to_zip(zipfilename, **kwargs)#
Saves the features and the metadata into a zipfile. The function does not save the k-nn.
- Parameters:
zipfilename – a zipfile.ZipFile or a filename
kwargs – parameters for pandas.DataFrame.to_csv (for the metadata)
- Returns:
zipfilename
The function relies on function to_zip. It only works for Python 3.6+.
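Because the k-nn is not saved, only the features and the metadata travel through the archive. A minimal sketch of such a round trip with the standard library follows. The member names features.csv and metadata.csv and both helpers are assumptions, not the layout the class writes, and an in-memory buffer stands in for the filename.

```python
import csv
import io
import zipfile


def save_sketch(zipfilename, features, metadata):
    """Writes features and metadata as two CSV members (hypothetical layout)."""
    members = (("features.csv", features), ("metadata.csv", metadata))
    with zipfile.ZipFile(zipfilename, "w") as zf:
        for name, rows in members:
            buf = io.StringIO()
            csv.writer(buf).writerows(rows)
            zf.writestr(name, buf.getvalue())
    return zipfilename


def load_sketch(zipfilename):
    """Reads both members back; any search index must be rebuilt afterwards."""

    def read_member(zf, name):
        text = zf.read(name).decode("utf-8")
        return list(csv.reader(io.StringIO(text)))

    with zipfile.ZipFile(zipfilename) as zf:
        raw_features = read_member(zf, "features.csv")
        metadata = read_member(zf, "metadata.csv")
    # CSV stores text, so the features are converted back to floats.
    features = [[float(c) for c in row] for row in raw_features]
    return features, metadata


buffer = io.BytesIO()  # in-memory archive, nothing is written to disk
save_sketch(buffer, [[0.5, 1.5], [2.0, 3.0]], [["a"], ["b"]])
features, metadata = load_sketch(buffer)
```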