module search_rank.search_engine_vectors
Short summary
module mlinsights.search_rank.search_engine_vectors
Implements a way to get close examples based on the output of a machine learned model.
Classes
class 
truncated documentation 

Implements a kind of local search engine which looks for similar results assuming they are vectors. The class is … 
Static Methods
staticmethod 
truncated documentation 

Restore the features, the metadata to a 
Methods
method 
truncated documentation 

usual 

Finds the closest n_neighbors. 

Fits the nearest neighbors. 

Tells if an object is an iterator or not. 

Stores data in the class itself. 

Reorders the closest n_neighbors. 

Every vector comes with a list of metadata. 

Searches for neighbors close to X. 

Saves the features and the metadata into a zipfile. The function does not save the knn. 
Documentation
Implements a way to get close examples based on the output of a machine learned model.

class mlinsights.search_rank.search_engine_vectors.SearchEngineVectors(**pknn)
Bases: object
Implements a kind of local search engine which looks for similar results, assuming they are vectors. The class uses sklearn.neighbors.NearestNeighbors to find the nearest neighbors of a vector and follows the same API. The class populates the following members:
features_: vectors used to compute the neighbors
knn_: parameters for the sklearn.neighbors.NearestNeighbors
metadata_: metadata, can be None
Parameters
pknn – list of parameters, see sklearn.neighbors.NearestNeighbors

__init__(**pknn)
Parameters
pknn – list of parameters, see sklearn.neighbors.NearestNeighbors

_first_pass(X, n_neighbors=None)
Finds the closest n_neighbors.
Parameters
X – features
n_neighbors – number of neighbors to get (default is the value passed to the constructor)
 Returns
dist, ind
dist is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix.
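
To make the shape of the return value concrete, here is a minimal sketch of a first pass written with the standard library only. It is a brute-force stand-in, not the code of the class, which delegates the search to sklearn.neighbors.NearestNeighbors. The function name first_pass, the Euclidean distance, and the sample data are assumptions made for illustration.

```python
import math


def first_pass(population, x, n_neighbors):
    """Return (dist, ind) for the n_neighbors rows of population closest to x."""
    # Euclidean distance from x to every stored vector (brute force).
    distances = [math.dist(row, x) for row in population]
    # Indices sorted by increasing distance, truncated to n_neighbors.
    ind = sorted(range(len(population)), key=distances.__getitem__)[:n_neighbors]
    dist = [distances[i] for i in ind]
    return dist, ind


# Hypothetical population matrix: one vector per row.
population = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
dist, ind = first_pass(population, [0.0, 1.0], n_neighbors=2)
```

Here ind lists the positions of the two closest rows in population and dist holds the matching distances in the same order.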

_prepare_fit(data=None, features=None, metadata=None, transform=None)
Stores data in the class itself.
Parameters
data – a dataframe, or None if the features and the metadata are specified with an array and a dictionary
features – features columns or an array
metadata – data
transform – transform each vector before using it
transform is a function with the following signature:
    def transform(vec, many):
        # many tells whether the function receives many vectors
        # or just one (many=False)
The function transform is applied only if data is not None.
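
As an illustration of the expected signature, the sketch below shows one possible transform that scales vectors to unit length. The choice of L2 normalization and the use of plain Python lists are assumptions; the documentation only fixes the two arguments vec and many.

```python
import math


def transform(vec, many):
    """L2-normalize one vector (many=False) or a list of vectors (many=True)."""
    if many:
        # Apply the single-vector case to each row.
        return [transform(v, many=False) for v in vec]
    norm = math.sqrt(sum(c * c for c in vec))
    # Leave the zero vector unchanged to avoid a division by zero.
    return list(vec) if norm == 0 else [c / norm for c in vec]


single = transform([3.0, 4.0], many=False)
batch = transform([[3.0, 4.0], [0.0, 0.0]], many=True)
```

The same function handles both call styles, so it can be applied to the stored vectors in one call and to a single query vector later.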

_second_pass(X, dist, ind)
Reorders the closest n_neighbors.
Parameters
X – features
dist – array representing the lengths to points
ind – indices of the nearest points in the population matrix
 Returns
score, ind
score is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix.
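
The documentation does not say which criterion is used to reorder the candidates. The sketch below only illustrates the idea of a second pass: a cheap search selects candidates, then a different score is computed on those candidates alone and they are sorted again. The Manhattan distance used here is a hypothetical choice, as are the function name and the data.

```python
def second_pass(population, x, dist, ind):
    """Reorder first-pass candidates by Manhattan distance (hypothetical score)."""
    # dist is accepted to mirror the documented signature; this sketch
    # recomputes its own score and does not use it.
    scored = [
        (sum(abs(a - b) for a, b in zip(population[i], x)), i) for i in ind
    ]
    # Sort candidates by the new score, smallest first.
    scored.sort()
    score = [s for s, _ in scored]
    new_ind = [i for _, i in scored]
    return score, new_ind


population = [[0.0, 0.0], [3.0, 0.0], [2.0, 2.0]]
# Pretend a first pass returned rows 2 and 1, ordered by Euclidean distance.
score, ind = second_pass(population, [0.0, 0.0], dist=[2.83, 3.0], ind=[2, 1])
```

With this data the two candidates swap places, because row 1 is closer than row 2 under the second score even though it was farther under the first.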

fit(data=None, features=None, metadata=None)
Every vector comes with a list of metadata.
Parameters
data – a dataframe, or None if the features and the metadata are specified with an array and a dictionary
features – features columns or an array
metadata – data

kneighbors(X, n_neighbors=None)
Searches for neighbors close to X.
Parameters
X – features
 Returns
score, ind, meta
score is an array representing the lengths to points, ind contains the indices of the nearest points in the population matrix, meta is the metadata
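
The following sketch shows how the three returned values relate to each other. It is a brute-force stand-in written with the standard library, not the implementation of the class. The function takes the features and metadata as explicit arguments, whereas the class stores them as features_ and metadata_. The representation of metadata as a list of dictionaries is an assumption.

```python
import math


def kneighbors(features, metadata, x, n_neighbors):
    """Return (score, ind, meta) for the rows of features closest to x."""
    distances = [math.dist(row, x) for row in features]
    ind = sorted(range(len(features)), key=distances.__getitem__)[:n_neighbors]
    score = [distances[i] for i in ind]
    # Metadata is optional; when present, pick the rows matching ind.
    meta = None if metadata is None else [metadata[i] for i in ind]
    return score, ind, meta


features = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
metadata = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
score, ind, meta = kneighbors(features, metadata, [0.9, 0.9], n_neighbors=2)
```

Each entry of meta describes the vector at the same position in ind, so a caller can display the neighbors without going back to the stored data.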

static read_zip(zipfilename, **kwargs)
Restores the features and the metadata into a SearchEngineVectors.
Parameters
zipfilename – a zipfile.ZipFile or a filename
zname – a filename in the zipfile
kwargs – parameters for pandas.read_csv
Returns
SearchEngineVectors
It only works for Python 3.6+.

to_zip(zipfilename, **kwargs)
Saves the features and the metadata into a zipfile. The function does not save the knn.
Parameters
zipfilename – a zipfile.ZipFile or a filename
kwargs – parameters for pandas.DataFrame.to_csv (for the metadata)
 Returns
zipfilename
The function relies on function to_zip. It only works for Python 3.6+.
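
The round trip between to_zip and read_zip can be pictured with the standard library alone. The sketch below is not the format used by the class: the member names features.csv and metadata.csv, the helper names save_zip and load_zip, and the CSV encoding of the features are all assumptions. It only shows the principle of storing both pieces in one archive and rebuilding them later.

```python
import csv
import io
import zipfile


def save_zip(buffer, features, metadata):
    """Write features and metadata as two CSV members of a zip archive."""
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, rows in (("features.csv", features), ("metadata.csv", metadata)):
            text = io.StringIO()
            csv.writer(text).writerows(rows)
            zf.writestr(name, text.getvalue())


def load_zip(buffer):
    """Read back the two CSV members written by save_zip."""
    with zipfile.ZipFile(buffer) as zf:
        feats = list(csv.reader(io.StringIO(zf.read("features.csv").decode("utf-8"))))
        meta = list(csv.reader(io.StringIO(zf.read("metadata.csv").decode("utf-8"))))
    # CSV stores text, so the feature values are converted back to floats.
    features = [[float(c) for c in row] for row in feats]
    return features, meta


buffer = io.BytesIO()
save_zip(buffer, [[0.5, 1.5], [2.0, 3.0]], [["a"], ["b"]])
features, metadata = load_zip(buffer)
```

As in the documented behavior, nothing about the neighbors index is written to the archive, so an index would have to be fitted again on the restored features.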