API

PHATE

Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE)

class phate.phate.PHATE(n_components=2, knn=5, decay=40, n_landmark=2000, t='auto', gamma=1, n_pca=100, mds_solver='sgd', knn_dist='euclidean', knn_max=None, mds_dist='euclidean', mds='metric', n_jobs=1, random_state=None, verbose=1, **kwargs)

Bases: sklearn.base.BaseEstimator

PHATE operator which performs dimensionality reduction.

Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) embeds high-dimensional single-cell data into two or three dimensions for visualization of biological progressions, as described in Moon et al., 2017 [1].

Parameters:
  • n_components (int, optional, default: 2) – number of dimensions in which the data will be embedded
  • knn (int, optional, default: 5) – number of nearest neighbors on which to build kernel
  • decay (int, optional, default: 40) – sets decay rate of kernel tails. If None, alpha decaying kernel is not used
  • n_landmark (int, optional, default: 2000) – number of landmarks to use in fast PHATE
  • t (int or 'auto', optional, default: 'auto') – power to which the diffusion operator is raised. This sets the level of diffusion. If ‘auto’, t is selected according to the knee point in the Von Neumann entropy of the diffusion operator.
  • gamma (float, optional, default: 1) – Informational distance constant between -1 and 1. gamma=1 gives the PHATE log potential, gamma=0 gives a square root potential.
  • n_pca (int, optional, default: 100) – Number of principal components to use for calculating neighborhoods. For extremely large datasets, using n_pca < 20 allows neighborhoods to be calculated in roughly log(n_samples) time.
  • mds_solver ({'sgd', 'smacof'}, optional (default: 'sgd')) – which solver to use for metric MDS. SGD is substantially faster, but produces slightly less optimal results. Note that SMACOF was used for all figures in the PHATE paper.
  • knn_dist (string, optional, default: 'euclidean') – distance metric for building the kNN graph. Recommended values: ‘euclidean’, ‘cosine’, ‘precomputed’. Any metric from scipy.spatial.distance can be used. Custom distance functions of the form f(x, y) = d are also accepted. If ‘precomputed’, data should be an n_samples x n_samples distance or affinity matrix. Distance matrices are assumed to have zeros down the diagonal, while affinity matrices are assumed to have non-zero values down the diagonal. This is detected automatically using data[0,0]. You can override this detection with knn_dist=‘precomputed_distance’ or knn_dist=‘precomputed_affinity’.
  • knn_max (int, optional, default: None) – Maximum number of neighbors for which alpha decaying kernel is computed for each point. For very large datasets, setting knn_max to a small multiple of knn can speed up computation significantly.
  • mds_dist (string, optional, default: 'euclidean') – distance metric for MDS. Recommended values: ‘euclidean’ and ‘cosine’. Any metric from scipy.spatial.distance can be used. Custom distance functions of the form f(x, y) = d are also accepted.
  • mds (string, optional, default: 'metric') – choose from [‘classic’, ‘metric’, ‘nonmetric’]. Selects which MDS algorithm is used for dimensionality reduction
  • n_jobs (integer, optional, default: 1) – The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used
  • random_state (integer or numpy.RandomState, optional, default: None) – the generator used to initialize SMACOF (metric, nonmetric) MDS. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
  • verbose (int or boolean, optional (default: 1)) – If True or > 0, print status messages
  • potential_method (deprecated.) – Use gamma=1 for log transformation and gamma=0 for square root transformation.
  • kwargs – additional keyword arguments passed to graphtools.Graph
Attributes:
  • X (array-like, shape=[n_samples, n_dimensions]) – the input data
  • embedding (array-like, shape=[n_samples, n_components]) – stores the position of the dataset in the embedding space
  • graph (graphtools.base.BaseGraph) – the graph built on the input data
  • optimal_t (int) – the automatically selected t, when t = ‘auto’. When t is given, optimal_t is None.
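The knn and decay parameters shape the affinity kernel that the graph is built from. As a rough illustration, the following dense NumPy sketch assumes the α-decaying form exp(-(d / ε_k)^decay), where ε_k is a point's distance to its knn-th nearest neighbor, symmetrized by averaging the two one-sided kernels. It is not the graphtools implementation (which uses sparse neighbor graphs and respects knn_max), and alpha_decay_kernel is a hypothetical helper name.

```python
import numpy as np


def alpha_decay_kernel(X, knn=5, decay=40):
    """Dense alpha-decaying kernel with a per-point bandwidth (sketch only)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_samples).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Bandwidth of each point: distance to its knn-th nearest neighbor.
    # Column 0 of each sorted row is the point itself (distance 0).
    eps = np.sort(D, axis=1)[:, knn]
    # One-sided kernel using each row's bandwidth: exp(-(d / eps_i) ** decay).
    K_row = np.exp(-((D / eps[:, None]) ** decay))
    # Symmetrize by averaging the kernel seen from both endpoints.
    return 0.5 * (K_row + K_row.T)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = alpha_decay_kernel(X, knn=5, decay=40)
```

A large decay makes affinities fall off sharply beyond each point's knn-th neighbor, while every point keeps an affinity of at least exp(-1) / 2 to its own knn nearest neighbors.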

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=100, n_branch=20,
...                                               branch_length=100)
>>> tree_data.shape
(2000, 100)
>>> phate_operator = phate.PHATE(knn=5, decay=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(2000, 2)
>>> phate.plot.scatter2d(tree_phate, c=tree_clusters)

References

[1] Moon KR, van Dijk D, Zheng W, et al. (2017). PHATE: A Dimensionality Reduction Method for Visualizing Trajectory Structures in High-Dimensional Biological Data. BioRxiv.
diff_op

The diffusion operator built from the graph.

Type:array-like, shape=[n_samples, n_samples] or [n_landmark, n_landmark]
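To illustrate what a diffusion operator is, the sketch below row-normalizes a symmetric affinity matrix into a Markov transition matrix and raises it to the power t. It assumes a small dense affinity matrix (a plain Gaussian kernel stands in for the PHATE kernel here) and ignores landmarking; diffusion_operator is a hypothetical helper name.

```python
import numpy as np


def diffusion_operator(K):
    """Row-normalize an affinity matrix into a Markov transition matrix."""
    K = np.asarray(K, dtype=float)
    return K / K.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
# Plain Gaussian affinities stand in for the PHATE kernel in this sketch.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / sq_dists.mean())

P = diffusion_operator(K)
t = 10
# Entry (i, j) of P^t is the probability of a t-step random walk from i to j.
P_t = np.linalg.matrix_power(P, t)
```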
diff_potential

Interpolates the PHATE potential to one entry per cell

This is equivalent to calculating infinite-dimensional PHATE, or running PHATE without the MDS step.

Returns:diff_potential
Return type:ndarray, shape=[n_samples, min(n_landmark, n_samples)]
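The gamma parameter selects how the powered diffusion operator is turned into this potential. The sketch below assumes the entrywise transform -log(x) for gamma=1 and x**c / c with c = (1 - gamma) / 2 otherwise. This matches the documented cases (log potential at gamma=1, square-root potential at gamma=0), but the exact constants are an assumption, and diffusion_potential is a hypothetical helper that ignores landmarking.

```python
import numpy as np


def diffusion_potential(P_t, gamma=1.0, floor=1e-7):
    """Entrywise potential transform of a powered diffusion operator (sketch)."""
    P_t = np.asarray(P_t, dtype=float)
    if gamma == 1:
        # Log potential; a small floor keeps log() finite for zero entries.
        return -np.log(P_t + floor)
    # Power transform with c = (1 - gamma) / 2: gamma=0 gives 2 * sqrt(P_t),
    # and gamma=-1 (c=1) leaves the probabilities unchanged.
    c = (1.0 - gamma) / 2.0
    return P_t ** c / c


P_t = np.array([[0.7, 0.2, 0.1],
                [0.3, 0.4, 0.3],
                [0.1, 0.2, 0.7]])
U_log = diffusion_potential(P_t, gamma=1)
U_sqrt = diffusion_potential(P_t, gamma=0)
```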
fit(X)

Computes the diffusion operator

Parameters:X (array, shape=[n_samples, n_features]) – input data with n_samples samples and n_features features. Accepted data types: numpy.ndarray, scipy.sparse.spmatrix, pd.DataFrame, anndata.AnnData. If knn_dist is ‘precomputed’, data should be an n_samples x n_samples distance or affinity matrix.
Returns:phate_operator – the estimator object
Return type:PHATE
fit_transform(X, **kwargs)

Computes the diffusion operator and the position of the cells in the embedding space

Parameters:
  • X (array, shape=[n_samples, n_features]) – input data with n_samples samples and n_features features. Accepted data types: numpy.ndarray, scipy.sparse.spmatrix, pd.DataFrame, anndata.AnnData. If knn_dist is ‘precomputed’, data should be an n_samples x n_samples distance or affinity matrix.
  • kwargs (further arguments for PHATE.transform()) – Keyword arguments as specified in transform()
Returns:

embedding – The cells embedded in a lower-dimensional space using PHATE

Return type:

array, shape=[n_samples, n_components]
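The mds option 'classic' refers to classical (Torgerson) multidimensional scaling. As a sketch of how an embedding can be recovered from a matrix of pairwise distances (in PHATE, distances between rows of the potential), the code below double-centers the squared distances and keeps the top eigenvectors. This is a stand-in for illustration only: the default here is metric MDS (mds='metric', solved with SGD), which the sketch does not implement, and classical_mds is a hypothetical helper. For a checkable example, the input distances come from known 2-D points, so the embedding should reproduce them up to rotation and reflection.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # The top eigenpairs of the symmetric matrix B give the coordinates.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


rng = np.random.default_rng(0)
Y_true = rng.normal(size=(40, 2))
D = np.sqrt(((Y_true[:, None, :] - Y_true[None, :, :]) ** 2).sum(axis=-1))
embedding = classical_mds(D, n_components=2)
```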

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:dict
reset_mds(**kwargs)

Deprecated. Reset parameters related to multidimensional scaling

Parameters:
  • n_components (int, optional, default: None) – If given, sets number of dimensions in which the data will be embedded
  • mds (string, optional, default: None) – choose from [‘classic’, ‘metric’, ‘nonmetric’]. If given, sets which MDS algorithm is used for dimensionality reduction.
  • mds_dist (string, optional, default: None) – if given, sets the distance metric for MDS. Recommended values: ‘euclidean’ and ‘cosine’. Any metric from scipy.spatial.distance can be used.
reset_potential(**kwargs)

Deprecated. Reset parameters related to the diffusion potential

Parameters:
  • t (int or 'auto', optional, default: None) – power to which the diffusion operator is raised. If given, sets the level of diffusion.
  • potential_method (string, optional, default: None) – choose from [‘log’, ‘sqrt’]. If given, sets which transformation of the diffusion operator is used to compute the diffusion potential.
set_params(**params)

Set the parameters on this estimator.

Any parameters not given as named arguments will be left at their current value.

Parameters:
  • n_components (int, optional, default: 2) – number of dimensions in which the data will be embedded
  • knn (int, optional, default: 5) – number of nearest neighbors on which to build kernel
  • decay (int, optional, default: 40) – sets decay rate of kernel tails. If None, alpha decaying kernel is not used
  • n_landmark (int, optional, default: 2000) – number of landmarks to use in fast PHATE
  • t (int or 'auto', optional, default: 'auto') – power to which the diffusion operator is raised. This sets the level of diffusion. If ‘auto’, t is selected according to the knee point in the Von Neumann entropy of the diffusion operator.
  • gamma (float, optional, default: 1) – Informational distance constant between -1 and 1. gamma=1 gives the PHATE log potential, gamma=0 gives a square root potential.
  • n_pca (int, optional, default: 100) – Number of principal components to use for calculating neighborhoods. For extremely large datasets, using n_pca < 20 allows neighborhoods to be calculated in roughly log(n_samples) time.
  • mds_solver ({'sgd', 'smacof'}, optional (default: 'sgd')) – which solver to use for metric MDS. SGD is substantially faster, but produces slightly less optimal results. Note that SMACOF was used for all figures in the PHATE paper.
  • knn_dist (string, optional, default: 'euclidean') – distance metric for building the kNN graph. Recommended values: ‘euclidean’, ‘cosine’, ‘precomputed’. Any metric from scipy.spatial.distance can be used. Custom distance functions of the form f(x, y) = d are also accepted. If ‘precomputed’, data should be an n_samples x n_samples distance or affinity matrix. Distance matrices are assumed to have zeros down the diagonal, while affinity matrices are assumed to have non-zero values down the diagonal. This is detected automatically using data[0,0]. You can override this detection with knn_dist=‘precomputed_distance’ or knn_dist=‘precomputed_affinity’.
  • knn_max (int, optional, default: None) – Maximum number of neighbors for which alpha decaying kernel is computed for each point. For very large datasets, setting knn_max to a small multiple of knn can speed up computation significantly.
  • mds_dist (string, optional, default: 'euclidean') – distance metric for MDS. Recommended values: ‘euclidean’ and ‘cosine’. Any metric from scipy.spatial.distance can be used.
  • mds (string, optional, default: 'metric') – choose from [‘classic’, ‘metric’, ‘nonmetric’]. Selects which MDS algorithm is used for dimensionality reduction
  • n_jobs (integer, optional, default: 1) – The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used
  • random_state (integer or numpy.RandomState, optional, default: None) – the generator used to initialize SMACOF (metric, nonmetric) MDS. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
  • verbose (int or boolean, optional (default: 1)) – If True or > 0, print status messages
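The n_jobs rule above is plain arithmetic: -1 means all CPUs, and values below -1 give n_cpus + 1 + n_jobs. The hypothetical helper below writes that rule out; clamping the result to at least one worker and rejecting n_jobs=0 are assumptions borrowed from common joblib behavior, not something stated here.

```python
def effective_n_jobs(n_jobs, n_cpus):
    """Number of workers implied by the n_jobs convention (illustrative sketch)."""
    if n_jobs == 0:
        # Assumption: 0 is treated as invalid, as in joblib.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ...; never fewer than one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(effective_n_jobs(-2, 8))  # 7
```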

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=50, n_branch=5,
...                                               branch_length=50)
>>> tree_data.shape
(250, 50)
>>> phate_operator = phate.PHATE(knn=5, decay=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(250, 2)
>>> phate_operator = phate_operator.set_params(n_components=10)
>>> tree_phate = phate_operator.transform()
>>> tree_phate.shape
(250, 10)
>>> # plt.scatter(tree_phate[:,0], tree_phate[:,1], c=tree_clusters)
>>> # plt.show()
Returns:self – the estimator object
Return type:PHATE
transform(X=None, t_max=100, plot_optimal_t=False, ax=None)

Computes the position of the cells in the embedding space

Parameters:
  • X (array, optional, shape=[n_samples, n_features]) – input data with n_samples samples and n_features features. Not required, since PHATE does not currently embed cells not given in the input matrix to PHATE.fit(). Accepted data types: numpy.ndarray, scipy.sparse.spmatrix, pd.DataFrame, anndata.AnnData. If knn_dist is ‘precomputed’, data should be an n_samples x n_samples distance or affinity matrix.
  • t_max (int, optional, default: 100) – maximum t to test if t is set to ‘auto’
  • plot_optimal_t (boolean, optional, default: False) – if True and t is set to ‘auto’, plot the Von Neumann entropy used to select t
  • ax (matplotlib.axes.Axes, optional) – if given and plot_optimal_t is True, the plot is drawn on the given axis.
Returns:

embedding – The cells embedded in a lower-dimensional space using PHATE

Return type:

array, shape=[n_samples, n_components]
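When t is ‘auto’, t is chosen at the knee of the Von Neumann entropy of the diffusion operator, scanning t up to t_max. The sketch below computes such an entropy curve from the singular values of a diffusion operator (the entropy of the normalized spectrum of P^t) and picks the knee as the point farthest from the chord joining the curve's endpoints. That knee rule is a swapped-in heuristic that may differ from the one PHATE uses, and both helper names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(P, t_max=100):
    """Entropy of the normalized singular-value spectrum of P**t, t = 1..t_max."""
    sigma = np.linalg.svd(P, compute_uv=False)
    entropy = []
    for t in range(1, t_max + 1):
        weights = sigma ** t
        prob = weights / weights.sum()
        prob = prob[prob > 0]  # drop zeros so that 0 * log(0) counts as 0
        entropy.append(-np.sum(prob * np.log(prob)))
    return np.array(entropy)


def knee_point(y):
    """Index of the point farthest from the line through the curve's endpoints."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Proportional to the perpendicular distance from (x, y) to that line.
    dist = np.abs((y[-1] - y[0]) * x - (x[-1] - x[0]) * y + x[-1] * y[0] - y[-1] * x[0])
    return int(np.argmax(dist))


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / sq.mean())
P = K / K.sum(axis=1, keepdims=True)
h = von_neumann_entropy(P, t_max=100)
t_opt = knee_point(h) + 1  # h[i] corresponds to t = i + 1
```

Because raising the spectrum to a higher power concentrates its mass on the largest singular values, the entropy curve is non-increasing in t; the knee marks where further diffusion stops removing much entropy.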

Clustering

phate.cluster.kmeans(phate_op, n_clusters='auto', max_clusters=10, random_state=None, k=None, **kwargs)

KMeans on the PHATE potential

Clustering on the PHATE operator as introduced in Moon et al. This is similar to spectral clustering.

Parameters:
  • phate_op (phate.PHATE) – Fitted PHATE operator
  • n_clusters (int, optional (default: 'auto')) – Number of clusters. If ‘auto’, uses the Silhouette score to determine the optimal number of clusters
  • max_clusters (int, optional (default: 10)) – Maximum number of clusters to test if using the Silhouette score.
  • random_state (int or None, optional (default: None)) – Random seed for k-means
  • k – deprecated; use n_clusters instead
  • kwargs – additional keyword arguments passed to sklearn.cluster.KMeans
Returns:

clusters – Integer array of cluster assignments

Return type:

np.ndarray
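This function clusters the rows of the PHATE potential rather than the low-dimensional embedding. To show the underlying idea, here is a plain NumPy version of Lloyd's k-means applied to a potential-like matrix. The library itself delegates to sklearn.cluster.KMeans; kmeans_sketch is a hypothetical stand-in, and its deterministic farthest-point initialization is a simplification chosen for this example (KMeans defaults to k-means++).

```python
import numpy as np


def kmeans_sketch(X, n_clusters, n_iter=100):
    """Lloyd's k-means with farthest-point initialization; returns labels."""
    X = np.asarray(X, dtype=float)
    # Start from row 0, then repeatedly add the row farthest from all
    # centroids chosen so far.
    idx = [0]
    for _ in range(1, n_clusters):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d2.argmax()))
    centers = X[idx].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty).
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels


rng = np.random.default_rng(0)
U = np.vstack([rng.normal(0, 1, size=(50, 10)), rng.normal(8, 1, size=(50, 10))])
clusters = kmeans_sketch(U, n_clusters=2)
```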

phate.cluster.silhouette_score(phate_op, n_clusters, random_state=None, **kwargs)

Compute the Silhouette score on KMeans on the PHATE potential

Parameters:
  • phate_op (phate.PHATE) – Fitted PHATE operator
  • n_clusters (int) – Number of clusters.
  • random_state (int or None, optional (default: None)) – Random seed for k-means
Returns:

score

Return type:

float
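The silhouette score compares, for each sample, the mean distance a to the other members of its own cluster with the mean distance b to the nearest other cluster, s = (b - a) / max(a, b), and averages s over all samples. A dense NumPy sketch follows. It assumes Euclidean distances and the common convention that samples in singleton clusters score 0; silhouette is a hypothetical helper, not this function's implementation.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (dense sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other members of i's own cluster
        # (D[i, i] = 0, so dividing by size - 1 excludes i itself).
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(40, 2)), rng.normal(10, 0.5, size=(40, 2))])
score_good = silhouette(X, np.repeat([0, 1], 40))  # labels follow the blobs
score_bad = silhouette(X, np.arange(80) % 2)       # labels ignore the blobs
```

With n_clusters='auto', kmeans can compare such scores for each candidate number of clusters up to max_clusters and keep the best one.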

Plotting

phate.plot.rotate_scatter3d(data, filename=None, elev=30, rotation_speed=30, fps=10, ax=None, figsize=None, dpi=None, ipython_html='jshtml', **kwargs)

Create a rotating 3D scatter plot

Builds upon matplotlib.pyplot.scatter with nice defaults and handles categorical colors / legends better.

Parameters:
  • data (array-like, phate.PHATE or scanpy.AnnData) – Input data. Only the first three dimensions are used.
  • filename (str, optional (default: None)) – If not None, saves a .gif or .mp4 with the output
  • elev (float, optional (default: 30)) – Elevation of viewpoint from horizontal, in degrees
  • rotation_speed (float, optional (default: 30)) – Speed of axis rotation, in degrees per second
  • fps (int, optional (default: 10)) – Frames per second. Increase this for a smoother animation
  • ax (matplotlib.Axes or None, optional (default: None)) – axis on which to plot. If None, an axis is created
  • figsize (tuple, optional (default: None)) – Tuple of floats for creation of new matplotlib figure. Only used if ax is None.
  • dpi (number, optional (default: None)) – Controls the dots per inch for the movie frames. This combined with the figure’s size in inches controls the size of the movie. If None, defaults to rcParams[“savefig.dpi”]
  • ipython_html ({'html5', 'jshtml'}, optional (default: 'jshtml')) – which HTML writer to use if using a Jupyter Notebook
  • **kwargs (keyword arguments) – see phate.plot.scatter3d.
Returns:

ani – animation object

Return type:

matplotlib.animation.FuncAnimation
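Together, rotation_speed and fps determine how many frames the animation has. Assuming the animation covers exactly one full 360° turn, the arithmetic works out as below. rotation_frames is a hypothetical helper, and whether the library rounds the frame count this way is an assumption.

```python
def rotation_frames(rotation_speed=30.0, fps=10):
    """Frames, degrees per frame and duration (s) for one full 360-degree turn."""
    degrees_per_frame = rotation_speed / fps
    n_frames = int(round(360 / degrees_per_frame))
    duration_s = n_frames / fps
    return n_frames, degrees_per_frame, duration_s


print(rotation_frames())  # (120, 3.0, 12.0) with the defaults
```

Raising fps at a fixed rotation_speed adds frames (a smoother but larger file) without changing the duration.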

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=100, n_branch=20,
...                                               branch_length=100)
>>> tree_data.shape
(2000, 100)
>>> phate_operator = phate.PHATE(n_components=3, k=5, a=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(2000, 2)
>>> phate.plot.rotate_scatter3d(tree_phate, c=tree_clusters)
phate.plot.scatter(x, y, z=None, c=None, cmap=None, s=None, discrete=None, ax=None, legend=None, figsize=None, xticks=False, yticks=False, zticks=False, xticklabels=True, yticklabels=True, zticklabels=True, label_prefix='PHATE', xlabel=None, ylabel=None, zlabel=None, title=None, legend_title='', legend_loc='best', filename=None, dpi=None, **plot_kwargs)

Create a scatter plot

Builds upon matplotlib.pyplot.scatter with nice defaults and handles categorical colors / legends better. For easy access, use scatter2d or scatter3d.

Parameters:
  • x (list-like) – data for x axis
  • y (list-like) – data for y axis
  • z (list-like, optional (default: None)) – data for z axis
  • c (list-like or None, optional (default: None)) – Color vector. Can be a single color value (RGB, RGBA, or named matplotlib colors), an array of these of length n_samples, or a list of discrete or continuous values of any data type. If c is not a single or list of matplotlib colors, the values in c will be used to populate the legend / colorbar with colors from cmap
  • cmap (matplotlib colormap, str, dict or None, optional (default: None)) – matplotlib colormap. If None, uses tab20 for discrete data and inferno for continuous data. If a dictionary, expects one key for every unique value in c, where values are valid matplotlib colors (hsv, rgb, rgba, or named colors)
  • s (float, optional (default: 1)) – Point size.
  • discrete (bool or None, optional (default: None)) – If True, the legend is categorical. If False, the legend is a colorbar. If None, discreteness is detected automatically. Data containing non-numeric c is always discrete, and numeric data with 20 or fewer unique values is discrete.
  • ax (matplotlib.Axes or None, optional (default: None)) – axis on which to plot. If None, an axis is created
  • legend (bool, optional (default: True)) – States whether or not to create a legend. If data is continuous, the legend is a colorbar.
  • figsize (tuple, optional (default: None)) – Tuple of floats for creation of new matplotlib figure. Only used if ax is None.
  • xticks (True, False, or list-like (default: False)) – If True, keeps default x ticks. If False, removes x ticks. If a list, sets custom x ticks
  • yticks (True, False, or list-like (default: False)) – If True, keeps default y ticks. If False, removes y ticks. If a list, sets custom y ticks
  • zticks (True, False, or list-like (default: False)) – If True, keeps default z ticks. If False, removes z ticks. If a list, sets custom z ticks. Only used for 3D plots.
  • xticklabels (True, False, or list-like (default: True)) – If True, keeps default x tick labels. If False, removes x tick labels. If a list, sets custom x tick labels
  • yticklabels (True, False, or list-like (default: True)) – If True, keeps default y tick labels. If False, removes y tick labels. If a list, sets custom y tick labels
  • zticklabels (True, False, or list-like (default: True)) – If True, keeps default z tick labels. If False, removes z tick labels. If a list, sets custom z tick labels. Only used for 3D plots.
  • label_prefix (str or None (default: "PHATE")) – Prefix for all axis labels. Axes will be labelled label_prefix1, label_prefix2, etc. Can be overridden by setting xlabel, ylabel, and zlabel.
  • xlabel (str or None (default : None)) – Label for the x axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • ylabel (str or None (default : None)) – Label for the y axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • zlabel (str or None (default : None)) – Label for the z axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set. Only used for 3D plots.
  • title (str or None (default: None)) – axis title. If None, no title is set.
  • legend_title (str (default: "")) – title for the colorbar or legend. Only used for discrete data.
  • legend_loc (int or string or pair of floats, default: 'best') – Matplotlib legend location. Only used for discrete data. See <https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html> for details.
  • filename (str or None (default: None)) – file to which the output is saved
  • dpi (int or None, optional (default: None)) – The resolution in dots per inch. If None it will default to the value savefig.dpi in the matplotlibrc file. If ‘figure’ it will set the dpi to be the value of the figure. Only used if filename is not None.
  • **plot_kwargs (keyword arguments) – Extra arguments passed to matplotlib.pyplot.scatter.
Returns:

ax – axis on which plot was drawn

Return type:

matplotlib.Axes
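The rule used when discrete is None is stated precisely enough to sketch: non-numeric color values are always treated as discrete, and numeric values are discrete when there are 20 or fewer unique values. The hypothetical helper below illustrates that rule only; it is not the library's code and does not handle the case where c is already a matplotlib color.

```python
import numpy as np


def infer_discrete(c, max_unique=20):
    """Apply the documented discrete=None rule to a color vector c."""
    c = np.asarray(c)
    if not np.issubdtype(c.dtype, np.number):
        # Strings, booleans and other non-numeric values: always discrete.
        return True
    # Numeric values: discrete only if there are few distinct values.
    return len(np.unique(c)) <= max_unique


print(infer_discrete(["a", "b", "a"]))        # True
print(infer_discrete(np.arange(10)))          # True
print(infer_discrete(np.linspace(0, 1, 100))) # False
```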

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> ###
>>> # Running PHATE
>>> ###
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=100, n_branch=20,
...                                               branch_length=100)
>>> tree_data.shape
(2000, 100)
>>> phate_operator = phate.PHATE(knn=5, decay=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(2000, 2)
>>> ###
>>> # Plotting using phate.plot
>>> ###
>>> phate.plot.scatter2d(tree_phate, c=tree_clusters)
>>> # You can also pass the PHATE operator instead of data
>>> phate.plot.scatter2d(phate_operator, c=tree_clusters)
>>> phate.plot.scatter3d(phate_operator, c=tree_clusters)
>>> ###
>>> # Using a cmap dictionary
>>> ###
>>> import numpy as np
>>> X = np.random.normal(0,1,[1000,2])
>>> c = np.random.choice(['a','b'], 1000, replace=True)
>>> X[c=='a'] += 10
>>> phate.plot.scatter2d(X, c=c, cmap={'a' : [1,0,0,1], 'b' : 'xkcd:sky blue'})
phate.plot.scatter2d(data, **kwargs)

Create a 2D scatter plot

Builds upon matplotlib.pyplot.scatter with nice defaults and handles categorical colors / legends better.

Parameters:
  • data (array-like, shape=[n_samples, n_features]) – Input data. Only the first two components will be used.
  • c (list-like or None, optional (default: None)) – Color vector. Can be a single color value (RGB, RGBA, or named matplotlib colors), an array of these of length n_samples, or a list of discrete or continuous values of any data type. If c is not a single or list of matplotlib colors, the values in c will be used to populate the legend / colorbar with colors from cmap
  • cmap (matplotlib colormap, str, dict or None, optional (default: None)) – matplotlib colormap. If None, uses tab20 for discrete data and inferno for continuous data. If a dictionary, expects one key for every unique value in c, where values are valid matplotlib colors (hsv, rgb, rgba, or named colors)
  • s (float, optional (default: 1)) – Point size.
  • discrete (bool or None, optional (default: None)) – If True, the legend is categorical. If False, the legend is a colorbar. If None, discreteness is detected automatically. Data containing non-numeric c is always discrete, and numeric data with 20 or fewer unique values is discrete.
  • ax (matplotlib.Axes or None, optional (default: None)) – axis on which to plot. If None, an axis is created
  • legend (bool, optional (default: True)) – States whether or not to create a legend. If data is continuous, the legend is a colorbar.
  • figsize (tuple, optional (default: None)) – Tuple of floats for creation of new matplotlib figure. Only used if ax is None.
  • xticks (True, False, or list-like (default: False)) – If True, keeps default x ticks. If False, removes x ticks. If a list, sets custom x ticks
  • yticks (True, False, or list-like (default: False)) – If True, keeps default y ticks. If False, removes y ticks. If a list, sets custom y ticks
  • zticks (True, False, or list-like (default: False)) – If True, keeps default z ticks. If False, removes z ticks. If a list, sets custom z ticks. Only used for 3D plots.
  • xticklabels (True, False, or list-like (default: True)) – If True, keeps default x tick labels. If False, removes x tick labels. If a list, sets custom x tick labels
  • yticklabels (True, False, or list-like (default: True)) – If True, keeps default y tick labels. If False, removes y tick labels. If a list, sets custom y tick labels
  • zticklabels (True, False, or list-like (default: True)) – If True, keeps default z tick labels. If False, removes z tick labels. If a list, sets custom z tick labels. Only used for 3D plots.
  • label_prefix (str or None (default: "PHATE")) – Prefix for all axis labels. Axes will be labelled label_prefix1, label_prefix2, etc. Can be overridden by setting xlabel, ylabel, and zlabel.
  • xlabel (str or None (default : None)) – Label for the x axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • ylabel (str or None (default : None)) – Label for the y axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • zlabel (str or None (default : None)) – Label for the z axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set. Only used for 3D plots.
  • title (str or None (default: None)) – axis title. If None, no title is set.
  • legend_title (str (default: "")) – title for the colorbar or legend
  • legend_loc (int or string or pair of floats, default: 'best') – Matplotlib legend location. Only used for discrete data. See <https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html> for details.
  • filename (str or None (default: None)) – file to which the output is saved
  • dpi (int or None, optional (default: None)) – The resolution in dots per inch. If None it will default to the value savefig.dpi in the matplotlibrc file. If ‘figure’ it will set the dpi to be the value of the figure. Only used if filename is not None.
  • **plot_kwargs (keyword arguments) – Extra arguments passed to matplotlib.pyplot.scatter.
Returns:

ax – axis on which plot was drawn

Return type:

matplotlib.Axes
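When cmap is a dictionary, it must contain one key for every unique value in c. A minimal sketch of that check and of the resulting per-point colors follows, mirroring the cmap-dictionary example in these docs. map_colors is a hypothetical helper; the library may validate and resolve colors differently (for instance by converting them to RGBA).

```python
def map_colors(c, cmap):
    """Per-point colors from a {value: color} dictionary (illustrative sketch)."""
    missing = set(c) - set(cmap)
    if missing:
        raise KeyError(f"cmap has no color for values: {sorted(missing)}")
    return [cmap[value] for value in c]


c = ["a", "b", "a"]
colors = map_colors(c, {"a": [1, 0, 0, 1], "b": "xkcd:sky blue"})
```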

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> ###
>>> # Running PHATE
>>> ###
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=100, n_branch=20,
...                                               branch_length=100)
>>> tree_data.shape
(2000, 100)
>>> phate_operator = phate.PHATE(knn=5, decay=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(2000, 2)
>>> ###
>>> # Plotting using phate.plot
>>> ###
>>> phate.plot.scatter2d(tree_phate, c=tree_clusters)
>>> # You can also pass the PHATE operator instead of data
>>> phate.plot.scatter2d(phate_operator, c=tree_clusters)
>>> phate.plot.scatter3d(phate_operator, c=tree_clusters)
>>> ###
>>> # Using a cmap dictionary
>>> ###
>>> import numpy as np
>>> X = np.random.normal(0,1,[1000,2])
>>> c = np.random.choice(['a','b'], 1000, replace=True)
>>> X[c=='a'] += 10
>>> phate.plot.scatter2d(X, c=c, cmap={'a' : [1,0,0,1], 'b' : 'xkcd:sky blue'})
phate.plot.scatter3d(data, **kwargs)

Create a 3D scatter plot

Builds upon matplotlib.pyplot.scatter with nice defaults and handles categorical colors / legends better.

Parameters:
  • data (array-like, shape=[n_samples, n_features]) – Input data. Only the first three components will be used.
  • c (list-like or None, optional (default: None)) – Color vector. Can be a single color value (RGB, RGBA, or named matplotlib colors), an array of these of length n_samples, or a list of discrete or continuous values of any data type. If c is not a single or list of matplotlib colors, the values in c will be used to populate the legend / colorbar with colors from cmap
  • cmap (matplotlib colormap, str, dict or None, optional (default: None)) – matplotlib colormap. If None, uses tab20 for discrete data and inferno for continuous data. If a dictionary, expects one key for every unique value in c, where values are valid matplotlib colors (hsv, rgb, rgba, or named colors)
  • s (float, optional (default: 1)) – Point size.
  • discrete (bool or None, optional (default: None)) – If True, the legend is categorical. If False, the legend is a colorbar. If None, discreteness is detected automatically. Data containing non-numeric c is always discrete, and numeric data with 20 or fewer unique values is discrete.
  • ax (matplotlib.Axes or None, optional (default: None)) – axis on which to plot. If None, an axis is created
  • legend (bool, optional (default: True)) – States whether or not to create a legend. If data is continuous, the legend is a colorbar.
  • figsize (tuple, optional (default: None)) – Tuple of floats for creation of new matplotlib figure. Only used if ax is None.
  • xticks (True, False, or list-like (default: False)) – If True, keeps default x ticks. If False, removes x ticks. If a list, sets custom x ticks
  • yticks (True, False, or list-like (default: False)) – If True, keeps default y ticks. If False, removes y ticks. If a list, sets custom y ticks
  • zticks (True, False, or list-like (default: False)) – If True, keeps default z ticks. If False, removes z ticks. If a list, sets custom z ticks. Only used for 3D plots.
  • xticklabels (True, False, or list-like (default: True)) – If True, keeps default x tick labels. If False, removes x tick labels. If a list, sets custom x tick labels
  • yticklabels (True, False, or list-like (default: True)) – If True, keeps default y tick labels. If False, removes y tick labels. If a list, sets custom y tick labels
  • zticklabels (True, False, or list-like (default: True)) – If True, keeps default z tick labels. If False, removes z tick labels. If a list, sets custom z tick labels. Only used for 3D plots.
  • label_prefix (str or None (default: "PHATE")) – Prefix for all axis labels. Axes will be labelled label_prefix1, label_prefix2, etc. Can be overridden by setting xlabel, ylabel, and zlabel.
  • xlabel (str or None (default : None)) – Label for the x axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • ylabel (str or None (default : None)) – Label for the y axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set.
  • zlabel (str or None (default : None)) – Label for the z axis. Overrides the automatic label given by label_prefix. If None and label_prefix is None, no label is set. Only used for 3D plots.
  • title (str or None (default: None)) – axis title. If None, no title is set.
  • legend_title (str (default: "")) – title for the colorbar or legend
  • legend_loc (int or string or pair of floats, default: 'best') – Matplotlib legend location. Only used for discrete data. See <https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html> for details.
  • filename (str or None (default: None)) – file to which the output is saved
  • dpi (int or None, optional (default: None)) – The resolution in dots per inch. If None it will default to the value savefig.dpi in the matplotlibrc file. If ‘figure’ it will set the dpi to be the value of the figure. Only used if filename is not None.
  • **plot_kwargs (keyword arguments) – Extra arguments passed to matplotlib.pyplot.scatter.
Returns:

ax – axis on which plot was drawn

Return type:

matplotlib.Axes
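The interaction between label_prefix and xlabel, ylabel and zlabel can be written out directly. The sketch below assumes labels are formed by appending the 1-based axis number to the prefix (so the default gives PHATE1, PHATE2, PHATE3) and that an explicit xlabel, ylabel or zlabel always takes precedence. axis_labels is a hypothetical helper.

```python
def axis_labels(label_prefix="PHATE", xlabel=None, ylabel=None, zlabel=None, n_dims=3):
    """Resolve axis labels following the label_prefix rule (sketch)."""
    labels = []
    for i, label in enumerate([xlabel, ylabel, zlabel][:n_dims], start=1):
        if label is None and label_prefix is not None:
            label = f"{label_prefix}{i}"
        labels.append(label)  # None means no label is set on that axis
    return labels


print(axis_labels())              # ['PHATE1', 'PHATE2', 'PHATE3']
print(axis_labels(zlabel="time")) # ['PHATE1', 'PHATE2', 'time']
```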

Examples

>>> import phate
>>> import matplotlib.pyplot as plt
>>> ###
>>> # Running PHATE
>>> ###
>>> tree_data, tree_clusters = phate.tree.gen_dla(n_dim=100, n_branch=20,
...                                               branch_length=100)
>>> tree_data.shape
(2000, 100)
>>> phate_operator = phate.PHATE(knn=5, decay=20, t=150)
>>> tree_phate = phate_operator.fit_transform(tree_data)
>>> tree_phate.shape
(2000, 2)
>>> ###
>>> # Plotting using phate.plot
>>> ###
>>> phate.plot.scatter2d(tree_phate, c=tree_clusters)
>>> # You can also pass the PHATE operator instead of data
>>> phate.plot.scatter2d(phate_operator, c=tree_clusters)
>>> phate.plot.scatter3d(phate_operator, c=tree_clusters)
>>> ###
>>> # Using a cmap dictionary
>>> ###
>>> import numpy as np
>>> X = np.random.normal(0,1,[1000,2])
>>> c = np.random.choice(['a','b'], 1000, replace=True)
>>> X[c=='a'] += 10
>>> phate.plot.scatter2d(X, c=c, cmap={'a' : [1,0,0,1], 'b' : 'xkcd:sky blue'})

Example Data

phate.tree.artificial_tree()
phate.tree.gen_dla(n_dim=100, n_branch=20, branch_length=100, rand_multiplier=2, seed=37, sigma=4)
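In the examples above, gen_dla returns a branching tree with n_branch * branch_length rows and n_dim columns, together with one branch label per row. The sketch below shows one way such a tree can be generated. It assumes that each branch is a cumulative random walk with steps drawn uniformly from [-1, rand_multiplier - 1), attached at a random point of the tree built so far, and that Gaussian noise of scale sigma is added at the end. It illustrates the idea under those assumptions and is not a copy of phate.tree.gen_dla, so it will not reproduce the library's output for a given seed. dla_like_tree is a hypothetical name.

```python
import numpy as np


def dla_like_tree(n_dim=100, n_branch=20, branch_length=100,
                  rand_multiplier=2, seed=37, sigma=4):
    """Branching random-walk tree; returns (data, branch_labels)."""
    rng = np.random.default_rng(seed)

    def random_walk():
        # Cumulative sum of steps drawn uniformly from [-1, rand_multiplier - 1).
        steps = -1 + rand_multiplier * rng.random((branch_length, n_dim))
        return np.cumsum(steps, axis=0)

    branches = [random_walk()]
    for _ in range(n_branch - 1):
        tree = np.concatenate(branches)
        # Attach the new branch at a random point of the tree built so far.
        start = tree[rng.integers(len(tree))]
        branches.append(start + random_walk())
    data = np.concatenate(branches)
    data = data + rng.normal(0, sigma, data.shape)
    labels = np.repeat(np.arange(n_branch), branch_length)
    return data, labels


tree_data, tree_clusters = dla_like_tree(n_dim=50, n_branch=5, branch_length=50)
```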