statsplotly.plot_specifiers.data package

This subpackage defines objects and utility methods for data properties.

pydantic model statsplotly.plot_specifiers.data.AggregationSpecifier

Bases: BaseModel

Fields:
  • aggregated_dimension (statsplotly.plot_specifiers.data._core.DataDimension)

  • aggregation_func (statsplotly.plot_specifiers.data._core.AggregationType | collections.abc.Callable[[Any], float] | None)

  • data_pointer (statsplotly.plot_specifiers.data._core.DataPointer)

  • data_types (statsplotly.plot_specifiers.data._core.DataTypes)

  • error_bar (statsplotly.plot_specifiers.data._core.ErrorBarType | collections.abc.Callable[[Any], numpy.ndarray[tuple[int, ...], numpy.dtype[Any]]] | None)

Validators:
  • check_aggregation_specifier » all fields

  • check_error_bar » error_bar

field aggregated_dimension: DataDimension [Required]
Validated by:
  • check_aggregation_specifier

field aggregation_func: AggregationType | Callable[[Any], float] | None = None
Validated by:
  • check_aggregation_specifier

field data_pointer: DataPointer [Required]
Validated by:
  • check_aggregation_specifier

field data_types: DataTypes [Required]
Validated by:
  • check_aggregation_specifier

field error_bar: ErrorBarType | Callable[[Any], NDArray[Any]] | None = None
Validated by:
  • check_aggregation_specifier

  • check_error_bar

validator check_aggregation_specifier  »  all fields
validator check_error_bar  »  error_bar
property aggregated_data: str
property aggregation_plot_dimension: DataDimension
property is_mono_referenced: bool
property reference_data: str | None
property reference_dimension: DataDimension
pydantic model statsplotly.plot_specifiers.data.AggregationTraceData

Bases: TraceData

classmethod build_aggregation_trace_data(data: DataFrame, aggregation_specifier: AggregationSpecifier) AggregationTraceData
class statsplotly.plot_specifiers.data.AggregationType(value)

Bases: str, Enum

COUNT = 'count'
FRACTION = 'fraction'
GEO_MEAN = 'geo_mean'
MEAN = 'mean'
MEDIAN = 'median'
PERCENT = 'percent'
SUM = 'sum'
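To illustrate what these aggregation names usually mean, the following pandas sketch collapses each group of values into one number per AggregationType value. The helper aggregate is hypothetical and not part of statsplotly. Treating fraction and percent as each group's share of all non-null observations is an assumption.

```python
import numpy as np
import pandas as pd


def aggregate(df: pd.DataFrame, by: str, value: str, how: str) -> pd.Series:
    """Hypothetical helper: collapse `value` within each group of `by`."""
    grouped = df.groupby(by, sort=True)[value]
    if how == "count":
        return grouped.count()
    if how in ("fraction", "percent"):
        # Assumption: each group's share of all non-null observations.
        share = grouped.count() / df[value].count()
        return share * 100 if how == "percent" else share
    if how == "geo_mean":
        # Geometric mean = exp(mean(log(x))), defined for strictly positive values.
        return grouped.apply(lambda s: float(np.exp(np.log(s).mean())))
    if how in ("mean", "median", "sum"):
        return grouped.agg(how)
    raise ValueError(f"unknown aggregation: {how!r}")


df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1.0, 4.0, 2.0, 8.0]})
print(aggregate(df, "group", "value", "geo_mean").round(6).tolist())  # [2.0, 4.0]
```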
class statsplotly.plot_specifiers.data.CentralTendencyType(value)

Bases: str, Enum

MEAN = 'mean'
MEDIAN = 'median'
class statsplotly.plot_specifiers.data.DataDimension(value)

Bases: str, Enum

X = 'x'
Y = 'y'
Z = 'z'
pydantic model statsplotly.plot_specifiers.data.DataHandler

Bases: BaseModel

Fields:
  • data (pandas.core.frame.DataFrame)

  • data_pointer (statsplotly.plot_specifiers.data._core.DataPointer)

  • slice_logical_indices (dict[str, numpy.ndarray[tuple[int, ...], numpy.dtype[Any]]] | None)

  • slice_order (list[str] | None)

Validators:
  • check_header_format » data

  • check_pointers_in_data » all fields

  • convert_categorical_dtype_columns » data

field data: pd.DataFrame [Required]
Validated by:
  • check_header_format

  • check_pointers_in_data

  • convert_categorical_dtype_columns

field data_pointer: DataPointer [Required]
Validated by:
  • check_pointers_in_data

field slice_logical_indices: dict[str, NDArray[Any]] | None = None
Validated by:
  • check_pointers_in_data

field slice_order: list[str] | None = None
Validated by:
  • check_pointers_in_data

classmethod build_handler(data: DataFrame | dict[str, Sequence[ArrayLike]] | ArrayLike, data_pointer: DataPointer, slice_order: list[str] | None = None) DataHandler
where ArrayLike is numpy.typing.ArrayLike.
validator check_header_format  »  data
validator check_pointers_in_data  »  all fields
validator convert_categorical_dtype_columns  »  data
get_data(dimension: str) Series | None
get_mean(dimension: str) DataFrame
get_median(dimension: str) DataFrame
iter_slices() Generator[tuple[str, DataFrame]]
static to_dataframe(function: F) DataFrame
property data_types: DataTypes
property n_slices: int
property slice_levels: list[str]
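The slice-related members (slice_order, slice_logical_indices, n_slices, slice_levels, iter_slices) split the data by the column named in DataPointer.slicer. The following is a hypothetical pandas approximation of such an iterator, not statsplotly's implementation. The function name iter_slices_sketch and the fallback to sorted levels when no order is given are assumptions.

```python
from __future__ import annotations

from collections.abc import Iterator

import pandas as pd


def iter_slices_sketch(
    data: pd.DataFrame, slicer: str, slice_order: list[str] | None = None
) -> Iterator[tuple[str, pd.DataFrame]]:
    """Yield (level, sub-DataFrame) pairs, one per distinct value of the slicer column."""
    levels = slice_order if slice_order is not None else sorted(data[slicer].dropna().astype(str).unique())
    for level in levels:
        # Boolean mask selecting the rows of this slice (comparable to a logical index).
        mask = data[slicer].astype(str) == level
        yield level, data.loc[mask]


df = pd.DataFrame({"species": ["b", "a", "b"], "x": [1, 2, 3]})
for name, chunk in iter_slices_sketch(df, "species", slice_order=["b", "a"]):
    print(name, chunk["x"].tolist())
# b [1, 3]
# a [2]
```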
pydantic model statsplotly.plot_specifiers.data.DataPointer

Bases: BaseModel

Fields:
  • color (str | None)

  • error_x (str | None)

  • error_y (str | None)

  • error_z (str | None)

  • marker (str | None)

  • opacity (str | float | None)

  • shaded_error (str | None)

  • size (str | float | None)

  • slicer (str | None)

  • text (str | None)

  • x (str | None)

  • y (str | None)

  • z (str | None)

Validators:
  • check_missing_dimension » all fields

field color: str | None = None
Validated by:
  • check_missing_dimension

field error_x: str | None = None
Validated by:
  • check_missing_dimension

field error_y: str | None = None
Validated by:
  • check_missing_dimension

field error_z: str | None = None
Validated by:
  • check_missing_dimension

field marker: str | None = None
Validated by:
  • check_missing_dimension

field opacity: str | float | None = None
Validated by:
  • check_missing_dimension

field shaded_error: str | None = None
Validated by:
  • check_missing_dimension

field size: str | float | None = None
Validated by:
  • check_missing_dimension

field slicer: str | None = None
Validated by:
  • check_missing_dimension

field text: str | None = None
Validated by:
  • check_missing_dimension

field x: str | None = None
Validated by:
  • check_missing_dimension

field y: str | None = None
Validated by:
  • check_missing_dimension

field z: str | None = None
Validated by:
  • check_missing_dimension

validator check_missing_dimension  »  all fields
property text_identifiers: list[str] | None
pydantic model statsplotly.plot_specifiers.data.DataProcessor

Bases: BaseModel

Fields:
  • data_values_map (dict[statsplotly.plot_specifiers.data._core.DataDimension, dict[str, Any]] | None)

  • jitter_settings (dict[statsplotly.plot_specifiers.data._core.DataDimension, float] | None)

  • normalizer (dict[statsplotly.plot_specifiers.data._core.DataDimension, statsplotly.plot_specifiers.data._core.NormalizationType] | None)

Validators:
  • check_normalizer » normalizer

field data_values_map: dict[DataDimension, dict[str, Any]] | None = None
field jitter_settings: dict[DataDimension, float] | None = None
field normalizer: dict[DataDimension, NormalizationType] | None = None
Validated by:
  • check_normalizer

validator check_normalizer  »  normalizer
static jitter_data(data_series: Series, jitter_amount: float) Series
static normalize_data(data_series: Series, normalizer: NormalizationType) Series
process_trace_data(trace_data: dict[str, Series]) Series
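jitter_settings maps a dimension to a jitter amount, which jitter_data applies to a Series. The exact noise model is not documented here. The sketch below assumes uniform noise centred on zero with total width jitter_amount. jitter_data_sketch and its seed parameter are hypothetical.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def jitter_data_sketch(data_series: pd.Series, jitter_amount: float, seed: int | None = None) -> pd.Series:
    """Add uniform noise in [-jitter_amount / 2, jitter_amount / 2) to every value (assumed noise model)."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-jitter_amount / 2, jitter_amount / 2, size=len(data_series))
    # Reuse the original index so jittered values stay aligned with the other trace columns.
    return data_series + pd.Series(noise, index=data_series.index)


x = pd.Series([1, 1, 2, 2], name="x")
jittered = jitter_data_sketch(x, jitter_amount=0.2, seed=0)
```

Fixing the seed keeps the scatter identical between renders. Leaving it as None gives fresh noise each time.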
pydantic model statsplotly.plot_specifiers.data.DataTypes

Bases: BaseModel

Fields:
  • color (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • marker (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • size (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • text (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • x (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • y (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

  • z (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)

field color: _Dtype | None = None
field marker: _Dtype | None = None
field size: _Dtype | None = None
field text: _Dtype | None = None
field x: _Dtype | None = None
field y: _Dtype | None = None
field z: _Dtype | None = None
class statsplotly.plot_specifiers.data.ErrorBarType(value)

Bases: str, Enum

BOOTSTRAP = 'bootstrap'
GEO_STD = 'geo_std'
IQR = 'iqr'
SEM = 'sem'
STD = 'std'
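The error bar types correspond to standard dispersion and uncertainty measures. The numpy sketch below turns each one into a (lower, upper) interval. The helper error_interval is hypothetical. Its specific choices are assumptions about how such error bars are typically drawn, not statsplotly's documented behavior:

  • the sample standard deviation uses ddof=1;
  • geo_std gives a multiplicative interval around the geometric mean;
  • iqr spans the 25th to 75th percentiles;
  • bootstrap gives a 95% percentile bootstrap of the mean.

```python
import numpy as np


def error_interval(x: np.ndarray, kind: str, n_boot: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return an illustrative (lower, upper) error interval for the sample x."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    if kind == "std":
        s = x.std(ddof=1)
        return float(mean - s), float(mean + s)
    if kind == "sem":
        s = x.std(ddof=1) / np.sqrt(x.size)
        return float(mean - s), float(mean + s)
    if kind == "geo_std":
        # Geometric statistics live on the log scale, so the interval is multiplicative.
        logs = np.log(x)
        gmean, gstd = np.exp(logs.mean()), np.exp(logs.std(ddof=1))
        return float(gmean / gstd), float(gmean * gstd)
    if kind == "iqr":
        q1, q3 = np.percentile(x, [25, 75])
        return float(q1), float(q3)
    if kind == "bootstrap":
        # Percentile bootstrap of the mean: resample with replacement, keep the 2.5th and 97.5th percentiles.
        rng = np.random.default_rng(seed)
        boot_means = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
        low, high = np.percentile(boot_means, [2.5, 97.5])
        return float(low), float(high)
    raise ValueError(f"unknown error bar type: {kind!r}")


sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(error_interval(sample, "iqr"))  # (2.0, 4.0)
```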
class statsplotly.plot_specifiers.data.HistogramNormType(value)

Bases: str, Enum

COUNT = ''
PERCENT = 'percent'
PROBABILITY = 'probability'
PROBABILITY_DENSITY = 'probability density'
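These values match the strings Plotly accepts for a histogram trace's histnorm attribute, where the empty string means raw counts. The sketch below reproduces the four normalizations with numpy.histogram to show how they relate. normalized_histogram is a hypothetical helper, not library code.

```python
import numpy as np


def normalized_histogram(x, bins, histnorm: str = "") -> tuple[np.ndarray, np.ndarray]:
    """Return (bar heights, bin edges) for the given histnorm string."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    total = counts.sum()
    if histnorm == "":
        return counts.astype(float), edges  # raw counts
    if histnorm == "percent":
        return 100 * counts / total, edges  # heights sum to 100
    if histnorm == "probability":
        return counts / total, edges  # heights sum to 1
    if histnorm == "probability density":
        return counts / (total * widths), edges  # bar areas sum to 1
    raise ValueError(f"unsupported histnorm: {histnorm!r}")


heights, edges = normalized_histogram([0.5, 1.5, 1.5, 2.5], bins=[0, 1, 2, 3], histnorm="percent")
print(heights.tolist())  # [25.0, 50.0, 25.0]
```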
class statsplotly.plot_specifiers.data.NormalizationType(value)

Bases: str, Enum

CENTER = 'center'
MIN_MAX = 'minmax'
ZSCORE = 'zscore'
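A minimal pandas sketch of the three normalizations, as they are conventionally defined, for the kind of Series that DataProcessor.normalize_data receives. normalize_sketch is hypothetical. In particular, it is an assumption whether statsplotly's z-score uses the sample standard deviation (ddof=1, the pandas default used here) or the population one.

```python
import pandas as pd


def normalize_sketch(data_series: pd.Series, normalizer: str) -> pd.Series:
    """Apply one of the conventional 'center', 'minmax' or 'zscore' normalizations."""
    if normalizer == "center":
        # Shift so the mean becomes 0.
        return data_series - data_series.mean()
    if normalizer == "minmax":
        # Rescale linearly so the minimum maps to 0 and the maximum to 1.
        return (data_series - data_series.min()) / (data_series.max() - data_series.min())
    if normalizer == "zscore":
        # Center, then divide by the sample standard deviation (ddof=1).
        return (data_series - data_series.mean()) / data_series.std()
    raise ValueError(f"unknown normalizer: {normalizer!r}")


s = pd.Series([1.0, 2.0, 3.0])
print(normalize_sketch(s, "minmax").tolist())  # [0.0, 0.5, 1.0]
```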
class statsplotly.plot_specifiers.data.RegressionType(value)

Bases: str, Enum

EXPONENTIAL = 'exponential'
INVERSE = 'inverse'
LINEAR = 'linear'
class statsplotly.plot_specifiers.data.SliceTraceType(value)

Bases: str, Enum

ALL_DATA = 'all data'
SLICE = 'slice'
pydantic model statsplotly.plot_specifiers.data.TraceData

Bases: _BaseTraceData

classmethod build_trace_data(data: DataFrame, pointer: DataPointer, processor: DataProcessor | None = None) TraceData

Submodules

statsplotly.plot_specifiers.data.statistics module

Utility statistics functions.

statsplotly.plot_specifiers.data.statistics.affine_func(x: ndarray[tuple[int, ...], dtype[Any]], a: float, b: float) ndarray[tuple[int, ...], dtype[Any]]

The affine function.

statsplotly.plot_specifiers.data.statistics.compute_ssquares(y: ndarray[tuple[int, ...], dtype[Any]], yhat: ndarray[tuple[int, ...], dtype[Any]]) tuple[float, float, float]

Evaluates the sums of squares of a least-squares regression.
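For observations y and fitted values ŷ, the usual sums of squares are:

  • SST = Σ(y - ȳ)²
  • SSR = Σ(ŷ - ȳ)²
  • SSE = Σ(y - ŷ)²

R² = 1 - SSE/SST follows from these. The order in which compute_ssquares returns its three floats is not documented here. The hypothetical sums_of_squares below returns (SST, SSR, SSE) by assumption.

```python
import numpy as np


def sums_of_squares(y, yhat) -> tuple[float, float, float]:
    """Return (total, regression, residual) sums of squares (assumed order)."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    ybar = y.mean()
    sst = float(np.sum((y - ybar) ** 2))     # total variation around the mean
    ssr = float(np.sum((yhat - ybar) ** 2))  # variation explained by the fit
    sse = float(np.sum((y - yhat) ** 2))     # residual (unexplained) variation
    return sst, ssr, sse


x = np.arange(5.0)
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])
slope, intercept = np.polyfit(x, y, 1)  # ordinary least-squares line
sst, ssr, sse = sums_of_squares(y, slope * x + intercept)
r_squared = 1 - sse / sst
```

For an ordinary least-squares fit with an intercept, SST = SSR + SSE, so R² can equivalently be written SSR/SST.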

statsplotly.plot_specifiers.data.statistics.exponential_regress(x: ndarray[tuple[int, ...], dtype[Any]], y: ndarray[tuple[int, ...], dtype[Any]]) tuple[ndarray[tuple[int, ...], dtype[Any]], float, tuple[ndarray[tuple[int, ...], dtype[Any]], ndarray[tuple[int, ...], dtype[Any]]]]

Exponential regression via linear regression of the logarithm of y.
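Fitting y = a·exp(b·x) becomes a straight-line fit after taking logarithms, since ln(y) = ln(a) + b·x. The numpy sketch below shows that reduction for strictly positive y. exp_fit_sketch is hypothetical and returns only (a, b). exponential_regress itself also returns a float and a pair of arrays, whose meaning is not documented here.

```python
import numpy as np


def exp_fit_sketch(x, y) -> tuple[float, float]:
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the log-linear fit requires strictly positive y")
    # np.polyfit with deg=1 returns [slope, intercept] of ln(y) against x.
    b, log_a = np.polyfit(x, np.log(y), 1)
    return float(np.exp(log_a)), float(b)


xs = np.linspace(0.0, 2.0, 5)
a, b = exp_fit_sketch(xs, 3.0 * np.exp(0.5 * xs))
```

Because the residuals are minimized on the log scale, points with small y carry relatively more weight than they would in a direct nonlinear fit.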

statsplotly.plot_specifiers.data.statistics.get_iqr(x: ndarray[tuple[int, ...], dtype[Any]]) ndarray[tuple[int, ...], dtype[Any]]

Returns the interquartile range.

statsplotly.plot_specifiers.data.statistics.inverse_func(x: ndarray[tuple[int, ...], dtype[Any]], a: float, b: float) ndarray[tuple[int, ...], dtype[Any]]

The reciprocal function.

statsplotly.plot_specifiers.data.statistics.kde_1d(x_data: ndarray[tuple[int, ...], dtype[Any]], x_grid: ndarray[tuple[int, ...], dtype[Any]]) ndarray[tuple[int, ...], dtype[Any]]
statsplotly.plot_specifiers.data.statistics.kde_2d(x_data: ndarray[tuple[int, ...], dtype[Any]], y_data: ndarray[tuple[int, ...], dtype[Any]], x_grid: ndarray[tuple[int, ...], dtype[Any]], y_grid: ndarray[tuple[int, ...], dtype[Any]]) ndarray[tuple[int, ...], dtype[Any]]
statsplotly.plot_specifiers.data.statistics.logarithmic_func(x: ndarray[tuple[int, ...], dtype[Any]], a: float, b: float) float

The logarithmic function.

statsplotly.plot_specifiers.data.statistics.range_normalize(data: ndarray[tuple[int, ...], dtype[Any]], a: float, b: float) ndarray[tuple[int, ...], dtype[Any]]

Normalizes an array to the range [a, b], where a and b are the target minimum and maximum values.
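A sketch of linear rescaling to [a, b], which maps the array's minimum to a and its maximum to b. range_normalize_sketch is hypothetical. Mapping a constant array to a, to avoid dividing by zero, is an assumed convention.

```python
import numpy as np


def range_normalize_sketch(data, a: float, b: float) -> np.ndarray:
    """Linearly map data so that min(data) -> a and max(data) -> b."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Assumed convention: a constant array has no spread, so every value maps to a.
        return np.full_like(data, a)
    return a + (data - lo) * (b - a) / (hi - lo)


print(range_normalize_sketch([0, 5, 10], 1, 2).tolist())  # [1.0, 1.5, 2.0]
```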

statsplotly.plot_specifiers.data.statistics.regress(x: ndarray[tuple[int, ...], dtype[Any]], y: ndarray[tuple[int, ...], dtype[Any]], func: Callable[[ndarray[tuple[int, ...], dtype[Any]], float, float], ndarray[tuple[int, ...], dtype[Any]]], p0: float | None = None, maxfev: int = 1000) tuple[ndarray[tuple[int, ...], dtype[Any]], float, tuple[ndarray[tuple[int, ...], dtype[Any]], ndarray[tuple[int, ...], dtype[Any]]]]

Regresses y on x using SciPy's curve-fitting routine (scipy.optimize.curve_fit).
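A minimal example of the curve_fit approach, assuming the affine model a·x + b. The parameter order of affine_func is an assumption, and the local affine function below stands in for it. p0 and maxfev are genuine curve_fit arguments; without bounds, maxfev is forwarded to the underlying Levenberg-Marquardt solver. Other two-parameter models, such as inverse_func, would be passed in the same way.

```python
import numpy as np
from scipy.optimize import curve_fit


def affine(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Assumed affine model: slope a, intercept b."""
    return a * x + b


rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
y = affine(x, 2.0, -1.0) + rng.normal(0.0, 0.05, size=x.size)

# popt holds the fitted (a, b); pcov is their estimated covariance matrix.
popt, pcov = curve_fit(affine, x, y, p0=[1.0, 0.0], maxfev=1000)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation parameter uncertainties
```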

statsplotly.plot_specifiers.data.statistics.reject_outliers(data: ndarray[tuple[int, ...], dtype[Any]], m: float = 2.0) ndarray[tuple[int, ...], dtype[Any]]

Uses the distance from the median of a distribution to identify outliers (adapted from https://stackoverflow.com/a/45399188/4696032). Returns a mask of the non-outlier values.
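The idea is the median absolute deviation (MAD) rule. Each point is scored by its distance from the median, divided by the median of those distances, and points scoring below m are kept. The numpy sketch below illustrates that rule. It is not copied from statsplotly or from the linked answer. non_outlier_mask is hypothetical, and keeping every point when the MAD is zero is an assumed fallback.

```python
import numpy as np


def non_outlier_mask(data, m: float = 2.0) -> np.ndarray:
    """Return True for points within m scaled deviations of the median."""
    data = np.asarray(data, dtype=float)
    d = np.abs(data - np.median(data))  # absolute distance from the median
    mdev = np.median(d)                 # median absolute deviation (MAD)
    # Avoid dividing by zero: with a zero MAD every score is 0, so every point is kept.
    s = d / mdev if mdev else np.zeros_like(d)
    return s < m


print(non_outlier_mask([2, 3, 3, 4, 100]).tolist())  # [True, True, True, True, False]
```

The median and MAD are themselves barely moved by a few extreme values, which is why this rule is more robust than a mean and standard-deviation cutoff.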

statsplotly.plot_specifiers.data.statistics.sem(data: ndarray[tuple[int, ...], dtype[Any]], confidence_level: float = 0.95) float

Returns the margin of error at the given confidence level.
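The margin of error is a critical value times the standard error of the mean, s/√n. This documentation does not say which critical value sem uses. A Student-t quantile with n - 1 degrees of freedom (scipy.stats.t.ppf) is common for small samples. The sketch below instead uses the standard-normal quantile from Python's statistics module, which is a large-sample approximation. For small n it therefore gives a slightly smaller margin than a t-based version. margin_of_error_normal is hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def margin_of_error_normal(data, confidence_level: float = 0.95) -> float:
    """Normal-approximation margin of error for the sample mean."""
    data = np.asarray(data, dtype=float)
    standard_error = data.std(ddof=1) / math.sqrt(data.size)
    # Two-sided critical value: e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    return float(z * standard_error)


sample = [2, 4, 4, 4, 5, 5, 7, 9]
moe = margin_of_error_normal(sample)
```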