statsplotly.plot_specifiers.data package
This subpackage defines objects and utility methods for data properties.
- pydantic model statsplotly.plot_specifiers.data.AggregationSpecifier
Bases: BaseModel
- Fields:
aggregated_dimension (statsplotly.plot_specifiers.data._core.DataDimension)
aggregation_func (statsplotly.plot_specifiers.data._core.AggregationType | collections.abc.Callable[[Any], float] | None)
data_pointer (statsplotly.plot_specifiers.data._core.DataPointer)
data_types (statsplotly.plot_specifiers.data._core.DataTypes)
error_bar (statsplotly.plot_specifiers.data._core.ErrorBarType | collections.abc.Callable[[Any], numpy.ndarray[tuple[int, ...], numpy.dtype[Any]]] | None)
- Validators:
check_aggregation_specifier » all fields
check_error_bar » error_bar
- field aggregated_dimension: DataDimension [Required]
- Validated by:
check_aggregation_specifier
- field aggregation_func: AggregationType | Callable[[Any], float] | None = None
- Validated by:
check_aggregation_specifier
- field data_pointer: DataPointer [Required]
- Validated by:
check_aggregation_specifier
- field error_bar: ErrorBarType | Callable[[Any], NDArray[Any]] | None = None
- Validated by:
check_aggregation_specifier
check_error_bar
- validator check_aggregation_specifier » all fields
- validator check_error_bar » error_bar
- property aggregation_plot_dimension: DataDimension
- property reference_dimension: DataDimension
- pydantic model statsplotly.plot_specifiers.data.AggregationTraceData
Bases: TraceData
- classmethod build_aggregation_trace_data(data: DataFrame, aggregation_specifier: AggregationSpecifier) -> AggregationTraceData
- class statsplotly.plot_specifiers.data.AggregationType
- COUNT = 'count'
- FRACTION = 'fraction'
- GEO_MEAN = 'geo_mean'
- MEAN = 'mean'
- MEDIAN = 'median'
- PERCENT = 'percent'
- SUM = 'sum'
- class statsplotly.plot_specifiers.data.CentralTendencyType
- MEAN = 'mean'
- MEDIAN = 'median'
- class statsplotly.plot_specifiers.data.DataDimension
- X = 'x'
- Y = 'y'
- Z = 'z'
- pydantic model statsplotly.plot_specifiers.data.DataHandler
Bases: BaseModel
- Fields:
data (pandas.core.frame.DataFrame)
data_pointer (statsplotly.plot_specifiers.data._core.DataPointer)
slice_logical_indices (dict[str, numpy.ndarray[tuple[int, ...], numpy.dtype[Any]]] | None)
slice_order (list[str] | None)
- Validators:
check_header_format » data
check_pointers_in_data » all fields
convert_categorical_dtype_columns » data
- field data: pd.DataFrame [Required]
- Validated by:
check_header_format
check_pointers_in_data
convert_categorical_dtype_columns
- field data_pointer: DataPointer [Required]
- Validated by:
check_pointers_in_data
- field slice_logical_indices: dict[str, NDArray[Any]] | None = None
- Validated by:
check_pointers_in_data
- classmethod build_handler(data: DataFrame | dict[str, Sequence[ArrayLike]] | ArrayLike, data_pointer: DataPointer, slice_order: list[str] | None = None) -> DataHandler
- validator check_header_format » data
- validator check_pointers_in_data » all fields
- validator convert_categorical_dtype_columns » data
- pydantic model statsplotly.plot_specifiers.data.DataPointer
Bases: BaseModel
- Fields:
color (str | None)
error_x (str | None)
error_y (str | None)
error_z (str | None)
marker (str | None)
opacity (str | float | None)
shaded_error (str | None)
size (str | float | None)
slicer (str | None)
text (str | None)
x (str | None)
y (str | None)
z (str | None)
- Validators:
check_missing_dimension » all fields
- validator check_missing_dimension » all fields
- pydantic model statsplotly.plot_specifiers.data.DataProcessor
Bases: BaseModel
- Fields:
data_values_map (dict[statsplotly.plot_specifiers.data._core.DataDimension, dict[str, Any]] | None)
jitter_settings (dict[statsplotly.plot_specifiers.data._core.DataDimension, float] | None)
normalizer (dict[statsplotly.plot_specifiers.data._core.DataDimension, statsplotly.plot_specifiers.data._core.NormalizationType] | None)
- Validators:
check_normalizer » normalizer
- field jitter_settings: dict[DataDimension, float] | None = None
- field normalizer: dict[DataDimension, NormalizationType] | None = None
- Validated by:
check_normalizer
- validator check_normalizer » normalizer
- static normalize_data(data_series: Series, normalizer: NormalizationType) -> Series
- pydantic model statsplotly.plot_specifiers.data.DataTypes
Bases: BaseModel
- Fields:
color (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
marker (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
size (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
text (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
x (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
y (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
z (numpy.dtype[Any] | pandas.core.dtypes.dtypes.ArrowDtype | None)
- class statsplotly.plot_specifiers.data.ErrorBarType
- BOOTSTRAP = 'bootstrap'
- GEO_STD = 'geo_std'
- IQR = 'iqr'
- SEM = 'sem'
- STD = 'std'
- class statsplotly.plot_specifiers.data.HistogramNormType
- COUNT = ''
- PERCENT = 'percent'
- PROBABILITY = 'probability'
- PROBABILITY_DENSITY = 'probability density'
- class statsplotly.plot_specifiers.data.NormalizationType
- CENTER = 'center'
- MIN_MAX = 'minmax'
- ZSCORE = 'zscore'
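The three member names correspond to common normalization schemes. As a rough illustration only, the sketch below implements the conventional definitions in plain Python. The helper names (`center`, `min_max`, `zscore`) are hypothetical, and the choice of the sample standard deviation for the z-score is an assumption; this page does not show how `DataProcessor.normalize_data` computes its result.

```python
from statistics import fmean, stdev


def center(values):
    """Subtract the mean so the result averages to zero."""
    mu = fmean(values)
    return [v - mu for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant input: every value sits at the lower bound.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def zscore(values):
    """Center, then divide by the sample standard deviation (assumed ddof=1)."""
    mu = fmean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


data = [2.0, 4.0, 6.0, 8.0]
centered = center(data)
scaled = min_max(data)
standardized = zscore(data)
print(centered)
print(scaled)
print(standardized)
```

Centering preserves the spread of the data, min-max scaling fixes the range, and the z-score fixes both the mean and the spread.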
- class statsplotly.plot_specifiers.data.RegressionType
- EXPONENTIAL = 'exponential'
- INVERSE = 'inverse'
- LINEAR = 'linear'
- class statsplotly.plot_specifiers.data.SliceTraceType
- ALL_DATA = 'all data'
- SLICE = 'slice'
- pydantic model statsplotly.plot_specifiers.data.TraceData
Bases: _BaseTraceData
- classmethod build_trace_data(data: DataFrame, pointer: DataPointer, processor: DataProcessor | None = None) -> TraceData
Submodules
statsplotly.plot_specifiers.data.statistics module
Utility statistics functions.
- statsplotly.plot_specifiers.data.statistics.affine_func(x: NDArray[Any], a: float, b: float) -> NDArray[Any]
The affine function.
- statsplotly.plot_specifiers.data.statistics.compute_ssquares(y: NDArray[Any], yhat: NDArray[Any]) -> tuple[float, float, float]
Evaluates the sums of squares of a least-squares regression.
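The three floats in the return type plausibly correspond to the total, regression, and residual sums of squares. The sketch below computes those quantities in plain Python as an illustration. The function name `sums_of_squares` and the order of the returned values are assumptions, since this page does not specify the order used by `compute_ssquares`.

```python
def sums_of_squares(y, yhat):
    """Return (total, regression, residual) sums of squares.

    y    : observed values
    yhat : values predicted by the fitted model
    The return order is an assumption made for this sketch.
    """
    y_mean = sum(y) / len(y)
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_regression = sum((fi - y_mean) ** 2 for fi in yhat)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return ss_total, ss_regression, ss_residual


# Observations and the ordinary least-squares line fitted to x = 0, 1, 2, 3
# (slope 0.8, intercept 1.3).
y = [1.0, 3.0, 2.0, 4.0]
yhat = [1.3, 2.1, 2.9, 3.7]

ss_total, ss_regression, ss_residual = sums_of_squares(y, yhat)
r_squared = 1.0 - ss_residual / ss_total
print(ss_total, ss_regression, ss_residual, r_squared)
```

For an ordinary least-squares fit with an intercept, the total sum of squares equals the regression plus the residual sum, which gives a quick sanity check on the fit.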
- statsplotly.plot_specifiers.data.statistics.exponential_regress(x: NDArray[Any], y: NDArray[Any]) -> tuple[NDArray[Any], float, tuple[NDArray[Any], NDArray[Any]]]
Exponential regression via linear regression of the logarithm.
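The idea behind this approach is that a model y = A·exp(r·x) becomes a straight line after taking logarithms: ln y = ln A + r·x. The sketch below illustrates the technique in plain Python. The helper names `fit_line` and `fit_exponential` are hypothetical, and the sketch returns only the two fitted parameters; it does not reproduce the return structure of `exponential_regress`.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fit_exponential(x, y):
    """Fit y = amplitude * exp(rate * x) by regressing log(y) on x.

    All y values must be strictly positive for the logarithm to exist.
    """
    rate, log_amplitude = fit_line(x, [math.log(yi) for yi in y])
    return math.exp(log_amplitude), rate


# Noise-free data generated from amplitude = 2.0 and rate = 0.5.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

amplitude, rate = fit_exponential(x, y)
print(amplitude, rate)
```

Note that fitting in log space minimizes the error of log(y) rather than of y, so with noisy data the result can differ from a direct nonlinear fit.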
- statsplotly.plot_specifiers.data.statistics.get_iqr(x: NDArray[Any]) -> NDArray[Any]
Returns the interquartile range.
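As a reminder of the underlying quantity, the interquartile range spans the first quartile (25th percentile) to the third quartile (75th percentile). The sketch below computes both bounds with the standard library. The name `quartile_bounds` is hypothetical, and because `get_iqr` returns an array, its actual output may be the bounds, the width, or another layout that this page does not specify.

```python
from statistics import quantiles


def quartile_bounds(values):
    """Return (q1, q3) using linear interpolation between data points."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
q1, q3 = quartile_bounds(data)
iqr_width = q3 - q1
print(q1, q3, iqr_width)  # 3.0 7.0 4.0
```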
- statsplotly.plot_specifiers.data.statistics.inverse_func(x: NDArray[Any], a: float, b: float) -> NDArray[Any]
The reciprocal function.
- statsplotly.plot_specifiers.data.statistics.kde_1d(x_data: NDArray[Any], x_grid: NDArray[Any]) -> NDArray[Any]
- statsplotly.plot_specifiers.data.statistics.kde_2d(x_data: NDArray[Any], y_data: NDArray[Any], x_grid: NDArray[Any], y_grid: NDArray[Any]) -> NDArray[Any]
- statsplotly.plot_specifiers.data.statistics.logarithmic_func(x: NDArray[Any], a: float, b: float) -> float
The logarithmic function.
- statsplotly.plot_specifiers.data.statistics.range_normalize(data: NDArray[Any], a: float, b: float) -> NDArray[Any]
Normalizes an array to the range [a, b], where a and b are the new minimum and maximum values.
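Rescaling to an arbitrary range is a linear map that sends the smallest input to a and the largest to b. The sketch below shows that map in plain Python. The name `rescale_to_range` is hypothetical, and raising on constant input is a choice made for this sketch, not documented behavior of `range_normalize`.

```python
def rescale_to_range(values, a, b):
    """Map values linearly so that min(values) -> a and max(values) -> b."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input has no spread to stretch; this sketch refuses it.
        raise ValueError("cannot rescale constant data")
    scale = (b - a) / (hi - lo)
    return [a + (v - lo) * scale for v in values]


unit = rescale_to_range([10.0, 15.0, 20.0], 0.0, 1.0)
symmetric = rescale_to_range([10.0, 15.0, 20.0], -1.0, 1.0)
print(unit)
print(symmetric)
```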
- statsplotly.plot_specifiers.data.statistics.regress(x: NDArray[Any], y: NDArray[Any], func: Callable[[NDArray[Any], float, float], NDArray[Any]], p0: float | None = None, maxfev: int = 1000) -> tuple[NDArray[Any], float, tuple[NDArray[Any], NDArray[Any]]]
Regresses y on x using the curve_fit method from SciPy.
- statsplotly.plot_specifiers.data.statistics.reject_outliers(data: NDArray[Any], m: float = 2.0) -> NDArray[Any]
Uses the distance from the median of a distribution to identify outliers (from https://stackoverflow.com/a/45399188/4696032). Returns a mask of the non-outliers.
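One common way to implement a median-distance rule is to scale each point's absolute deviation from the median by the median of those deviations, then keep points whose scaled distance is below m. The sketch below follows that pattern in plain Python. It is an assumption about the general technique, not a copy of `reject_outliers`, and the name `non_outlier_mask` is hypothetical.

```python
from statistics import median


def non_outlier_mask(values, m=2.0):
    """Return a list of booleans, True where a value is NOT an outlier."""
    center = median(values)
    distances = [abs(v - center) for v in values]
    # Median absolute deviation; fall back to 1.0 to avoid dividing by zero.
    mad = median(distances)
    scale = mad if mad else 1.0
    return [d / scale < m for d in distances]


data = [10.0, 11.0, 9.0, 10.5, 9.5, 50.0]
mask = non_outlier_mask(data)
kept = [v for v, keep in zip(data, mask) if keep]
print(mask)
print(kept)
```

Because both the center and the scale are medians, a single extreme value such as 50.0 barely shifts the threshold, which is the main advantage over a mean and standard deviation rule.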