Data

class pqp.data.Data(df, vars=None, validate_domain=True, **kwargs)

Bases: Result

Class representing a dataset. It wraps a pandas DataFrame and stores metadata about its variables.

The main job of this class is to validate or create a set of symbolic variables that align with the column names of df.
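The alignment between variables and columns can be pictured with a small stand-alone sketch. It uses neither pqp nor pandas and is not the library's implementation: `resolve_vars` is a hypothetical helper, plain strings stand in for Variable objects, and raising ValueError for an unknown column is an assumption (the exception pqp raises is not stated here).

```python
# Stand-alone sketch (no pqp, no pandas). `resolve_vars` is a hypothetical
# helper, and plain strings stand in for Variable names.

def resolve_vars(columns, vars):
    """Return a {column: variable name} mapping, rejecting unknown columns.

    `vars` is either a list of variable names (each must equal a column
    name) or a dict mapping column names to variable names.
    """
    if isinstance(vars, dict):
        mapping = dict(vars)
    else:
        mapping = {name: name for name in vars}
    unknown = [col for col in mapping if col not in columns]
    if unknown:
        # Assumption: the real library may signal this differently.
        raise ValueError(f"not columns of the dataset: {unknown}")
    return mapping


columns = ["x", "y"]
from_list = resolve_vars(columns, ["x", "y"])
from_dict = resolve_vars(columns, {"x": "X", "y": "Y"})

try:
    resolve_vars(columns, {"x": "x", "z": "z"})  # "z" is not a column
    rejected = False
except ValueError:
    rejected = True
```

In the list form each variable name doubles as the column name, while the dict form lets a column such as "x" carry a differently named variable such as "X".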

Examples:

>>> import pandas as pd
>>> from pqp.data import Data
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> # it can infer a discrete domain for the variables
>>> Data(df, vars=[Variable("x"), Variable("y")])
>>> Data(df, vars={"x": Variable("X"), "y": Variable("Y")})
>>> # or you can specify the domain explicitly
>>> Data(df, vars={"x": DiscreteDomain([1, 2, 3, 4, 6]), "y": ContinuousDomain([1, 10])})
>>> Data(df, vars={"x": "discrete", "y": "continuous"})
>>> Data(df, vars={"x": None, "y": None}) # specifies a domain of unknown type
>>> # examples of invalid calls
>>> Data(df, vars={"x": Variable("x"), "z": Variable("z")}) # z not in df
>>> Data(df, vars={"x": Variable("x"), "y": "some name"}) # "some name" is interpreted as a domain type
Parameters:
  • df (pandas.DataFrame) – the dataset

  • vars (list or dict) – the variables in the dataset. Either a list of Variable objects whose names match columns in df, or a dict mapping column names in df to a Variable, a Domain, or a str naming the type of the variable’s domain ("discrete" or "continuous"); a value of None specifies a domain of unknown type

  • validate_domain (bool) – if False, skips validating that the data conforms to the specified domains; defaults to True

  • silence_inferred_domain_warning (bool) – if True, silences the default warning when the domain of a variable is inferred

  • silence_unit_domain_warning (bool) – if True, silences the default warning when an inferred domain for a variable has only a single possible value
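What domain inference and domain validation mean can be sketched without pqp. The helpers below are hypothetical and are not the library's implementation; in particular, inferring a discrete domain as the set of observed values and treating a continuous domain as a closed interval are assumptions made for illustration.

```python
# Stand-alone sketch (no pqp, no pandas); all helper names are hypothetical.

def infer_discrete_domain(values):
    """Assumed inference rule: the domain is the sorted set of observed values."""
    return sorted(set(values))


def conforms_discrete(values, domain):
    """True if every observed value is a member of the discrete domain."""
    allowed = set(domain)
    return all(v in allowed for v in values)


def conforms_continuous(values, low, high):
    """True if every value lies in [low, high] (closed bounds are an assumption)."""
    return all(low <= v <= high for v in values)


x = [1, 2, 3]
y = [4, 5, 6]

inferred_x = infer_discrete_domain(x)
x_ok = conforms_discrete(x, [1, 2, 3, 4, 6])   # the domain may be larger than the data
y_ok = conforms_continuous(y, 1, 10)
y_bad = conforms_discrete(y, [4, 5])           # 6 falls outside the domain

# An inferred domain with one possible value is the "unit domain" case.
unit_domain = infer_discrete_domain([7, 7, 7])
is_unit = len(unit_domain) == 1
```

Under this reading, validate_domain=False would correspond to skipping the two `conforms_*` checks, and the two silence_* flags would suppress warnings tied to `infer_discrete_domain` and to the `is_unit` condition.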

df

The dataset

Type:

pandas.DataFrame

vars

dict mapping variable names to Variable objects

Type:

dict

Attributes Summary

n

The number of rows in the dataset

Methods Summary

domain_of(var)

Get the domain of a variable

quantize(var[, n_bins])

Quantize a variable in place

Attributes Documentation

n

The number of rows in the dataset

Methods Documentation

domain_of(var)

Get the domain of a variable

Parameters:

var (str or Variable) – the variable whose domain to get

Returns:

the domain of the variable

Return type:

Domain

quantize(var, n_bins=2)

Quantize a variable in place

Turns a continuous variable into a categorical variable by binning it into n_bins bins.

Parameters:
  • var (str or Variable) – the variable to quantize

  • n_bins (int) – the number of bins to quantize into, defaults to 2

Returns:

None
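Binning a continuous variable into n_bins categories can be sketched as follows. This is a stand-alone illustration, not pqp's implementation: `equal_width_bins` is a hypothetical helper, and equal-width bins are an assumption, since the documentation above does not say how bin edges are chosen (they could, for instance, be placed at quantiles).

```python
# Stand-alone sketch of binning; equal-width bins are an assumption, and
# pqp's quantize may place bin edges differently.

def equal_width_bins(values, n_bins=2):
    """Map each value to a bin index in range(n_bins) using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


heights = [1.0, 2.5, 4.0, 7.5, 10.0]
two_bins = equal_width_bins(heights, n_bins=2)
three_bins = equal_width_bins(heights, n_bins=3)
```

Unlike quantize, which works in place and returns None, this sketch returns a new list of bin indices and leaves its input untouched.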