pqp.data package

Submodules

pqp.data.data module

class pqp.data.data.Data(df, vars=None, validate_domain=True, **kwargs)

Bases: Result

Represents a dataset by wrapping a pandas DataFrame along with some metadata.

The main job of this class is to validate or create a set of symbolic variables that align with the column names of df.

Examples:

>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> # it can infer a discrete domain for the variables
>>> Data(df, vars=[Variable("x"), Variable("y")])
>>> Data(df, vars={"x": Variable("X"), "y": Variable("Y")})
>>> # or you can specify the domain explicitly
>>> Data(df, vars={"x": CategoricalDomain([1, 2, 3, 4, 6]), "y": RealDomain([1, 10])})
>>> Data(df, vars={"x": "discrete", "y": "continuous"})
>>> Data(df, vars={"x": None, "y": None}) # specifies a domain of unknown type
>>> # examples of invalid calls
>>> Data(df, vars={"x": Variable("x"), "z": Variable("z")}) # z not in df
>>> Data(df, vars={"x": Variable("x"), "y": "some name"}) # "some name" is interpreted as a domain type

Parameters:
  • df (pandas.DataFrame) – the dataset

  • vars (list or dict) – the variables in the dataset: either a list of Variable objects (whose names must match columns in df), or a dict mapping column names in df to a Variable, a Domain, a str naming the type of the variable’s domain, or None for a domain of unknown type

  • validate_domain (bool) – if False, skip validating that the data conforms to the specified domains; defaults to True

  • silence_inferred_domain_warning (bool) – if True, silences the default warning when the domain of a variable is inferred

  • silence_unit_domain_warning (bool) – if True, silences the default warning when an inferred domain for a variable has only a single possible value

df

The dataset

Type:

pandas.DataFrame

vars

dict mapping variable names to Variable objects

Type:

dict

domain_of(var)

Get the domain of a variable

Parameters:

var (str or Variable) – the variable whose domain to get

Returns:

the domain of the variable

Return type:

Domain

property n

The number of rows in the dataset

quantize(var, n_bins=2)

Quantize a variable in place

Turns a continuous variable into a categorical variable by binning it into n_bins bins.

Parameters:
  • var (str or Variable) – the variable to quantize

  • n_bins (int) – the number of bins to quantize into

Returns:

None
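The binning idea can be sketched in plain Python. This is only an illustration, assuming equal-width bins between the observed minimum and maximum; the binning strategy quantize actually uses is not stated here, and quantize_equal_width is a hypothetical helper, not part of pqp.

```python
def quantize_equal_width(values, n_bins=2):
    """Assign each value to one of n_bins equal-width bins (hypothetical helper)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would otherwise land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


heights = [150.0, 160.0, 170.0, 180.0, 190.0]
bins = quantize_equal_width(heights, n_bins=2)
print(bins)
```

Each continuous value is replaced by a bin index, so the variable ends up with at most n_bins distinct categories.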

pqp.data.domain module

class pqp.data.domain.BinaryDomain

Bases: CategoricalDomain

Domain for a binary variable

describe_assumptions()

Returns a human readable description of assumptions made about the domain

class pqp.data.domain.CategoricalDomain(values)

Bases: DiscreteDomain

Domain of a categorical variable

describe_assumptions()

Returns a human readable description of assumptions made about the domain

get_cardinality()

Calculates the cardinality of the domain

Returns:

the number of possible values in the domain

Return type:

int

get_values()

Returns a list of all possible values in the domain

Returns:

all possible values in the domain

Return type:

list

class pqp.data.domain.ContinuousDomain

Bases: Domain

Abstract base class for continuous domains

get_cardinality()

The number of possible values a variable can take on

class pqp.data.domain.DiscreteDomain

Bases: Domain, ABC

Abstract base class for discrete domains

abstract get_cardinality()

Calculates the cardinality of the domain

Returns:

the number of possible values in the domain

Return type:

int

abstract get_values()

Returns a list of all possible values in the domain

Returns:

all possible values in the domain

Return type:

list

class pqp.data.domain.Domain

Bases: ABC

Manages the possible values that a Variable can take on

abstract describe_assumptions()

Returns a human readable description of assumptions made about the domain

abstract get_cardinality()

The number of possible values a variable can take on

validate(values)

Checks whether every value in a list is in the domain

Parameters:

values (list) – a list of values to validate

Returns:

True if all values are in the domain, False otherwise

Return type:

bool

validate_or_throw(values)

Validates that every value in a list is in the domain; raises an error if not

Parameters:

values (list) – a list of values to validate

Raises:

ValueError – if any value is not in the domain
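The difference between validate and validate_or_throw can be illustrated with a small stand-in. This is a sketch only: SketchDomain, SketchCategorical and the contains method are hypothetical names introduced for illustration and do not reflect how pqp implements its domains.

```python
from abc import ABC, abstractmethod


class SketchDomain(ABC):
    """Hypothetical stand-in for a domain; not the pqp class."""

    @abstractmethod
    def contains(self, value):
        """Return True if a single value belongs to the domain."""

    def validate(self, values):
        # True only when every value is a member of the domain.
        return all(self.contains(v) for v in values)

    def validate_or_throw(self, values):
        # Same check, but report the offending values instead of returning False.
        bad = [v for v in values if not self.contains(v)]
        if bad:
            raise ValueError(f"values not in domain: {bad}")


class SketchCategorical(SketchDomain):
    """Hypothetical domain defined by an explicit list of allowed values."""

    def __init__(self, values):
        self.values = list(values)

    def contains(self, value):
        return value in self.values


colors = SketchCategorical(["red", "green"])
ok = colors.validate(["red", "green", "red"])
not_ok = colors.validate(["red", "blue"])
```

The boolean form suits conditional logic, while the raising form suits fail-fast checks when a dataset is loaded.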

class pqp.data.domain.IntegerDomain(values)

Bases: DiscreteDomain

check_int(val)

Checks whether a value is an integer

describe_assumptions()

Returns a human readable description of assumptions made about the domain

get_cardinality()

Calculates the cardinality of the domain

Returns:

the number of possible values in the domain

Return type:

int

get_values()

Returns a list of all possible values in the domain

Returns:

all possible values in the domain

Return type:

list

class pqp.data.domain.RealDomain(values)

Bases: ContinuousDomain

A continuous domain for a variable, delimited by min and max values

Parameters:

values (list) – the min and max of this list are taken as the min and max of the domain

describe_assumptions()

Returns a human readable description of assumptions made about the domain
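How a list of values delimits a continuous domain can be sketched as follows. SketchRealDomain is a hypothetical class written for illustration; in particular, treating both endpoints as included is an assumption that the description of RealDomain does not confirm.

```python
class SketchRealDomain:
    """Hypothetical closed interval [lo, hi] built from a list of values."""

    def __init__(self, values):
        # Only the extremes matter; order and interior values are ignored.
        self.lo = min(values)
        self.hi = max(values)

    def contains(self, value):
        # Assumption: both endpoints belong to the domain.
        return self.lo <= value <= self.hi


interval = SketchRealDomain([10, 1])
print(interval.lo, interval.hi)
```

Passing [10, 1] or [1, 5, 10] yields the same interval in this sketch, because only the minimum and maximum are kept.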

pqp.data.domain.infer_domain_type(vals)

Attempts to infer the best domain type to use for a list of values

The heuristics are somewhat complicated, so call this function and inspect the result before relying on it.

Parameters:

vals (Iterable) – the values whose domain type should be inferred

pqp.data.domain.make_domain(domain_type, values=None)

Generates a domain object from a string and optional values

Options:
  • "discrete": values must be specified

  • "continuous": min and max of values used to specify domain

  • "binary": values are ignored

  • "integer": min and max of values used to specify domain

  • "infer": attempt to guess the domain type from values

Parameters:
  • domain_type (str) – one of "discrete", "continuous", "binary", "integer", or "infer"

  • values (list) – the values of the domain
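The way each option treats values can be summarised with a small dispatcher. This is a sketch under assumptions: describe_domain_request is a hypothetical function that returns plain-text summaries rather than domain objects, and raising ValueError for missing values or unknown types is assumed behaviour, not something documented for make_domain.

```python
def describe_domain_request(domain_type, values=None):
    """Summarise how each option treats `values` (hypothetical helper)."""
    if domain_type == "discrete":
        # The values must be given explicitly for a discrete domain.
        if values is None:
            raise ValueError('"discrete" requires explicit values')
        return f"discrete domain over {list(values)}"
    if domain_type in ("continuous", "integer"):
        # Only the minimum and maximum of the values are used.
        if values is None:
            raise ValueError(f'"{domain_type}" needs values to derive min and max')
        return f"{domain_type} domain from {min(values)} to {max(values)}"
    if domain_type == "binary":
        return "binary domain (values ignored)"
    if domain_type == "infer":
        return "domain type guessed from the values"
    raise ValueError(f"unknown domain type: {domain_type!r}")


summary = describe_domain_request("continuous", [3, 1, 7])
print(summary)
```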

Module contents