cgnal.core.data.model package

Submodules

cgnal.core.data.model.core module

Module with base abstractions of common objects.

class cgnal.core.data.model.core.BaseIterable(*args, **kwds)

Bases: Generic[cgnal.core.data.model.core.T]

Abstract class defining interface for iterables.

abstract property cached: bool

Whether the iterable is cached in memory or lazy.

Returns

boolean indicating whether the iterable is fully stored in memory

abstract classmethod empty() cgnal.core.data.model.core.BaseIterableType

Return an empty iterable instance.

abstract property items: Iterable[cgnal.core.data.model.core.T]

Return an iterator over the items.

Returns

Iterable[T]

abstract property type: Type[cgnal.core.data.model.core.T]

Return the type of the objects in the Iterable.

class cgnal.core.data.model.core.BaseRange

Bases: abc.ABC

Abstract Range Class.

property business_days: List[pandas._libs.tslibs.timestamps.Timestamp]

Create date range with daily frequency, restricted to business days.

Returns

list of pd.Timestamp from start to end with daily frequency including only days from Mon to Fri
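The Monday-to-Friday filter described above can be illustrated without the library. This is only a sketch built on the standard datetime module; the helper name business_days_between is hypothetical, and the way BaseRange actually builds its list of pd.Timestamp objects is not shown in this reference.

```python
from datetime import date, timedelta
from typing import List


def business_days_between(start: date, end: date) -> List[date]:
    """Hypothetical helper: every day from start to end (inclusive) falling on Mon-Fri."""
    n_days = (end - start).days
    all_days = [start + timedelta(days=i) for i in range(n_days + 1)]
    # date.weekday() returns 0 for Monday and 6 for Sunday
    return [d for d in all_days if d.weekday() < 5]


# 2023-01-06 is a Friday and 2023-01-09 is the following Monday
days = business_days_between(date(2023, 1, 6), date(2023, 1, 9))
```

The weekend days in between are dropped, so only the Friday and the Monday remain.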

property days: List[pandas._libs.tslibs.timestamps.Timestamp]

Create date range with daily frequency.

Returns

list of pd.Timestamp from start to end with daily frequency

abstract property end: pandas._libs.tslibs.timestamps.Timestamp

Return the last timestamp.

Returns

Timestamp

property minutes_15: List[pandas._libs.tslibs.timestamps.Timestamp]

Create date range with 15-minute frequency.

Returns

list of pd.Timestamp from start to end with 15-minute frequency

abstract overlaps(other: cgnal.core.data.model.core.BaseRange) bool

Return whether two ranges overlap.

Parameters

other – other range to be compared with

Returns

True if the two ranges intersect, False otherwise
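One possible reading of the intersection test, sketched with plain datetime objects. The Interval type is a hypothetical stand-in for a range, and treating both endpoints as inclusive is an assumption; the reference does not say how touching ranges are handled.

```python
from datetime import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a range with a start and an end."""

    start: datetime
    end: datetime


def overlaps(a: Interval, b: Interval) -> bool:
    # Assumption: endpoints are inclusive, so touching intervals count as overlapping
    return a.start <= b.end and b.start <= a.end


january = Interval(datetime(2023, 1, 1), datetime(2023, 1, 31))
mid_winter = Interval(datetime(2023, 1, 15), datetime(2023, 2, 15))
march = Interval(datetime(2023, 3, 1), datetime(2023, 3, 31))
```

The check is symmetric: swapping the two arguments gives the same answer.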

abstract range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]

Return list of timestamps, spaced by given frequency.

Parameters

freq – frequency of timestamps; valid values are “D” (days), “H” (hours), “M” (minutes), “S” (seconds).

Returns

list of timestamps
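To make the idea of evenly spaced timestamps concrete, here is a stand-alone sketch. The STEPS mapping simply mirrors the frequency codes listed above, and the function name timestamps is hypothetical; the library may interpret the codes differently, for example by delegating to pandas.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed mapping from the documented frequency codes to fixed time steps
STEPS: Dict[str, timedelta] = {
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "M": timedelta(minutes=1),
    "S": timedelta(seconds=1),
}


def timestamps(start: datetime, end: datetime, freq: str = "H") -> List[datetime]:
    """Hypothetical helper: timestamps from start to end (inclusive), spaced by freq."""
    step = STEPS[freq]
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += step
    return result


hourly = timestamps(datetime(2023, 1, 1, 0), datetime(2023, 1, 1, 3))
```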

abstract property start: pandas._libs.tslibs.timestamps.Timestamp

Return the first timestamp.

Returns

Timestamp

class cgnal.core.data.model.core.CachedIterable(items: Sequence[cgnal.core.data.model.core.T])

Bases: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]

Base class to be used for implementing cached iterables.

Return instance of a class to be used for implementing cached iterables.

Parameters

items – sequence or iterable of elements

property cached: bool

Whether the iterable is cached in memory or lazy.

Returns

boolean indicating whether the iterable is fully stored in memory

classmethod empty() cgnal.core.data.model.core.CachedIterableType

Return an empty cached iterable.

Returns

Empty instance

classmethod from_iterable(iterable: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]) cgnal.core.data.model.core.CachedIterableType

Create a new instance of this class from a BaseIterable instance.

Parameters

iterable – iterable instance

Returns

cached iterable

property items: Sequence[cgnal.core.data.model.core.T]

Return the sequence of items.

Returns

Sequence[T]

class cgnal.core.data.model.core.CompositeRange(ranges: List[cgnal.core.data.model.core.Range])

Bases: cgnal.core.data.model.core.BaseRange

Class representing a composition of ranges.

Return a range made up of multiple ranges.

Parameters

ranges – List of Ranges

property end: pandas._libs.tslibs.timestamps.Timestamp

Return the last timestamp.

Returns

Timestamp

overlaps(other: cgnal.core.data.model.core.BaseRange) bool

Return whether two ranges overlap.

Parameters

other – BaseRange, other range to be compared with

Returns

bool, True if the two ranges intersect, False otherwise

range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]

Return list of timestamps, spaced by given frequency.

Parameters

freq – given frequency

Returns

list of timestamps

simplify() Union[cgnal.core.data.model.core.CompositeRange, cgnal.core.data.model.core.Range]

Simplify the list into disjoint Range objects, aggregating non-disjoint ranges.

If only one range remains after merging, a single Range object is returned.

Returns

BaseRange
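The aggregation of non-disjoint ranges can be pictured with the usual sort-and-merge pass over intervals. This is a sketch on integer pairs rather than on Range objects; merge_intervals is a hypothetical name and is not claimed to be the library's implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical stand-in for a Range: (start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge intersecting intervals into disjoint ones, sorted by start."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Intersects the last merged interval: extend it
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


disjoint = merge_intervals([(1, 3), (8, 10), (2, 6)])
single = merge_intervals([(1, 4), (3, 9)])
```

In the second call everything collapses into one interval, which corresponds to the case where simplify returns a plain Range instead of a CompositeRange.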

property start: pandas._libs.tslibs.timestamps.Timestamp

Return the first timestamp.

Returns

Timestamp

class cgnal.core.data.model.core.DillSerialization

Bases: cgnal.core.data.model.core.Serializable

Serialization based on dill package.

classmethod load(filename: Union[str, os.PathLike[str]]) DillSerialization

Load instance from file.

Parameters

filename – Name of the file to be read

Returns

Instance of the read Model

write(filename: Union[str, os.PathLike[str]]) None

Write instance to file using dill.

Parameters

filename – Name of the file where to save the instance

class cgnal.core.data.model.core.IterGenerator(generator_function: Callable[[], Iterator[cgnal.core.data.model.core.T]], _type: Optional[Type[cgnal.core.data.model.core.T]] = None)

Bases: Generic[cgnal.core.data.model.core.T]

Base class representing any generator.

Class that allows a given generator to be accessed as an Iterator via the .iterator property.

Parameters
  • generator_function – function that outputs a generator

  • _type – type returned by the generator, required when the generator is empty

Raises
  • TypeError – when type mismatch happens between generator and provided type

  • ValueError – when an empty generator is provided without _type specification

property iterator: Iterator[cgnal.core.data.model.core.T]

Return an iterator over the given generator function.

Returns

an iterator
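Wrapping a generator function rather than a generator object is what lets the data be traversed more than once. The sketch below illustrates that idea only: RestartableIterator is a hypothetical class, and the assumption that the function is called again on every access is not confirmed by this reference. The type checks described above are omitted.

```python
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class RestartableIterator(Generic[T]):
    """Hypothetical stand-in: wraps a zero-argument generator function."""

    def __init__(self, generator_function: Callable[[], Iterator[T]]) -> None:
        self._generator_function = generator_function

    @property
    def iterator(self) -> Iterator[T]:
        # Calling the function again yields a fresh, unconsumed generator
        return self._generator_function()


def squares() -> Iterator[int]:
    for i in range(4):
        yield i * i


wrapped = RestartableIterator(squares)
first_pass = list(wrapped.iterator)
second_pass = list(wrapped.iterator)
```

Both passes see the full sequence, whereas a bare generator object would be exhausted after the first one.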

class cgnal.core.data.model.core.IterableUtilsMixin(*args, **kwargs)

Bases: Generic[cgnal.core.data.model.core.T, cgnal.core.data.model.core.LazyIterableType, cgnal.core.data.model.core.CachedIterableType], cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T], abc.ABC

Class providing base interfaces and methods that enhance iterable classes and enable a more functional style.

In particular, the class provides, among others, implementations of the map, filter and foreach methods.

Create a new instance of this class.

Parameters
  • cls – parent object class

  • args – passed to the super class __new__ method

  • kwargs – passed to the super class __new__ method

Raises

RuntimeError – if the cached and lazy versions were not defined before instantiating the class

Returns

an instance of this class

batch(size: int = 100) Iterator[cgnal.core.data.model.core.CachedIterableType]

Return an iterator over batches of the given size.

Parameters

size – dimension of the batch

Yield

iterator of batches
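Batching an iterable of unknown length is commonly done with itertools.islice. The sketch below yields plain lists instead of cached iterables, and letting the last batch be shorter than size is an assumption about the intended behaviour.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` elements."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


batches = list(batch(range(7), size=3))
```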

cached_type: Type[cgnal.core.data.model.core.CachedIterableType]

filter(f: Callable[[cgnal.core.data.model.core.T], bool]) cgnal.core.data.model.core.LazyIterableType

Return an iterable where elements have been filtered based on a boolean function.

Parameters

f – boolean function that selects items

Returns

lazy iterable with elements filtered

foreach(f: Callable[[cgnal.core.data.model.core.T], Any])

Execute the provided function on each element of the iterable.

Parameters

f – function to be executed for each element

from_element(value: cgnal.core.data.model.core.T, cached=True) Union[cgnal.core.data.model.core.LazyIterableType, cgnal.core.data.model.core.CachedIterableType]

Instantiate a new object of this class from a single element.

Parameters
  • value – element

  • cached – whether a cached iterable should be returned, defaults to True

Returns

iterable object

lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]

map(f: Callable[[cgnal.core.data.model.core.T], cgnal.core.typing.T_co]) cgnal.core.data.model.core.LazyIterableType

Map all elements of an iterable with the provided function.

Parameters

f – function to be used to map the elements

Returns

mapped iterable
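Since map and filter return lazy iterables, the supplied functions are presumably not applied until the result is consumed. The generator-based sketch below shows that deferred behaviour in isolation; lazy_map and lazy_filter are hypothetical helpers, not the library's methods.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

calls: List[int] = []  # records which elements have actually been processed


def lazy_map(f: Callable[[T], U], items: Iterable[T]) -> Iterator[U]:
    # Generator: nothing runs until the result is iterated
    for item in items:
        yield f(item)


def lazy_filter(f: Callable[[T], bool], items: Iterable[T]) -> Iterator[T]:
    for item in items:
        if f(item):
            yield item


def double(x: int) -> int:
    calls.append(x)
    return 2 * x


pipeline = lazy_filter(lambda x: x > 2, lazy_map(double, [1, 2, 3]))
calls_before = list(calls)  # still empty: the pipeline has not been consumed
result = list(pipeline)  # consuming the pipeline triggers the work
```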

take(size: int) cgnal.core.data.model.core.CachedIterableType

Take the first size elements of the iterable.

Parameters

size – number of elements to be taken

Returns

cached iterable with the first elements

to_cached() cgnal.core.data.model.core.CachedIterableType

Create a new cached iterable from this instance.

Returns

cached iterable

to_lazy() cgnal.core.data.model.core.LazyIterableType

Create a new lazy iterable from this instance.

Returns

lazy iterable

class cgnal.core.data.model.core.LazyIterable(items: cgnal.core.data.model.core.IterGenerator)

Bases: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]

Base class to be used for implementing lazy iterables.

Return an instance of the class to be used for implementing lazy iterables.

Parameters

items – IterGenerator containing the generator of items

property cached: bool

Whether the iterable is cached in memory or lazy.

Returns

boolean indicating whether the iterable is fully stored in memory

classmethod empty() cgnal.core.data.model.core.LazyIterableType

Return an empty lazy iterable.

Returns

Empty instance

classmethod from_iterable(iterable: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]) cgnal.core.data.model.core.LazyIterableType

Create a new instance of this class from a BaseIterable instance.

Parameters

iterable – iterable instance

Returns

lazy iterable

property items: Iterator[cgnal.core.data.model.core.T]

Return an iterator over the items.

Returns

Iterator[T]

class cgnal.core.data.model.core.PickleSerialization

Bases: cgnal.core.data.model.core.Serializable

Serialization based on pickle package.

classmethod load(filename: Union[str, os.PathLike[str]]) PickleSerialization

Load instance from pickle.

Parameters

filename – Name of the file to be read

Returns

Instance of the read Model

write(filename: Union[str, os.PathLike[str]]) None

Write instance as pickle.

Parameters

filename – Name of the file where to save the instance

class cgnal.core.data.model.core.Range(start: pandas.core.tools.datetimes.DatetimeScalar, end: pandas.core.tools.datetimes.DatetimeScalar)

Bases: cgnal.core.data.model.core.BaseRange

Base class for a continuous range.

Return a simple Range Class.

Parameters
  • start – starting datetime for the range

  • end – ending datetime for the range

Raises

ValueError – if start > end

property end: pandas._libs.tslibs.timestamps.Timestamp

Return the last timestamp.

Returns

Timestamp

overlaps(other: cgnal.core.data.model.core.BaseRange) bool

Return whether two ranges overlap.

Parameters

other – other range to be compared with

Returns

True if the two ranges overlap, False otherwise

range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]

Return list of timestamps, spaced by given frequency.

Parameters

freq – given frequency

Returns

list of timestamps

property start: pandas._libs.tslibs.timestamps.Timestamp

Return the first timestamp.

Returns

Timestamp

class cgnal.core.data.model.core.RegisterLazyCachedIterables(class_object_first: Type[cgnal.core.data.model.core.IterableUtilsMixin], unidirectional_link: bool = False)

Bases: object

Register the lazy and cached version of the iterables.

Initialize an instance of this class.

Parameters
  • class_object_first – the first iterable class object (the lazy or cached version)

  • unidirectional_link – if True, only set the link in the second class passed to the __call__ method

static register_cached(class_object_lazy: Type[cgnal.core.data.model.core.IterableUtilsMixin], class_object_cached: Type[cgnal.core.data.model.core.IterableUtilsMixin])

Link the lazy and cached versions.

Parameters
  • class_object_lazy – the lazy iterable class object

  • class_object_cached – the cached iterable class object

static register_lazy(class_object_lazy: Type[cgnal.core.data.model.core.IterableUtilsMixin], class_object_cached: Type[cgnal.core.data.model.core.IterableUtilsMixin])

Link the lazy and cached versions.

Parameters
  • class_object_lazy – the lazy iterable class object

  • class_object_cached – the cached iterable class object
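The linking of a lazy class with its cached counterpart can be pictured as a class decorator that stores each class on the other. The sketch below is a guess at the general pattern based on the lazy_type and cached_type attributes documented in this module: LinkLazyCached and the two example classes are hypothetical, and the unidirectional_link option is not modelled.

```python
from typing import Type


class LinkLazyCached:
    """Hypothetical decorator that links a lazy class and a cached class to each other."""

    def __init__(self, lazy_class: Type) -> None:
        self.lazy_class = lazy_class

    def __call__(self, cached_class: Type) -> Type:
        # Each class learns about its counterpart through class attributes
        for klass in (self.lazy_class, cached_class):
            klass.lazy_type = self.lazy_class
            klass.cached_type = cached_class
        return cached_class


class LazyNumbers:
    pass


@LinkLazyCached(LazyNumbers)
class CachedNumbers:
    pass
```

After decoration either class can reach the other, which is the kind of lookup that methods such as to_cached and to_lazy would need.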

class cgnal.core.data.model.core.Serializable

Bases: abc.ABC

Abstract Class to be used to extend objects that can be serialised.

abstract classmethod load(filename: Union[str, os.PathLike[str]]) Serializable

Load class from a file.

Parameters

filename – filename

abstract write(filename: Union[str, os.PathLike[str]]) None

Write class to a file.

Parameters

filename – filename

cgnal.core.data.model.ml module

Module for specifying data-models to be used in modelling.

class cgnal.core.data.model.ml.CachedDataset(*args, **kwargs)

Bases: cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.core.CachedIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]], cgnal.core.data.model.core.DillSerialization

Class that represents a dataset cached in memory, derived from a cached iterable of samples.

Return instance of a class to be used for implementing cached iterables.

Parameters

items – sequence or iterable of elements

cached_type

alias of cgnal.core.data.model.ml.CachedDataset

lazy_type

alias of cgnal.core.data.model.ml.LazyDataset

to_df() pandas.core.frame.DataFrame

Reformat the Features and Labels as a DataFrame.

Returns

DataFrame with features and labels

class cgnal.core.data.model.ml.DatasetUtilsMixin(*args, **kwargs)

Bases: cgnal.core.data.model.core.IterableUtilsMixin[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], LazyDataset[FeatType, LabType], CachedDataset[FeatType, LabType]], Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], abc.ABC

Base class for representing datasets as iterable over Samples.

Create a new instance of this class.

Parameters
  • cls – parent object class

  • args – passed to the super class __new__ method

  • kwargs – passed to the super class __new__ method

Raises

RuntimeError – if the cached and lazy versions were not defined before instantiating the class

Returns

an instance of this class

property asPandasDataset: cgnal.core.data.model.ml.PandasDataset

Cast object as a PandasDataset.

Returns

dataset

cached_type: Type[cgnal.core.data.model.core.CachedIterableType]

static checkNames(x: Optional[Union[int, str, Any]]) Union[str, int]

Check that feature names comply with format and cast them to either string or int.

Parameters

x – feature name

Returns

name as int or str

Raises

AttributeError – if x is None

getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]

Return object of the specified type containing the feature space.

Parameters

type – type of return. Can be one of “pandas”, “dict”, “list” or “array”

Returns

an object of the specified type containing the features

Raises

ValueError – if the provided type is not one of the allowed ones

getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]

Return an object of the specified type containing the labels.

Parameters

type – type of return. Can be one of “pandas”, “dict”, “list” or “array”

Returns

an object of the specified type containing the labels

Raises

ValueError – if the provided type is not one of the allowed ones

lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]

type()

Return the type of the objects in the Iterable.

Returns

type of the object of the iterable

union(other: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]]) cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]

Return a union of datasets.

Parameters

other – Dataset

Returns

LazyDataset

Raises

TypeError – other is not an instance of Dataset

class cgnal.core.data.model.ml.LazyDataset(*args, **kwargs)

Bases: cgnal.core.data.model.core.LazyIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]], cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]

Class that represents a dataset derived from a lazy iterable of samples.

Return an instance of the class to be used for implementing lazy iterables.

Parameters

items – IterGenerator containing the generator of items

cached_type

alias of cgnal.core.data.model.ml.CachedDataset

features() Iterator[cgnal.core.data.model.ml.FeatType]

Return an iterator over sample features.

Returns

iterable of features

getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]

Return object of the specified type containing the feature space.

Parameters

type – type of return. Can be one of “pandas”, “dict”, “list” or “array”

Returns

an object of the specified type containing the features

getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]

Return an object of the specified type containing the labels.

Parameters

type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or iterators

Returns

an object of the specified type containing the labels

labels() Iterator[cgnal.core.data.model.ml.LabType]

Return an iterator over sample labels.

Returns

iterable of labels

lazy_type

alias of cgnal.core.data.model.ml.LazyDataset

withLookback(lookback: int) cgnal.core.data.model.ml.LazyDataset

Create a LazyDataset whose features are arrays built from the features of lookback consecutive samples.

Parameters

lookback – number of samples’ features to look at

Returns

LazyDataset with changed samples
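A lookback transformation is essentially a sliding window over the stream of features. The sketch below shows one way to build such windows lazily with collections.deque. The helper with_lookback is hypothetical, and both the handling of the first incomplete windows (dropped here) and the alignment of labels are assumptions not covered by this reference.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def with_lookback(features: Iterable[T], lookback: int) -> Iterator[List[T]]:
    """Yield, for each position, the last `lookback` feature values seen so far."""
    window: Deque[T] = deque(maxlen=lookback)
    for value in features:
        window.append(value)
        if len(window) == lookback:
            # Copy the window so later appends do not alter what was yielded
            yield list(window)


windows = list(with_lookback([10, 20, 30, 40], lookback=2))
```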

class cgnal.core.data.model.ml.MultiFeatureSample(features: List[numpy.ndarray], label: Optional[cgnal.core.data.model.ml.LabType] = None, name: Optional[str] = None)

Bases: cgnal.core.data.model.ml.Sample[List[numpy.ndarray], cgnal.core.data.model.ml.LabType]

Class representing an observation defined by a nested list of arrays.

Object representing a single sample of a training or test set.

Parameters
  • features – features of the sample

  • label – labels of the sample (optional)

  • name – id of the sample (optional)

class cgnal.core.data.model.ml.PandasDataset(*args, **kwargs)

Bases: Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.core.DillSerialization

Dataset represented via pandas DataFrames for features and labels.

Return a data structure built on top of pandas dataframes.

PandasDataset packs features and labels together and returns them as a pandas dataframe, numpy array or dictionary. For unsupervised learning tasks the labels are left as None.

Parameters
  • features – a dataframe or a series of features

  • labels – a dataframe or a series of labels. None in case no labels are present.

Raises

TypeError – if the labels or features are neither DataFrames nor Series

property cached: bool

Return whether the dataset is cached or not in memory.

Returns

boolean

cached_type

alias of cgnal.core.data.model.ml.PandasDataset

classmethod createObject(features: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], labels: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series]]) cgnal.core.data.model.ml.TPandasDataset

Create a PandasDataset object.

Parameters
  • features – features as pandas dataframe/series

  • labels – labels as pandas dataframe/series

Returns

a PandasDataset object

dropna(**kwargs) cgnal.core.data.model.ml.TPandasDataset

Drop NAs from features and labels.

Parameters

kwargs – keyword arguments passed to dropna

Returns

PandasDataset with features and labels without NAs

classmethod empty() cgnal.core.data.model.ml.TPandasDataset

Return empty object.

Returns

Empty instance of class

property features: pandas.core.frame.DataFrame

Get features as pandas dataframe.

Returns

pd.DataFrame

classmethod from_sequence(datasets: Sequence[cgnal.core.data.model.ml.TPandasDataset]) cgnal.core.data.model.ml.TPandasDataset

Create a PandasDataset from a list of pandas datasets using pd.concat.

Parameters

datasets – list of PandasDatasets

Returns

PandasDataset

getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]

Get features as numpy array, pandas dataframe or dictionary.

Parameters

type – str, default is ‘array’; can be ‘array’, ‘pandas’ or ‘dict’

Returns

features according to the given type

Raises

ValueError – provided type not allowed

getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]

Get labels as numpy array, pandas dataframe or dictionary.

Parameters

type – str, default is ‘array’; can be ‘array’, ‘pandas’ or ‘dict’

Returns

labels according to the given type

Raises

ValueError – provided type not allowed

property index: pandas.core.indexes.base.Index

Get Dataset index.

Returns

pd.Index

intersection() cgnal.core.data.model.ml.TPandasDataset

Intersect the indices of features and labels.

Returns

PandasDataset with features and labels with intersected indices

property items: Iterator[cgnal.core.data.model.ml.Sample]

Get the dataset items as an iterator of Samples.

Yield

Iterator of objects of cgnal.core.data.model.ml.Sample

property labels: pandas.core.frame.DataFrame

Get labels as a pandas dataframe.

Returns

pd.DataFrame

lazy_type

alias of cgnal.core.data.model.ml.LazyDataset

loc(idx: List[Any]) cgnal.core.data.model.ml.TPandasDataset

Find given indices in features and labels.

Parameters

idx – input indices

Returns

PandasDataset with features and labels filtered on input indices

takeAsPandas(n: int) cgnal.core.data.model.ml.TPandasDataset

Return top n records as a PandasDataset.

Parameters

n – int specifying number of records to output

Returns

PandasDataset of length n

union(other: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]]) cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]

Return a union between datasets.

Parameters

other – Dataset to be merged

Returns

Dataset resulting from the merge

class cgnal.core.data.model.ml.PandasTimeIndexedDataset(*args, **kwargs)

Bases: cgnal.core.data.model.ml.PandasDataset

Class to be used for datasets that have time-indexed samples.

Return a data structure built on top of pandas dataframes that packs together time-indexed features and labels.

Features and labels can be obtained as a pandas dataframe, numpy array or a dictionary. For unsupervised learning tasks the labels are left as None.

Parameters
  • features – pandas dataframe/series where index elements are dates in string format

  • labels – pandas dataframe/series where index elements are dates in string format

class cgnal.core.data.model.ml.Sample(features: cgnal.core.data.model.ml.FeatType, label: Optional[cgnal.core.data.model.ml.LabType] = None, name: Optional[Union[int, str, Any]] = None)

Bases: cgnal.core.data.model.core.DillSerialization, Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]

Base class for representing a sample/observation.

Return an object representing a single sample of a training or test set.

Parameters
  • features – features of the sample

  • label – labels of the sample (optional)

  • name – id of the sample (optional)

cgnal.core.data.model.ml.features_and_labels_to_dataset(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], y: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series]] = None) cgnal.core.data.model.ml.CachedDataset

Pack features and labels into a CachedDataset.

Parameters
  • X – features which can be a pandas dataframe or a pandas series object

  • y – labels which can be a pandas dataframe or a pandas series object

Returns

an instance of cgnal.core.data.model.ml.CachedDataset

cgnal.core.data.model.text module

Module for providing abstraction and classes for handling NLP data.

class cgnal.core.data.model.text.CachedDocuments(*args, **kwargs)

Bases: cgnal.core.data.model.core.CachedIterable[cgnal.core.data.model.text.Document], cgnal.core.data.model.text.DocumentsUtilsMixin, cgnal.core.data.model.core.DillSerialization

Class representing a collection of documents cached in memory.

Return instance of a class to be used for implementing cached iterables.

Parameters

items – sequence or iterable of elements

cached_type

alias of cgnal.core.data.model.text.CachedDocuments

lazy_type

alias of cgnal.core.data.model.text.LazyDocuments

to_df(fields: Optional[List[str]] = None) pandas.core.frame.DataFrame

Represent the corpus of documents as a table by unpacking provided fields as columns.

Parameters

fields – names of the document properties to be unpacked as columns

Returns

dataframe representing the corpus with the given fields

class cgnal.core.data.model.text.Document(uuid: cgnal.core.data.model.text.K, data: Dict[str, Any])

Bases: Generic[cgnal.core.data.model.text.K]

Document represented as a pair made of a uuid and a dictionary of information.

Return instance of a document.

Parameters
  • uuid – document id

  • data – document data as a dictionary

addProperty(key: str, value: Any) cgnal.core.data.model.text.Document

Generate new Document instance with given new data element.

Parameters
  • key – key of the data element to add

  • value – value of the data element to add

Returns

Document with new given data element

property author: Optional[str]

Retrieve ‘author’ field.

Returns

author data field value

getOrThrow(key: str, default: Optional[Any] = None) Optional[Any]

Retrieve the value associated with the given key or return the default value.

Parameters
  • key – key to retrieve

  • default – default value to return

Returns

retrieved element

Raises

KeyError – if key not found and default not provided
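The lookup rule described here (value if present, otherwise the default, otherwise an error) can be sketched on a plain dictionary. The function get_or_throw is a hypothetical stand-in, and treating default=None as "no default provided" is an assumption suggested by the signature.

```python
from typing import Any, Dict, Optional


def get_or_throw(data: Dict[str, Any], key: str, default: Optional[Any] = None) -> Any:
    """Return data[key]; fall back to default if given, otherwise raise KeyError."""
    if key in data:
        return data[key]
    if default is not None:
        return default
    raise KeyError(key)


doc_data = {"text": "hello", "language": "en"}
found = get_or_throw(doc_data, "text")
fallback = get_or_throw(doc_data, "author", default="unknown")
try:
    get_or_throw(doc_data, "author")
    raised = False
except KeyError:
    raised = True
```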

items() Iterator[Tuple[str, Any]]

Yield data items.

Yield

iterator with tuples of data properties names and values

property language: Optional[str]

Retrieve ‘language’ field.

Returns

language data field value

property properties: Iterator[str]

Yield data properties names.

Yield

iterator with data properties names

removeProperty(key: str) cgnal.core.data.model.text.Document

Generate new Document instance without given data element.

Parameters

key – key of data element to remove

Returns

Document without given data element
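Both addProperty and removeProperty are documented as generating a new Document rather than modifying the current one. A minimal sketch of that non-mutating pattern follows; the class Doc and its snake_case methods are hypothetical stand-ins, not the library's code.

```python
from typing import Any, Dict


class Doc:
    """Hypothetical stand-in for a document: an id plus a dictionary of data."""

    def __init__(self, uuid: str, data: Dict[str, Any]) -> None:
        self.uuid = uuid
        self.data = data

    def add_property(self, key: str, value: Any) -> "Doc":
        # Build a new dictionary so the original document is left untouched
        return Doc(self.uuid, {**self.data, key: value})

    def remove_property(self, key: str) -> "Doc":
        return Doc(self.uuid, {k: v for k, v in self.data.items() if k != key})


original = Doc("doc-1", {"text": "hello"})
tagged = original.add_property("language", "en")
stripped = tagged.remove_property("text")
```

Because each call returns a fresh object, earlier versions of the document stay valid and can be shared safely.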

setRandomUUID() cgnal.core.data.model.text.Document

Generate new document instance with the same data as the current one but with random uuid.

Returns

Document instance with the same data as the current one but with random uuid

property text: Optional[str]

Retrieve ‘text’ field.

Returns

text data field value

class cgnal.core.data.model.text.DocumentsUtilsMixin(*args, **kwargs)

Bases: cgnal.core.data.model.core.IterableUtilsMixin[cgnal.core.data.model.text.Document, LazyDocuments, CachedDocuments]

Utilities for Documents iterables.

Create a new instance of this class.

Parameters
  • cls – parent object class

  • args – passed to the super class __new__ method

  • kwargs – passed to the super class __new__ method

Raises

RuntimeError – if the cached and lazy versions were not defined before instantiating the class

Returns

an instance of this class

cached_type: Type[cgnal.core.data.model.core.CachedIterableType]

lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]

property type: Type[cgnal.core.data.model.text.Document]

Return the type of the objects in the Iterable.

Returns

Document class object

class cgnal.core.data.model.text.LazyDocuments(*args, **kwargs)

Bases: cgnal.core.data.model.core.LazyIterable[cgnal.core.data.model.text.Document], cgnal.core.data.model.text.DocumentsUtilsMixin

Class representing a collection of documents provided by a generator.

Return an instance of the class to be used for implementing lazy iterables.

Parameters

items – IterGenerator containing the generator of items

cached_type

alias of cgnal.core.data.model.text.CachedDocuments

lazy_type

alias of cgnal.core.data.model.text.LazyDocuments

cgnal.core.data.model.text.generate_random_uuid() bytes

Create a random number with 12 digits.

Returns

uuid

Module contents

Data model module.

This module defines the data types used in the CGnal framework.