cgnal.core.data.layer.pandas package

Submodules

cgnal.core.data.layer.pandas.archivers module

Module with abstractions for accessing data persisted in pickles, mimicking a fictitious database.

class cgnal.core.data.layer.pandas.archivers.CsvArchiver(filename: Union[str, os.PathLike[str]], dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series], sep: str = ';')

Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]

Archiver whose persistence layer is a tabular file stored as CSV.

Create an in-memory archiver based on structured data stored in the filesystem as a CSV.

Parameters
  • filename – str, path object or file-like object. Any valid string path to a CSV file.

  • dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.

  • sep – str, default ‘;’. Delimiter to use.

Raises

TypeError – if filename is not a string

class cgnal.core.data.layer.pandas.archivers.PandasArchiver(dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])

Bases: cgnal.core.data.layer.Archiver[cgnal.core.typing.T], abc.ABC

Archiver whose persistence layer is a tabular file, represented in memory by a pandas DataFrame.

Create an in-memory archiver based on structured data stored as a pandas DataFrame.

Parameters

dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.

Raises

TypeError – if given DAO is not of the correct type

archive(objs: Union[Iterable[cgnal.core.typing.T], cgnal.core.typing.T])

Insert one or more objects in the underlying pd.DataFrame object.

Parameters

objs – object or list of objects to be archived. Each object can be a cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series.

Returns

self i.e. an instance of PandasArchiver with updated self.data object

archiveMany(objs: Iterable[cgnal.core.typing.T]) cgnal.core.data.layer.pandas.archivers.PandasArchiver

Insert many objects of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.

Parameters

objs – List of objects to be inserted. Each object can be an instance of cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series.

Returns

self i.e. an instance of PandasArchiver with updated self.data object

archiveOne(obj: cgnal.core.typing.T)

Insert an object of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.

Parameters

obj – An instance of cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series.

Returns

self i.e. an instance of PandasArchiver with updated self.data object

commit() cgnal.core.data.layer.pandas.archivers.PandasArchiver

Persist data stored in memory in the file.

Returns

self

property data: pandas.core.frame.DataFrame

Return tabular data stored in memory.

Returns

pd.DataFrame

retrieve(condition: Optional[Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]] = None, sort_by: Optional[Union[str, List[str]]] = None) Iterator[cgnal.core.typing.T]

Retrieve rows satisfying condition, sorted according to given ordering.

Parameters
  • condition – condition to satisfy. If None return all rows.

  • sort_by – ordering to respect. If None, no ordering is given.

Returns

iterator of (ordered) rows satisfying given condition
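As a minimal sketch of how the two arguments might be used, assuming that condition receives the whole DataFrame and returns the filtered DataFrame (as its type hint suggests). The helper retrieve_rows below is hypothetical and only mimics the documented behaviour with plain pandas; it is not the library's implementation, and it yields raw pd.Series rows rather than objects of type T.

```python
from typing import Callable, Iterator, List, Optional, Union

import pandas as pd


def retrieve_rows(
    df: pd.DataFrame,
    condition: Optional[Callable[[pd.DataFrame], pd.DataFrame]] = None,
    sort_by: Optional[Union[str, List[str]]] = None,
) -> Iterator[pd.Series]:
    """Hypothetical stand-in for PandasArchiver.retrieve, yielding raw rows."""
    # No condition means every row is kept.
    rows = df if condition is None else condition(df)
    # No sort_by means the existing row order is preserved.
    if sort_by is not None:
        rows = rows.sort_values(sort_by)
    for _, row in rows.iterrows():
        yield row


df = pd.DataFrame(
    {"name": ["a", "b", "c"], "score": [3, 1, 2]},
    index=["id1", "id2", "id3"],
)

# The condition takes the DataFrame and returns the rows to keep.
selected = list(
    retrieve_rows(df, condition=lambda d: d[d["score"] >= 2], sort_by="score")
)
selected_ids = [row.name for row in selected]
```

Here selected_ids lists the ids of the rows with a score of at least 2, ordered by ascending score.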

retrieveById(uuid: pandas.core.indexes.base.Index) cgnal.core.typing.T

Retrieve a row from a dataframe by id.

Parameters

uuid – row id

Returns

retrieved row parsed according to self.dao

retrieveGenerator(condition: Optional[Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]] = None, sort_by: Optional[Union[str, List[str]]] = None) cgnal.core.data.model.core.IterGenerator[cgnal.core.typing.T]

Retrieve a generator of rows satisfying condition, sorted according to given ordering.

Parameters
  • condition – condition to satisfy. If None return all rows.

  • sort_by – ordering to respect. If None, no ordering is given.

Returns

generator of (ordered) rows satisfying given condition. The generator is of type cgnal.core.data.model.core.IterGenerator

class cgnal.core.data.layer.pandas.archivers.PickleArchiver(filename: Union[str, os.PathLike[str]], dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])

Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]

Archiver whose persistence layer is a tabular file stored as a pickle.

Create an in-memory archiver based on structured data stored in the filesystem as a Pickle.

Parameters
  • filename – str, path object or file-like object. Any valid string path to a pickle file.

  • dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.

Raises

TypeError – if filename is not a string

class cgnal.core.data.layer.pandas.archivers.TableArchiver(table: cgnal.core.data.layer.pandas.databases.Table, dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])

Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]

Archiver whose persistence layer is a table in a database.

Create an in-memory archiver based on structured data stored as a table.

Parameters
  • table – An instance of cgnal.core.data.layer.pandas.databases.Table

  • dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.

cgnal.core.data.layer.pandas.dao module

Module with the implementation and abstraction for serializing/deserializing objects into DataFrames.

class cgnal.core.data.layer.pandas.dao.DataFrameDAO(*args, **kwds)

Bases: cgnal.core.data.layer.DAO[pandas.core.frame.DataFrame, pandas.core.series.Series]

Data Access Object for pd.DataFrames.

static addName(df: pandas.core.frame.DataFrame, name: Optional[Hashable]) pandas.core.frame.DataFrame

Add name to the input dataframe.

Parameters
  • df – pd.DataFrame

  • name – str

Returns

pd.DataFrame

computeKey(df: pandas.core.frame.DataFrame) Hashable

Get dataframe name.

Parameters

df – pd.DataFrame. A pandas dataframe

Returns

str, name of the dataframe

get(df: pandas.core.frame.DataFrame) pandas.core.series.Series

Get dataframe as pd.Series.

Parameters

df – pd.DataFrame. A pandas dataframe

Returns

pd.Series

parse(row: pandas.core.series.Series) pandas.core.frame.DataFrame

Get a row i.e. pd.Series as a pandas DataFrame.

Parameters

row – pd.Series, row of a pd.DataFrame

Returns

pd.DataFrame, a pandas dataframe object

class cgnal.core.data.layer.pandas.dao.DocumentDAO(*args, **kwds)

Bases: cgnal.core.data.layer.DAO[cgnal.core.data.model.text.Document, pandas.core.series.Series]

Data access object for documents.

computeKey(doc: cgnal.core.data.model.text.Document) Hashable

Get document id.

Parameters

doc – an instance of cgnal.core.data.model.text.Document

Returns

uuid i.e. id of the given document

get(doc: cgnal.core.data.model.text.Document) pandas.core.series.Series

Get doc as pd.Series with uuid as name.

Parameters

doc – an instance of cgnal.core.data.model.text.Document

Returns

pd.Series

parse(row: pandas.core.series.Series) cgnal.core.data.model.text.Document

Get a row i.e. pd.Series as a Document.

Parameters

row – pd.Series, row of a pd.DataFrame

Returns

cgnal.core.data.model.text.Document, a Document object
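The three DAO methods form a round trip between a domain object and a DataFrame row. The sketch below illustrates that contract with plain pandas. The Note dataclass and NoteDAO class are hypothetical stand-ins, not the library's Document or DocumentDAO, whose field layout is not described here.

```python
from dataclasses import dataclass
from typing import Hashable

import pandas as pd


@dataclass
class Note:
    """Hypothetical domain object standing in for a Document."""

    uuid: str
    text: str


class NoteDAO:
    """Hypothetical DAO mirroring the computeKey / get / parse contract."""

    def computeKey(self, note: Note) -> Hashable:
        # The key identifies the row in the DataFrame index.
        return note.uuid

    def get(self, note: Note) -> pd.Series:
        # Serialize the object as a row whose name is its key.
        return pd.Series({"text": note.text}, name=self.computeKey(note))

    def parse(self, row: pd.Series) -> Note:
        # Rebuild the object from a row; the row name carries the key.
        return Note(uuid=row.name, text=row["text"])


dao = NoteDAO()
note = Note(uuid="n-1", text="hello")

# Rows produced by get() can be stacked into a DataFrame indexed by key.
table = pd.DataFrame([dao.get(note)])
restored = dao.parse(table.loc["n-1"])
```

Because get names each Series after the object's key, a row looked up by that key can be handed straight to parse.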

class cgnal.core.data.layer.pandas.dao.SeriesDAO(mapping: Optional[Dict[Hashable, Hashable]] = None, keys: Optional[Sequence[Hashable]] = None)

Bases: cgnal.core.data.layer.DAO[pandas.core.series.Series, pandas.core.series.Series]

Data Access Object for pd.Series.

Create a Data Access Object for pd.Series.

Parameters
  • mapping – mapping of names between the file and the Series. Each key is a name in the file; the corresponding value is the name we want to have in the Series

  • keys – which of the fields in the Series should be used as keys

computeKey(serie) Union[Hashable, Dict]

Get series name.

Parameters

serie – pd.Series

Returns

dict representing the key

Raises

ValueError – keys and series name have different dimensions

get(serie: pandas.core.series.Series) pandas.core.series.Series

Get a series as a pd.Series object.

Parameters

serie – pd.Series

Returns

pd.Series

property inverseMapping: Dict[Hashable, Hashable]

Return the inverse mapping between field names.

Returns

dict with inverse mapping

Raises

ValueError – mapping is not invertible because of duplicated values
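A minimal sketch of the inversion and of the failure case, using only the standard library. The function invert_mapping and the column names are hypothetical, chosen for illustration; the property's actual implementation is not shown in this reference.

```python
from typing import Dict, Hashable


def invert_mapping(mapping: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Hypothetical helper: swap keys and values, rejecting duplicated values."""
    inverse: Dict[Hashable, Hashable] = {}
    for file_name, series_name in mapping.items():
        if series_name in inverse:
            # Two file names map to the same Series name, so no inverse exists.
            raise ValueError(
                f"Mapping is not invertible: duplicated value {series_name!r}"
            )
        inverse[series_name] = file_name
    return inverse


# Keys are names in the file, values are names wanted in the Series.
mapping = {"col_a": "price", "col_b": "quantity"}
inverse = invert_mapping(mapping)

# A mapping with a repeated value cannot be inverted.
try:
    invert_mapping({"col_a": "price", "col_b": "price"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The inverse direction translates Series names back to file names, which is why every value in the mapping must be unique.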

parse(row: pandas.core.series.Series) pandas.core.series.Series

Get a row as a pd.Series object.

Parameters

row – pd.Series

Returns

pd.Series

cgnal.core.data.layer.pandas.databases module

Module with abstraction for databases and tables.

class cgnal.core.data.layer.pandas.databases.Database(name: Union[str, os.PathLike[str]], extension: str = '.p')

Bases: cgnal.core.logging.defaults.WithLogging, cgnal.core.data.layer.DatabaseABC

Class representing a Database object.

Return an instance of a class implementing standard read and write methods to pickle data sources.

Parameters
  • name – path to pickles

  • extension – standard pickle extension

table(table_name: str) cgnal.core.data.layer.pandas.databases.Table

Select table.

Parameters

table_name – name of the table

Returns

object of class Table

property tables: List[str]

Complete pickle names with appropriate extension.

Returns

pickle names with appropriate extensions

class cgnal.core.data.layer.pandas.databases.Table(db: cgnal.core.data.layer.pandas.databases.Database, table_name: str)

Bases: cgnal.core.logging.defaults.WithLogging, cgnal.core.data.layer.TableABC

Class representing a Table in a Database.

Implement a constructor for tables using pickle file format.

Parameters
  • db – database to which the table belongs

  • table_name – name of the table

Raises

TypeError – if the provided database is not an instance of Database

property data: pandas.core.frame.DataFrame

Read pickle.

Returns

pd.DataFrame or pd.Series read from pickle

property filename: Union[str, os.PathLike[str]]

Return path to pickle.

Returns

path to pickle file

to_df(query: Optional[str] = None) pandas.core.frame.DataFrame

Read pickle.

Parameters

query – query

Returns

pd.DataFrame or pd.Series read from pickle

write(df: pandas.core.frame.DataFrame, overwrite: bool = False) None

Write pickle of data, possibly outer joined with an input DataFrame.

Parameters
  • df – input data

  • overwrite – whether or not to overwrite existing file
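The description does not spell out when the join happens. One plausible reading, sketched below with plain pandas, is that data already in the pickle is combined with df unless overwrite is true. The function write_pickle, the use of pd.concat with join="outer" (which keeps the union of columns), and the file name are all assumptions made for illustration, not the library's code.

```python
import os
import tempfile

import pandas as pd


def write_pickle(path: str, df: pd.DataFrame, overwrite: bool = False) -> None:
    """Hypothetical sketch of a write that merges with existing data."""
    if not overwrite and os.path.exists(path):
        existing = pd.read_pickle(path)
        # join="outer" keeps the union of columns; missing cells become NaN.
        df = pd.concat([existing, df], join="outer")
    df.to_pickle(path)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "table.p")

    # Two writes without overwrite accumulate rows and columns.
    write_pickle(path, pd.DataFrame({"a": [1]}, index=["x"]))
    write_pickle(path, pd.DataFrame({"b": [2]}, index=["y"]))
    merged = pd.read_pickle(path)

    # With overwrite=True the previous content is discarded.
    write_pickle(path, pd.DataFrame({"c": [3]}, index=["z"]), overwrite=True)
    replaced = pd.read_pickle(path)
```

Under this reading, merged holds both rows and both columns, while replaced holds only the last DataFrame written.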

Module contents

Data layer for pandas integration.