wc_kb.io


Reading and writing knowledge bases to/from files.

Supported file types:

  • Comma separated values (.csv)

  • Excel (.xlsx)

  • Tab separated values (.tsv)

Author:

Jonathan Karr <karr@mssm.edu>

Date:

2018-02-12

Copyright:

2018, Karr Lab

License:

MIT

3.2.5. Module Contents

3.2.5.1. Classes

Writer

Write knowledge base to file(s)

Reader

Read knowledge base from file(s)

3.2.5.2. Functions

convert(source_core, source_seq, dest_core, dest_seq)

Convert among Excel (.xlsx), comma separated (.csv), and tab separated (.tsv) file formats

create_template(core_path, seq_path[, taxon, ...])

Create file with knowledge base template, including row and column headings

3.2.5.3. Attributes

EUKARYOTE_MODELS

wc_kb.io.EUKARYOTE_MODELS = ()

class wc_kb.io.Writer

Bases: obj_tables.io.Writer

Write knowledge base to file(s)

run(core_path, knowledge_base, seq_path=None, rewrite_seq_path=True, taxon='eukaryote', models=None, get_related=True, include_all_attributes=False, validate=True, title=None, description=None, keywords=None, version=None, language=None, creator=None, write_schema=False, write_toc=True, extra_entries=0, data_repo_metadata=False, schema_package=None, protected=True)

Write knowledge base to file(s)

Parameters:
  • knowledge_base (core.KnowledgeBase) – knowledge base

  • core_path (str) – path to save core knowledge base

  • seq_path (str, optional) – path to save genome sequence

  • rewrite_seq_path (bool, optional) – if True, the path to genome sequence in the saved knowledge base will be updated to the newly saved seq_path

  • taxon (str, optional) – type of model order to use

  • models (list of Model, optional) – models in the order that they should appear as worksheets; all models which are not in models will follow in alphabetical order

  • get_related (bool, optional) – if True, write object and all related objects

  • include_all_attributes (bool, optional) – if True, export all attributes including those not explicitly included in Model.Meta.attribute_order

  • validate (bool, optional) – if True, validate the data

  • title (str, optional) – title

  • description (str, optional) – description

  • keywords (str, optional) – keywords

  • version (str, optional) – version

  • language (str, optional) – language

  • creator (str, optional) – creator

  • write_schema (bool, optional) – if True, include additional worksheet with schema

  • write_toc (bool, optional) – if True, include additional worksheet with table of contents

  • extra_entries (int, optional) – additional entries to display

  • data_repo_metadata (bool, optional) – if True, try to write metadata information about the file’s Git repo; the repo must be current with origin, except for the file

  • schema_package (str, optional) – the package which defines the obj_tables schema used by the file; if not None, try to write metadata information about the schema’s Git repository; the repo must be current with origin

  • protected (bool, optional) – if True, protect the worksheet

Raises:

ValueError – if any of the relationships with knowledge bases and cells are not set
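A minimal sketch of calling Writer.run and handling the documented ValueError. The helper name write_kb, the optional writer argument (used only so the sketch can be exercised without wc_kb installed), and the file names in the usage note are assumptions for illustration; only the run(core_path, knowledge_base, seq_path=...) call shape comes from the signature above.

```python
def write_kb(knowledge_base, core_path, seq_path, writer=None):
    """Write a knowledge base; return True on success, False on ValueError.

    `writer` is a hypothetical injection point; by default a
    wc_kb.io.Writer is created.
    """
    if writer is None:
        # Imported lazily so the helper can be defined without wc_kb installed
        from wc_kb import io
        writer = io.Writer()
    try:
        # Argument order follows the documented signature:
        # core_path first, then the knowledge base
        writer.run(core_path, knowledge_base, seq_path=seq_path)
    except ValueError as error:
        # Documented failure mode: a relationship to the knowledge base
        # or cell is not set
        print("could not write knowledge base: %s" % error)
        return False
    return True
```

With wc_kb installed, write_kb(kb, 'core.xlsx', 'seq.fna') would write the core workbook and the genome sequence; both file names are placeholders.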

classmethod validate_implicit_relationships()

Check that relationships to core.KnowledgeBase and core.Cell do not need to be explicitly written to workbooks because they can be inferred by Reader.run

Raises:

Exception – if the Excel serialization involves an unsupported implicit relationship

validate_implicit_relationships_are_set(knowledge_base)

Check that there is only 1 KnowledgeBase and <= 1 Cell and that each relationship to KnowledgeBase and Cell is set. This is necessary to enable the KnowledgeBase and Cell relationships to be implicit in the Excel output and added by Reader.run

Parameters:

knowledge_base (core.KnowledgeBase) – knowledge base

Raises:

ValueError – if there are multiple instances of core.KnowledgeBase in the object graph

class wc_kb.io.Reader

Bases: obj_tables.io.Reader

Read knowledge base from file(s)

run(core_path, seq_path='', rewrite_seq_path=True, taxon='eukaryote', models=None, ignore_missing_models=None, ignore_extra_models=None, ignore_sheet_order=None, include_all_attributes=False, ignore_missing_attributes=None, ignore_extra_attributes=None, ignore_attribute_order=None, group_objects_by_model=True, validate=True, read_metadata=False)

Read knowledge base from file(s)

Parameters:
  • core_path (str) – path to core knowledge base

  • seq_path (str, optional) – path to genome sequence

  • rewrite_seq_path (bool, optional) – if True, the path to genome sequence in the knowledge base will be updated to the provided seq_path

  • taxon (str, optional) – type of model order to use

  • models (types.TypeType or list of types.TypeType, optional) – type of object to read or list of types of objects to read

  • ignore_missing_models (bool, optional) – if False, report an error if a worksheet/file is missing for one or more models

  • ignore_extra_models (bool, optional) – if True and all models are found, ignore other worksheets or files

  • ignore_sheet_order (bool, optional) – if True, do not require the sheets to be provided in the canonical order

  • include_all_attributes (bool, optional) – if True, export all attributes including those not explicitly included in Model.Meta.attribute_order

  • ignore_missing_attributes (bool, optional) – if False, report an error if a worksheet/file doesn’t contain all of the attributes of a model in models

  • ignore_extra_attributes (bool, optional) – if True, do not report errors if attributes in the data are not in the model

  • ignore_attribute_order (bool, optional) – if True, do not require the attributes to be provided in the canonical order

  • group_objects_by_model (bool, optional) – if True, group decoded objects by their types

  • validate (bool, optional) – if True, validate the data

  • read_metadata (bool, optional) – if True, read metadata models

Returns:

model objects grouped by obj_tables.Model class

Return type:

dict

Raises:

ValueError – if core_path

  • Defines multiple knowledge bases or cells

  • Represents objects that cannot be linked to a knowledge base and/or cell
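A minimal sketch of reading a knowledge base and summarizing the returned dict. It assumes that the dict maps each model class to a list of instances, which is how "grouped by obj_tables.Model class" is read here; the helper name read_kb_summary and the optional reader argument are hypothetical.

```python
def read_kb_summary(core_path, seq_path, reader=None):
    """Read a knowledge base and return {model class name: object count}.

    `reader` is a hypothetical injection point; by default a
    wc_kb.io.Reader is created.
    """
    if reader is None:
        # Imported lazily so the helper can be defined without wc_kb installed
        from wc_kb import io
        reader = io.Reader()
    # May raise ValueError if the files define multiple knowledge bases
    # or cells, or contain objects that cannot be linked to them
    objects = reader.run(core_path, seq_path=seq_path)
    # Assumption: keys are model classes, values are lists of instances
    return {model.__name__: len(instances)
            for model, instances in objects.items()}
```

Under the same assumption, a single knowledge base object would be reachable through the entry keyed by core.KnowledgeBase.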

wc_kb.io.convert(source_core, source_seq, dest_core, dest_seq, taxon='eukaryote', rewrite_seq_path=True, protected=True)

Convert among Excel (.xlsx), comma separated (.csv), and tab separated (.tsv) file formats

Read a knowledge base from the source file(s) and write it to the destination file(s). A path to a set of delimiter separated knowledge base files must be given as a Unix glob pattern (containing a *) that matches all of the delimiter separated files.

Parameters:
  • source_core (str) – path to the core of the source knowledge base

  • source_seq (str) – path to the genome sequence of the source knowledge base

  • dest_core (str) – path to save the converted core of the knowledge base

  • dest_seq (str) – path to save the converted genome sequence of the knowledge base

  • taxon (str, optional) – taxon

  • rewrite_seq_path (bool, optional) – if True, the path to genome sequence in the converted core of the knowledge base will be updated to the path of the converted genome sequence

  • protected (bool, optional) – if True, protect the worksheet
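The glob rule for delimiter separated paths can be checked before calling convert. The sketch below is an illustration only: the helper names, the 'core-*.tsv' pattern, and the 'seq.fna' file name are hypothetical, and the extension check simply mirrors the three supported file types listed at the top of this module.

```python
import os


def describe_kb_path(path):
    """Classify a core path as 'excel' or 'delimited' by its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xlsx":
        return "excel"
    if ext in (".csv", ".tsv"):
        # Delimiter separated sets are addressed by a glob pattern
        if "*" not in path:
            raise ValueError("expected a glob pattern containing '*': %r" % path)
        return "delimited"
    raise ValueError("unsupported extension: %r" % ext)


def xlsx_to_tsv(source_core, source_seq, dest_dir):
    """Convert an Excel knowledge base to a set of .tsv files."""
    # Imported lazily so describe_kb_path can be used without wc_kb installed
    from wc_kb import io

    dest_core = os.path.join(dest_dir, "core-*.tsv")  # hypothetical pattern
    dest_seq = os.path.join(dest_dir, "seq.fna")      # hypothetical file name
    describe_kb_path(source_core)
    describe_kb_path(dest_core)
    io.convert(source_core, source_seq, dest_core, dest_seq)
    return dest_core, dest_seq
```

Validating paths up front gives a clearer error than a failed read or write deep inside the conversion.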

wc_kb.io.create_template(core_path, seq_path, taxon='eukaryote', write_schema=False, write_toc=True, extra_entries=10, data_repo_metadata=True, protected=True)

Create file with knowledge base template, including row and column headings

Parameters:
  • core_path (str) – path to save template of core knowledge base

  • seq_path (str) – path to save genome sequence

  • taxon (str, optional) – taxon

  • write_schema (bool, optional) – if True, include additional worksheet with schema

  • write_toc (bool, optional) – if True, include additional worksheet with table of contents

  • extra_entries (int, optional) – additional entries to display

  • data_repo_metadata (bool, optional) – if True, try to write metadata information about the file’s Git repo

  • protected (bool, optional) – if True, protect the worksheet
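A short sketch of generating a template. The helper name make_template, its create argument, and the output file names are hypothetical. data_repo_metadata is switched off here on the assumption that the output directory may not be inside a Git repository, since the default of True attempts to record repository metadata.

```python
import os


def make_template(directory, create=None):
    """Create a knowledge base template in `directory` and return its paths.

    `create` is a hypothetical injection point; by default
    wc_kb.io.create_template is used.
    """
    if create is None:
        # Imported lazily so the helper can be defined without wc_kb installed
        from wc_kb import io
        create = io.create_template
    core_path = os.path.join(directory, "template.xlsx")    # hypothetical name
    seq_path = os.path.join(directory, "template_seq.fna")  # hypothetical name
    # Skip Git metadata; all other options keep their documented defaults
    create(core_path, seq_path, data_repo_metadata=False)
    return core_path, seq_path
```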