wc_kb.eukaryote


Schema to represent a knowledge base to build models of eukaryotes

Author:

Yin Hoon Chew <yinhoon.chew@mssm.edu>

Date:

2018-09-10

Copyright:

2018, Karr Lab

License:

MIT

3.2.4. Module Contents

3.2.4.1. Classes

ActivityLevel

Activity level of regulatory element

RegulationType

Type of regulation between a regulatory element and a gene

RegulatoryDirection

The direction of regulation

TranscriptType

Type of transcript

LocusAttribute

Start and end coordinates attribute

RegDirectionAttribute

Regulatory direction attribute

GeneLocus

Knowledge of a gene

TranscriptionFactorRegulation

Transcription factor and the direction of transcriptional regulation

RegulatoryModule

Knowledge about regulatory modules

PtmSite

Knowledge of protein modification sites

GenericLocus

Start and end coordinates of exons and CDSs

TranscriptSpeciesType

Knowledge of a transcript (spliced RNA) species

ProteinSpeciesType

Knowledge of a protein monomer

class wc_kb.eukaryote.ActivityLevel[source]

Bases: enum.Enum

Activity level of regulatory element

active = 1[source]
poised = 2[source]
repressed = 3[source]
inactive = 4[source]
na = 5[source]
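
As a minimal sketch of how these members might be looked up when reading a table cell, the block below redefines a stand-in enum with the members listed above rather than importing wc_kb. The helper `parse_activity` is hypothetical and not part of the module.

```python
import enum


class ActivityLevel(enum.Enum):
    """Stand-in copy of the members listed above (not imported from wc_kb)."""
    active = 1
    poised = 2
    repressed = 3
    inactive = 4
    na = 5


def parse_activity(text):
    """Hypothetical helper: map a cell such as ' Poised ' to an enum member."""
    return ActivityLevel[text.strip().lower()]


level = parse_activity(" Poised ")
print(level.name, level.value)

# Members can also be recovered from their integer value
print(ActivityLevel(3) is ActivityLevel.repressed)
```

Lookup by name raises `KeyError` for an unknown label, which is usually the desired behavior when validating input.
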
class wc_kb.eukaryote.RegulationType[source]

Bases: enum.Enum

Type of regulation between a regulatory element and a gene

proximal = 1[source]
distal = 2[source]
class wc_kb.eukaryote.RegulatoryDirection[source]

Bases: int, enum.Enum

The direction of regulation

activation = 1[source]
repression[source]
unknown = 0[source]
class wc_kb.eukaryote.TranscriptType[source]

Bases: enum.Enum

Type of transcript

mRna = 1[source]
rRna = 2[source]
tRna = 3[source]
itRna = 4[source]
class wc_kb.eukaryote.LocusAttribute(related_name='', verbose_name='', verbose_related_name='', description='')[source]

Bases: obj_tables.ManyToManyAttribute

Start and end coordinates attribute

serialize(coordinates, encoded=None)[source]

Serialize related object

Parameters:
  • coordinates (list of Model) – list of GenericLocus instances (Python representation)

  • encoded (dict, optional) – dictionary of objects that have already been encoded

Returns:

simple Python representation

Return type:

str

deserialize(value, objects, decoded=None)[source]

Deserialize value

Parameters:
  • value (str) – string representation

  • objects (dict) – dictionary of objects, grouped by model

  • decoded (dict, optional) – dictionary of objects that have already been decoded

Returns:

tuple of cleaned value and cleaning error

Return type:

tuple of list of GenericLocus, InvalidAttribute or None
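
The serialize/deserialize pair above can be sketched as a round trip between a list of loci and a single string. This is only an illustration: the `Locus` dataclass stands in for GenericLocus, the `start:end` text format and the function names are assumptions, and the error is returned as a plain message rather than an InvalidAttribute object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Stand-in for GenericLocus: a start and an end coordinate."""
    start: int
    end: int


def serialize_loci(coordinates):
    """Join loci as 'start:end' pairs (the format is an assumption)."""
    return ", ".join(f"{locus.start}:{locus.end}" for locus in coordinates)


def deserialize_loci(value):
    """Return (list of Locus, None) on success or (None, error message) on failure."""
    loci = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty cells and trailing commas
        try:
            start, end = (int(x) for x in part.split(":"))
        except ValueError:
            return None, f"Invalid locus {part!r}; expected 'start:end'"
        loci.append(Locus(start, end))
    return loci, None


text = serialize_loci([Locus(1, 10), Locus(20, 35)])
print(text)
print(deserialize_loci(text))
```

Returning a `(value, error)` pair instead of raising mirrors the documented return type, where exactly one of the two elements is expected to be meaningful.
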

class wc_kb.eukaryote.RegDirectionAttribute(related_name='', verbose_name='', verbose_related_name='', description='')[source]

Bases: obj_tables.ManyToManyAttribute

Regulatory direction attribute

serialize(directions, encoded=None)[source]

Serialize related object

Parameters:
  • directions (list of Model) – list of TFdirection instances (Python representation)

  • encoded (dict, optional) – dictionary of objects that have already been encoded

Returns:

simple Python representation

Return type:

str

deserialize(value, objects, decoded=None)[source]

Deserialize value

Parameters:
  • value (str) – string representation

  • objects (dict) – dictionary of objects, grouped by model

  • decoded (dict, optional) – dictionary of objects that have already been decoded

Returns:

tuple of cleaned value and cleaning error

Return type:

tuple of list of RegDirection, InvalidAttribute or None

class wc_kb.eukaryote.GeneLocus[source]

Bases: wc_kb.core.PolymerLocus

Knowledge of a gene

symbol[source]

symbol

Type:

str

type[source]

type of gene

Type:

GeneType

Related attributes:

  • transcripts (list of TranscriptSpeciesType) – transcripts

  • regulatory_modules (list of RegulatoryModule) – regulatory modules

class Meta[source]

Bases: obj_tables.Model.Meta

verbose_name = 'Gene'[source]
verbose_name_plural = 'Genes'[source]
attribute_order = ('id', 'name', 'synonyms', 'symbol', 'homologs', 'polymer', 'strand', 'start', 'end',...[source]
symbol[source]
homologs[source]
class wc_kb.eukaryote.TranscriptionFactorRegulation[source]

Bases: obj_tables.Model

Transcription factor and the direction of transcriptional regulation

transcription_factor[source]

transcription factor

Type:

ProteinSpeciesType

direction[source]

regulatory direction

Type:

RegulatoryDirection

Related attributes:

regulatory_modules (list of RegulatoryModule): regulatory modules

class Meta[source]

Bases: obj_tables.Model.Meta

attribute_order = ('transcription_factor', 'direction')[source]
frozen_columns = 1[source]
table_format[source]
ordering = ('transcription_factor', 'direction')[source]
transcription_factor[source]
direction[source]
static _serialize(transcription_factor_id, direction_name)[source]

Generate string representation

Parameters:
  • transcription_factor_id (str) – transcription factor id

  • direction_name (str) – regulatory direction name

Returns:

value of primary attribute

Return type:

str
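
A minimal sketch of how the two fields might be combined into one primary-attribute string. The `:` separator and both function names are assumptions made for illustration; the source does not show the format the module actually uses.

```python
def serialize_regulation(transcription_factor_id, direction_name):
    """Combine the two fields into one key (the ':' separator is an assumption)."""
    return f"{transcription_factor_id}:{direction_name}"


def split_regulation(value):
    """Inverse of serialize_regulation; splits on the last ':'."""
    transcription_factor_id, _, direction_name = value.rpartition(":")
    return transcription_factor_id, direction_name


key = serialize_regulation("tf_1", "activation")
print(key)
print(split_regulation(key))
```

Splitting on the last separator keeps the round trip intact even if the transcription factor id itself contains a colon.
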

serialize()[source]

Generate string representation

Returns:

value of primary attribute

Return type:

str

classmethod deserialize(value, objects)[source]

Deserialize value

Parameters:
  • value (str) – string representation

  • objects (dict) – dictionary of objects, grouped by model

Returns:

tuple of cleaned value and cleaning error

Return type:

tuple of list of TranscriptionFactorRegulation, InvalidAttribute or None

class wc_kb.eukaryote.RegulatoryModule[source]

Bases: obj_tables.Model

Knowledge about regulatory modules

id[source]

identifier

Type:

str

name[source]

name

Type:

str

gene[source]

gene

Type:

GeneLocus

promoter[source]

promoter ensembl ID

Type:

str

activity[source]

cell-type specific activity level

Type:

ActivityLevel

type[source]

type of regulation (proximal or distal)

Type:

RegulationType

transcription_factor_regulation[source]

transcription factor and direction of regulation

Type:

TranscriptionFactorRegulation

comments[source]

comments

Type:

str

references[source]

references

Type:

list of Reference

identifiers[source]

identifiers

Type:

list of Identifier

class Meta[source]

Bases: obj_tables.Model.Meta

attribute_order = ('id', 'name', 'gene', 'promoter', 'activity', 'type', 'transcription_factor_regulation',...[source]
id[source]
name[source]
gene[source]
promoter[source]
activity[source]
type[source]
transcription_factor_regulation[source]
comments[source]
references[source]
identifiers[source]
class wc_kb.eukaryote.PtmSite[source]

Bases: wc_kb.core.PolymerLocus

Knowledge of protein modification sites

modified_protein[source]

modified protein

Type:

ProteinSpeciesType

type[source]

type of modification (phosphorylation, methylation, etc.)

Type:

str

modified_residue[source]

residue name and position in protein sequence

Type:

str

fractional_abundance[source]

ratio of modified protein abundance

Type:

int

class Meta[source]

Bases: obj_tables.Model.Meta

attribute_order = ('id', 'name', 'modified_protein', 'type', 'modified_residue', 'fractional_abundance',...[source]
type[source]
modified_protein[source]
modified_residue[source]
fractional_abundance[source]
class wc_kb.eukaryote.GenericLocus[source]

Bases: obj_tables.Model

Start and end coordinates of exons and CDSs

start[source]

start coordinate

Type:

int

end[source]

end coordinate

Type:

int

Related attributes:

  • transcripts (list of TranscriptSpeciesType) – transcripts

  • proteins (list of ProteinSpeciesType) – proteins

class Meta[source]

Bases: obj_tables.Model.Meta

attribute_order = ('start', 'end')[source]
table_format[source]
ordering = ('start', 'end')[source]
start[source]
end[source]
static _serialize(start, end)[source]

Generate string representation

Parameters:
  • start (int) – start coordinate

  • end (int) – end coordinate

Returns:

value of primary attribute

Return type:

str

serialize()[source]

Generate string representation

Returns:

value of primary attribute

Return type:

str

class wc_kb.eukaryote.TranscriptSpeciesType[source]

Bases: wc_kb.core.PolymerSpeciesType

Knowledge of a transcript (spliced RNA) species

gene[source]

gene

Type:

GeneLocus

exons[source]

exon coordinates

Type:

list of LocusAttribute

type[source]

type

Type:

TranscriptType

Related attributes:

protein (ProteinSpeciesType): protein

class Meta[source]

Bases: obj_tables.Model.Meta

verbose_name = 'Transcript'[source]
verbose_name_plural = 'Transcripts'[source]
attribute_order = ('id', 'name', 'gene', 'exons', 'type', 'identifiers', 'references', 'comments')[source]
gene[source]
exons[source]
type[source]
get_seq()[source]

Get the 5’ to 3’ sequence

Returns:

sequence

Return type:

Bio.Seq.Seq
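
To illustrate what obtaining a spliced 5’ to 3’ sequence involves, here is a sketch that works on plain strings instead of Bio.Seq.Seq objects. The function name, the 1-based inclusive exon coordinates, and the `strand` argument are assumptions for this example, not the module's interface.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spliced_rna(chromosome_seq, exons, strand="+"):
    """Join exons (assumed 1-based, inclusive) and return the RNA 5' to 3'."""
    # Concatenate exons in genomic order
    dna = "".join(chromosome_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        # Genes on the negative strand are read as the reverse complement
        dna = dna.translate(COMPLEMENT)[::-1]
    # Transcription replaces thymine with uracil
    return dna.replace("T", "U")


chromosome = "AAATGCCCGGGTTT"
print(spliced_rna(chromosome, [(3, 5), (9, 11)]))
print(spliced_rna(chromosome, [(3, 5), (9, 11)], strand="-"))
```
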

get_empirical_formula(seq_input=None)[source]

Get the empirical formula for a transcript (spliced RNA) species with

  • 5’ monophosphate

  • Deprotonated phosphate oxygens

\(N_A * AMP + N_C * CMP + N_G * GMP + N_U * UMP - (L-1) * OH\)

Parameters:

seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

empirical formula

Return type:

chem.EmpiricalFormula
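
The formula above can be worked through with plain element counts. In this sketch the nucleotide formulas are those of the monophosphates with both phosphate oxygens deprotonated; they are my own values rather than ones taken from the module, and the function returns a dict instead of a chem.EmpiricalFormula.

```python
from collections import Counter

# Element counts of deprotonated nucleoside monophosphates (assumed values)
NMP = {
    "A": {"C": 10, "H": 12, "N": 5, "O": 7, "P": 1},  # AMP
    "C": {"C": 9, "H": 12, "N": 3, "O": 8, "P": 1},   # CMP
    "G": {"C": 10, "H": 12, "N": 5, "O": 8, "P": 1},  # GMP
    "U": {"C": 9, "H": 11, "N": 2, "O": 9, "P": 1},   # UMP
}


def rna_empirical_formula(seq):
    """N_A*AMP + N_C*CMP + N_G*GMP + N_U*UMP - (L-1)*OH as element counts."""
    if not seq:
        raise ValueError("sequence must not be empty")
    formula = Counter()
    for base, n in Counter(seq).items():
        for element, k in NMP[base].items():
            formula[element] += n * k
    # Each of the L-1 phosphodiester bonds removes one O and one H
    bonds = len(seq) - 1
    formula["O"] -= bonds
    formula["H"] -= bonds
    return dict(formula)


print(rna_empirical_formula("ACGU"))
```
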

get_charge(seq_input=None)[source]

Get the charge for a transcript (spliced RNA) species with

  • 5’ monophosphate

  • Deprotonated phosphate oxygens

\(-L - 1\)

Parameters:

seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

charge

Return type:

int
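
The expression \(-L - 1\) follows from counting phosphate charges: each of the L - 1 internal phosphodiester groups carries one negative charge, and the 5’ monophosphate carries two. A small sketch (the function name is hypothetical) makes the bookkeeping explicit:

```python
def rna_charge(seq):
    """Charge of a transcript of length L with a 5' monophosphate."""
    length = len(seq)
    internal_phosphates = length - 1   # one negative charge each
    terminal_phosphate_charge = -2     # 5' monophosphate, doubly deprotonated
    return -internal_phosphates + terminal_phosphate_charge


print(rna_charge("ACGU"))
```
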

get_mol_wt(seq_input=None)[source]

Get the molecular weight for a transcript (spliced RNA) species with

  • 5’ monophosphate

  • Deprotonated phosphate oxygens

Parameters:

seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

molecular weight (Da)

Return type:

float
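
Given an empirical formula, the molecular weight is the sum of atomic masses weighted by element count. The sketch below assumes standard atomic masses rounded to three decimals and represents the formula as a plain dict; neither the function name nor the data structure comes from the module.

```python
# Standard atomic masses in Da, rounded to three decimals
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}


def mol_wt(formula):
    """Sum atomic masses weighted by the element counts in `formula`."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


# Deprotonated AMP, C10H12N5O7P, as a one-nucleotide example
amp = {"C": 10, "H": 12, "N": 5, "O": 7, "P": 1}
print(round(mol_wt(amp), 3))
```
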

class wc_kb.eukaryote.ProteinSpeciesType[source]

Bases: wc_kb.core.PolymerSpeciesType

Knowledge of a protein monomer

uniprot[source]

uniprot id

Type:

str

transcript[source]

transcript

Type:

TranscriptSpeciesType

coding_regions[source]

CDS coordinates

Type:

list of LocusAttribute

Related attributes:

  • transcription_factor_regulation (list of TranscriptionFactorRegulation) – transcription factor regulation

  • ptm_sites (list of PtmSite) – protein modification sites

class Meta[source]

Bases: obj_tables.Model.Meta

verbose_name = 'Protein'[source]
verbose_name_plural = 'Proteins'[source]
attribute_order = ('id', 'name', 'uniprot', 'transcript', 'coding_regions', 'identifiers', 'references', 'comments')[source]
uniprot[source]
transcript[source]
coding_regions[source]
get_seq(table=1, cds=True)[source]

Get the amino acid sequence, translated from the 5’ to 3’ coding sequence

Parameters:
  • table (int, optional) – NCBI identifier for translation table (default = standard table)

  • cds (bool, optional) – True indicates the sequence is a complete CDS

Returns:

sequence

Return type:

Bio.Seq.Seq
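
The role of the `cds` flag can be illustrated with a self-contained translation sketch using the standard genetic code (NCBI table 1). It works on plain RNA strings instead of Bio.Seq.Seq and, as a simplification, accepts only AUG as a start codon; the function name and the exact checks are assumptions.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna, cds=True):
    """Translate an RNA string; with cds=True, require a complete CDS."""
    if cds:
        if len(rna) % 3 or rna[:3] != "AUG" or CODON_TABLE[rna[-3:]] != "*":
            raise ValueError("not a complete CDS")
        rna = rna[:-3]  # drop the terminal stop codon
    usable = len(rna) - len(rna) % 3
    protein = "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))
    if cds and "*" in protein:
        raise ValueError("internal stop codon in CDS")
    return protein


print(translate("AUGGCCUAA"))
print(translate("GCCAAA", cds=False))
```
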

get_seq_and_start_codon(table=1, cds=True)[source]

Get the 5’ to 3’ amino acid sequence and the start codon

Parameters:
  • table (int, optional) – NCBI identifier for translation table (default = standard table)

  • cds (bool, optional) – True indicates the sequence is a complete CDS

Returns:

coding RNA sequence that will be translated, amino acid sequence, and start codon

Return type:

tuple of Bio.Seq.Seq, Bio.Seq.Seq, Bio.Seq.Seq

get_empirical_formula(table=1, cds=True, seq_input=None)[source]

Get the empirical formula

Parameters:
  • table (int, optional) – NCBI identifier for translation table (default = standard table)

  • cds (bool, optional) – True indicates the sequence is a complete CDS

  • seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

empirical formula

Return type:

chem.EmpiricalFormula

get_charge(table=1, cds=True, seq_input=None)[source]

Get the charge at physiological pH

Parameters:
  • table (int, optional) – NCBI identifier for translation table (default = standard table)

  • cds (bool, optional) – True indicates the sequence is a complete CDS

  • seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

charge

Return type:

int
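
One common way to approximate an integer charge at physiological pH is to count charged side chains. The sketch below assumes lysine and arginine contribute +1 and aspartate and glutamate contribute -1, and it ignores histidine, the termini, and any modifications. This is an illustrative approximation, not necessarily the rule the module applies.

```python
def protein_charge(seq):
    """Approximate net charge from one-letter amino acid codes."""
    positive = sum(seq.count(aa) for aa in "KR")  # Lys, Arg
    negative = sum(seq.count(aa) for aa in "DE")  # Asp, Glu
    return positive - negative


print(protein_charge("MKRDEEA"))
```
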

get_mol_wt(table=1, cds=True, seq_input=None)[source]

Get the molecular weight

Parameters:
  • table (int, optional) – NCBI identifier for translation table (default = standard table)

  • cds (bool, optional) – True indicates the sequence is a complete CDS

  • seq_input (Bio.Seq.Seq, optional) – if provided, this sequence is used instead of reading from the FASTA file, which reduces I/O operations

Returns:

molecular weight (Da)

Return type:

float