Chemical Descriptors Library: CDL Chemical Algebra

CDL Chemical Algebra

The CDL Chemical Algebra intends to provide:

Apart from making everyday work easier for anyone who deals with chemical informatics, these tools can be used by software agents to custom-build chemical entities with specific desired fragments and, by means of similarity operators, with specific desired properties.

Definition of the CDL Chemical Algebra

Greatly simplified, an algebra can be defined as a set of elements M together with binary operators of the form M x M --> M.
So, let's define the elements that constitute the algebra and the operators that work on these elements.

Set of the Algebra

The set of an algebra is the collection of elements on which the operators work.
The first element of the CDL algebra is simply a plain CDL molecule. This molecule has one member variable (the molecular graph) and a variety of functions to access its vertices, its edges and their properties. It is constructed from a SMILES string.

As noted previously in the first example, operator + is a binary operator that takes a left-hand side (LHS) and a right-hand side (RHS) operand. In the example, the RHS is the fragment to be added, so the LHS operand must carry information on where to add it. For this reason, the second element of the algebra is the indexed molecule. The indexed molecule inherits from the plain CDL molecule (in programming terms, an indexed molecule IS A plain molecule) and has one member variable of its own, which holds the mapping to a query molecule. The mapping is represented by the graph's vertex indexes as returned by the SMARTS algorithm. An indexed molecule cannot be instantiated directly, because it is only produced as the result of applying a SMARTS algorithm to a plain molecule.

The third element of the algebra is CDL's SMARTS class, which is constructed from a SMARTS character string.
The SMARTS language is a superset of the SMILES language: any valid SMILES string is a valid SMARTS string, but not the other way around.
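The three kinds of element can be pictured as a small class hierarchy. The Python sketch below is illustrative only: the class names `Molecule`, `IndexedMolecule` and `Smarts` and their fields are assumptions, not CDL's actual API, and the molecular graph is passed in as explicit atom and bond lists because parsing SMILES is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Plain molecule (hypothetical stand-in for CDL's class): a molecular graph."""
    atoms: list[str]                 # vertex labels, e.g. ["C", "C", "O"]
    bonds: list[tuple[int, int]]     # undirected edges between atom indices

    def neighbours(self, v: int) -> list[int]:
        """Return the indices of the atoms bonded to atom v."""
        return [b if a == v else a for a, b in self.bonds if v in (a, b)]


@dataclass
class IndexedMolecule(Molecule):
    """An indexed molecule IS A plain molecule plus a mapping to a query.

    mapping[i] is the vertex index matched by the i-th query atom. In CDL
    such objects come only from a substructure search; this sketch does
    not enforce that restriction.
    """
    mapping: list[int] = field(default_factory=list)


@dataclass(frozen=True)
class Smarts:
    """A query built from a SMARTS string (stored verbatim in this sketch)."""
    pattern: str
```

Making `IndexedMolecule` a subclass means every function written for `Molecule` also accepts an indexed molecule, which is the IS-A relationship described above.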

Operators of the Algebra

  1. Substructure Search Operator:

    The substructure search operator is a binary operator that applies the SMARTS algorithm of the RHS operand to the molecule represented by the LHS operand. For the time being, this operator is represented by the symbol ^.
    The return type of the operator is an indexed_molecule.

    operator ^ (plain molecule, smarts class) --> returns an indexed_molecule

    A SMILES string can be converted directly to a plain molecule, and a SMARTS string to a SMARTS class, so this is also valid:

    operator ^ (SMILES string, SMARTS string) --> returns an indexed_molecule

  2. Addition:

    The Addition operator attaches the fragment represented by the RHS operand to the molecule represented by the LHS operand, at the first position of the mapping to the query molecule. If there is no such mapping, the fragment is not added. The Addition operator creates exactly one bond between the LHS and the RHS, hence:

    Num. bonds (result) = Num. bonds (LHS) + Num. bonds (RHS) + 1

    This operator is represented by the symbol +.

    operator + (indexed_molecule, plain molecule) --> returns an indexed_molecule

    This is also valid:

    operator + (indexed_molecule, SMILES string) --> returns an indexed_molecule

    The returned molecule preserves the mapping contained in the LHS operand.

  3. Fusion:

    Sometimes the user wants to connect a fragment to more than one position of the target molecule, for example when adding a ring.

    For the addition operator, the position where the fragment is added is indicated by the first character of the SMARTS string.
    For the fusion operator, two positions are needed. To make this possible, the SMARTS syntax was extended so that positions in a molecule can be expressed directly, using the symbol <. The first position is always the first character of the SMARTS string; the second position either takes a default value or is given explicitly by the < symbol placed after an atom. For example, a benzene ring whose meta position is the second position of the map is written as c1cc<ccc1.

    The fusion operator is represented by the symbol & and works on two indexed molecules:

    operator & (indexed molecule, indexed molecule) --> returns an indexed molecule

  4. Subtraction:

    The subtraction operator removes acyclic (non-ring) fragments from a molecule.
    It searches for the RHS operand in the LHS operand. If exactly one match is found, the fragment is removed. If more than one match is found, then:

    The operator is defined as:

    operator - (indexed molecule, molecule) --> returns an indexed molecule

    operator - (indexed molecule, indexed molecule) --> returns an indexed molecule

    operator - (molecule, molecule) --> returns an indexed molecule

    In the resulting indexed molecule, the first position of the mapping is the index of the vertex to which the removed fragment was attached.
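The substructure search operator ^ can be illustrated with a minimal Python sketch that overloads `__xor__`. It is not CDL code: the class names are hypothetical, the molecular graph is given as explicit atom and bond lists instead of being parsed from SMILES, and real SMARTS matching is replaced by a hypothetical `PathQuery` that matches a linear chain of bonded atom labels. Only the first match found is kept as the mapping; an empty mapping means no match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Plain molecule (hypothetical stand-in): atom labels plus undirected bonds."""
    atoms: list[str]
    bonds: list[tuple[int, int]]

    def neighbours(self, v: int) -> list[int]:
        """Indices of the atoms bonded to atom v."""
        return [b if a == v else a for a, b in self.bonds if v in (a, b)]

    def __xor__(self, query: PathQuery) -> IndexedMolecule:
        """Substructure search: molecule ^ query -> indexed molecule."""
        return IndexedMolecule(list(self.atoms), list(self.bonds),
                               mapping=query.first_match(self))


@dataclass
class IndexedMolecule(Molecule):
    """IS A Molecule, plus the vertex indices matched by the query."""
    mapping: list[int] = field(default_factory=list)


@dataclass(frozen=True)
class PathQuery:
    """Hypothetical SMARTS stand-in: a chain of bonded atoms with these labels."""
    labels: tuple[str, ...]

    def first_match(self, mol: Molecule) -> list[int]:
        """Atom indices of the first matching chain (depth-first), or []."""
        def extend(path: list[int]) -> list[int]:
            if len(path) == len(self.labels):
                return path
            for n in mol.neighbours(path[-1]):
                if n not in path and mol.atoms[n] == self.labels[len(path)]:
                    found = extend(path + [n])
                    if found:
                        return found
            return []

        for start, label in enumerate(mol.atoms):
            if self.labels and label == self.labels[0]:
                found = extend([start])
                if found:
                    return found
        return []
```

Python cannot overload ^ between two built-in strings, so the (SMILES string, SMARTS string) form has no direct counterpart in this sketch; it only covers the (plain molecule, query) form.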
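The Addition operator + and its bond-count rule can be checked with a similar sketch. Here `__add__` is defined on a hypothetical `IndexedMolecule` class; the convention that the fragment's atom 0 is its attachment point is an assumption of this sketch, and the (indexed molecule, SMILES string) form is omitted because it would need a SMILES parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Plain molecule (hypothetical stand-in): atom labels plus undirected bonds."""
    atoms: list[str]
    bonds: list[tuple[int, int]]


@dataclass
class IndexedMolecule(Molecule):
    mapping: list[int] = field(default_factory=list)   # query -> vertex indices

    def __add__(self, fragment: Molecule) -> IndexedMolecule:
        """Attach `fragment` (via its atom 0, an assumption) at mapping[0]."""
        if not self.mapping:
            # No mapping: the fragment is not added; return an unchanged copy.
            return IndexedMolecule(list(self.atoms), list(self.bonds), [])
        offset = len(self.atoms)               # fragment atoms follow LHS atoms
        atoms = self.atoms + fragment.atoms
        bonds = self.bonds + [(a + offset, b + offset) for a, b in fragment.bonds]
        bonds.append((self.mapping[0], offset))  # the single new LHS-RHS bond
        return IndexedMolecule(atoms, bonds, list(self.mapping))  # mapping kept
```

Because the fragment's bonds are copied with an index offset and exactly one bond is appended, `len(result.bonds)` always equals `len(lhs.bonds) + len(fragment.bonds) + 1` whenever the LHS has a mapping.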
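The < position syntax used by the fusion operator can be illustrated with a small tokenizer that reports which atom carries the marker. This is a hypothetical helper, not part of CDL: it only understands bracket atoms, organic-subset element symbols and lowercase aromatic atoms, skips ring-closure digits, bonds and branches, and returns `None` for the second position when no < is present, because the default position is not specified here.

```python
from __future__ import annotations

ORGANIC = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")  # 2-letter first
AROMATIC = ("b", "c", "n", "o", "p", "s")


def cdl_positions(pattern: str) -> tuple[int, int | None]:
    """Return (first, second) atom positions of a CDL-style SMARTS string.

    The first position is always atom 0; the second is the atom that
    immediately precedes a '<' marker, or None if there is no marker.
    """
    atom_count = 0
    second = None
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                      # bracket atom: one atom up to ']'
            i = pattern.index("]", i) + 1
            atom_count += 1
            continue
        if ch == "<":                      # marks the atom just read
            if atom_count == 0:
                raise ValueError("'<' must follow an atom")
            second = atom_count - 1
            i += 1
            continue
        symbol = next((s for s in ORGANIC if pattern.startswith(s, i)), None)
        if symbol is None and ch in AROMATIC:
            symbol = ch
        if symbol is not None:
            atom_count += 1
            i += len(symbol)
        else:                              # ring digits, bonds, branches, etc.
            i += 1
    return 0, second
```

For c1cc<ccc1 the marker follows the third aromatic carbon, so the second position is atom index 2, two bonds away from atom 0, which is the meta position.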
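The fusion operator & can be sketched the same way. In this hypothetical Python version, `__and__` bonds the RHS atom at each of the first two mapped positions to the LHS atom at the corresponding position, so each operand contributes two positions and two new bonds are created. This pairing rule is an assumption made for illustration; the text only states that fusion needs two positions and works on two indexed molecules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Plain molecule (hypothetical stand-in): atom labels plus undirected bonds."""
    atoms: list[str]
    bonds: list[tuple[int, int]]


@dataclass
class IndexedMolecule(Molecule):
    mapping: list[int] = field(default_factory=list)   # mapped positions

    def __and__(self, other: IndexedMolecule) -> IndexedMolecule:
        """Fusion (assumed rule): bond other's position k to self's position k,
        for k = 0 and k = 1."""
        if len(self.mapping) < 2 or len(other.mapping) < 2:
            raise ValueError("fusion needs two mapped positions on each operand")
        offset = len(self.atoms)                 # RHS atoms follow the LHS atoms
        atoms = self.atoms + other.atoms
        bonds = self.bonds + [(a + offset, b + offset) for a, b in other.bonds]
        for k in (0, 1):                         # one new bond per position pair
            bonds.append((self.mapping[k], other.mapping[k] + offset))
        return IndexedMolecule(atoms, bonds, list(self.mapping))
```

With `lhs` a two-carbon fragment mapped at both atoms and `rhs` a four-carbon chain mapped at its two ends, `lhs & rhs` closes a six-membered ring, and the bond count grows by exactly two.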
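Subtraction can also be sketched in Python by overloading `__sub__`. This hypothetical version takes the fragment as a chain of atom labels (a stand-in for a real substructure match), keeps only matches that form an acyclic branch joined to the rest of the molecule by exactly one bond, deletes those atoms, and returns an indexed molecule whose first position is the attachment vertex, renumbered after the deletion. The rule for more than one match is not given in the text, so this sketch simply raises `ValueError` in that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Plain molecule (hypothetical stand-in): atom labels plus undirected bonds."""
    atoms: list[str]
    bonds: list[tuple[int, int]]

    def neighbours(self, v: int) -> list[int]:
        return [b if a == v else a for a, b in self.bonds if v in (a, b)]

    def __sub__(self, labels: tuple[str, ...]) -> IndexedMolecule:
        """Remove the single terminal, acyclic chain whose atoms match `labels`."""
        matches = {frozenset(p) for p in self._paths(labels)}   # dedupe atom sets
        removable = [m for m in matches if self._attachment(m) is not None]
        if len(removable) != 1:
            # The multiple-match rule is not specified; refuse instead of guessing.
            raise ValueError(f"expected 1 removable match, found {len(removable)}")
        gone = removable[0]
        anchor = self._attachment(gone)          # vertex the fragment hung from
        keep = [v for v in range(len(self.atoms)) if v not in gone]
        new_index = {old: new for new, old in enumerate(keep)}  # renumber survivors
        atoms = [self.atoms[v] for v in keep]
        bonds = [(new_index[a], new_index[b]) for a, b in self.bonds
                 if a not in gone and b not in gone]
        return IndexedMolecule(atoms, bonds, mapping=[new_index[anchor]])

    def _paths(self, labels: tuple[str, ...]):
        """Yield every simple path whose atom labels equal `labels`, in order."""
        def extend(path: list[int]):
            if len(path) == len(labels):
                yield path
                return
            for n in self.neighbours(path[-1]):
                if n not in path and self.atoms[n] == labels[len(path)]:
                    yield from extend(path + [n])

        for v, label in enumerate(self.atoms):
            if labels and label == labels[0]:
                yield from extend([v])

    def _attachment(self, match: frozenset[int]) -> int | None:
        """Outside atom bonded to `match`, if `match` is an acyclic branch joined
        to the rest of the molecule by exactly one bond; otherwise None."""
        internal = [b for b in self.bonds if b[0] in match and b[1] in match]
        crossing = [b for b in self.bonds if (b[0] in match) != (b[1] in match)]
        if len(internal) != len(match) - 1 or len(crossing) != 1:
            return None          # ring inside the match, or not a single branch
        a, b = crossing[0]
        return b if a in match else a


@dataclass
class IndexedMolecule(Molecule):
    mapping: list[int] = field(default_factory=list)
```

Since `IndexedMolecule` inherits `__sub__`, both the (molecule, fragment) and (indexed molecule, fragment) forms work; the LHS mapping is simply replaced by the attachment vertex.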



Copyright © Vladimir Sykora & Cyprotex Ltd 2006