Chemical Descriptors Library: SMARTS
Client Algorithms

SMARTS language

CDL provides a language similar to Daylight SMARTS for substructure searches. Please refer to the Daylight SMARTS documentation for an introduction to the language.

The atomic primitives provided are:

Symbol  Description                               Default
*       any atom                                  no default
a       aromatic atom                             no default
A       aliphatic atom                            no default
D<n>    n connections                             1
H<n>    n total hydrogens attached                1
h<n>    n implicit hydrogens attached             1
R<n>    atom belongs to n SSSR rings              any ring atom
r<n>    atom belongs to an SSSR ring of size n    any ring atom
X<n>    n total connections                       1
x<n>    n total ring connections                  at least 1
-<n>    charge of -n                              -1 (-- is -2)
+<n>    charge of +n                              +1 (++ is +2)
#<n>    atomic number n                           no default
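
The defaults of the charge primitives can be read as a small parsing rule: a bare sign means a magnitude of 1, a repeated sign adds up (-- is -2), and an explicit number gives the magnitude. The sketch below illustrates that rule only. parse_charge is a hypothetical helper and not part of CDL, and extending the rule to runs longer than two signs is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of CDL: decodes a charge primitive such as
// "+", "++", "+2", "-", "--" or "-3" into a signed integer charge.
// Returns false if the token is not a charge primitive.
bool parse_charge(const std::string& token, int& charge) {
  if (token.empty() || (token[0] != '+' && token[0] != '-')) return false;
  const char sign_char = token[0];
  const int sign = (sign_char == '+') ? 1 : -1;

  // Length of the leading run of identical sign characters ("--" gives 2).
  std::size_t i = 0;
  while (i < token.size() && token[i] == sign_char) ++i;
  const int run = static_cast<int>(i);

  // Signs only: the run length is the magnitude ("+" is +1, "--" is -2).
  if (i == token.size()) {
    charge = sign * run;
    return true;
  }

  // An explicit magnitude is accepted only after a single sign ("+2", "-3").
  if (run != 1) return false;
  int magnitude = 0;
  for (; i < token.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(token[i]))) return false;
    magnitude = magnitude * 10 + (token[i] - '0');
  }
  charge = sign * magnitude;
  return true;
}
```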

Additional CDL atomic primitives are:

Symbol  Description                                                      Default
j<n>    vertex index equals n                                            no default; a value must be provided
<       repositions the index of the second SMARTS graph vertex to       no arguments required
        that of the vertex on the LHS of the < symbol

The j<n> primitive is handy when searching for subgroups at a certain distance from a known atom. For example, [j15]~*~*~*~[$([NX3+](=O)[O-])] searches for a nitro group 4 bonds away from the vertex of index 15.

The < symbol is not actually an atomic primitive, so it must appear in the main branch of a SMARTS and must not be placed inside brackets. Please refer to the fusion operator of the CDL chemical algebra.
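
Queries anchored on a vertex index all have the same shape, so they can be generated instead of typed by hand. The following is a minimal sketch that only builds the SMARTS string. distance_query is a hypothetical helper, not part of CDL, and using the wildcard * with the any-bond ~ for the intermediate steps is a choice made for this illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of CDL: builds "[j<index>]" followed by
// `bonds` any-bond (~) steps, the last of which lands on `target`.
// Intermediate atoms are wildcards (*).
std::string distance_query(unsigned index, unsigned bonds,
                           const std::string& target) {
  if (bonds == 0)
    throw std::invalid_argument("the target must be at least one bond away");

  std::string query = "[j" + std::to_string(index) + "]";
  // bonds - 1 intermediate wildcard atoms, each reached through any bond.
  for (unsigned i = 1; i < bonds; ++i) query += "~*";
  // The final bond connects to the target pattern itself.
  query += "~";
  return query + target;
}
```

Called as distance_query(15, 4, "[$([NX3+](=O)[O-])]"), this yields the pattern [j15]~*~*~*~[$([NX3+](=O)[O-])].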

The bond primitives provided are:

Symbol  Description
-       single bond
=       double bond
#       triple bond
:       aromatic bond
~       any bond
@       any ring bond

Logical operators:

Symbol  Operator  Description
,       or        lazy; evaluates the LHS or the RHS
!       not       negates the subsequent rule
&       and       lazy; checks the LHS first
;       and       lazy; checks the RHS first
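
Lazy evaluation here means that the second operand is only tested when the first one has not already decided the result. The sketch below mimics the four operators with short-circuiting combinators over a toy atom record. Atom, Rule and the four functions are hypothetical names for this illustration and do not reflect how CDL implements its operators; operator precedence is not modelled.

```cpp
#include <functional>

// Toy atom record used only for this illustration; CDL's atom type differs.
struct Atom {
  int atomic_number;
  bool aromatic;
};

typedef std::function<bool(const Atom&)> Rule;

// ',' : lazy or. The RHS is evaluated only if the LHS fails.
Rule or_rule(Rule lhs, Rule rhs) {
  return [=](const Atom& a) { return lhs(a) || rhs(a); };
}

// '!' : negates the rule that follows it.
Rule not_rule(Rule rule) {
  return [=](const Atom& a) { return !rule(a); };
}

// '&' : lazy and that checks the LHS first.
Rule and_lhs_first(Rule lhs, Rule rhs) {
  return [=](const Atom& a) { return lhs(a) && rhs(a); };
}

// ';' : lazy and that checks the RHS first, as stated in the table.
Rule and_rhs_first(Rule lhs, Rule rhs) {
  return [=](const Atom& a) { return rhs(a) && lhs(a); };
}
```

With these combinators, a rule that rejects an atom on its first-checked operand never calls the other operand, which is the behaviour the table describes for & and ; with the operand order reversed.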

Prototype

SMARTS is implemented in CDL as a class that receives the SMARTS string in its constructor and provides an operator() that accepts the molecule to search in.
  template <class Molecule>
  struct smarts {
    typedef Molecule                                               molecule_t;

    smarts(const std::string& smart_str);

    bool operator()(const molecule_t& external_mol);
    
    bool empty();

    const mapped_cont_t& get_mapped_vertices() const;
    
    size_t smarts_size() const;
    
  };

mapped_cont_t is of type std::vector<std::vector<std::pair<vertex_descriptor,vertex_descriptor> > > which provides the same mapping as explained in the indexed_molecule.
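
As an illustration of how such a container can be walked, the sketch below collects one list of vertices per entry of the outer vector. It rests on assumptions that are not stated here: vertex_descriptor is taken to be a plain index, each inner vector is taken to describe one match, and the second member of each pair is taken to be the vertex in the searched molecule. The indexed_molecule documentation is the authority on the real ordering, and second_vertices is a hypothetical helper.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: vertex descriptors are plain indices. In CDL the real type
// comes from the molecule's graph.
typedef std::size_t vertex_descriptor;
typedef std::vector<std::vector<
    std::pair<vertex_descriptor, vertex_descriptor> > > mapped_cont_t;

// Hypothetical helper: for every inner vector (assumed to be one match),
// gathers the second member of each pair (assumed to be the vertex in the
// searched molecule).
std::vector<std::vector<vertex_descriptor> >
second_vertices(const mapped_cont_t& mapped) {
  std::vector<std::vector<vertex_descriptor> > result;
  for (std::size_t m = 0; m < mapped.size(); ++m) {
    std::vector<vertex_descriptor> hit;
    for (std::size_t k = 0; k < mapped[m].size(); ++k)
      hit.push_back(mapped[m][k].second);
    result.push_back(hit);
  }
  return result;
}
```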

For component grouping, use the following class:

  template <class Molecule>
  struct component_smarts {
  
    typedef Molecule    molecule_t;
  
    component_smarts(const std::string& smart_str);
      
    bool operator()(const molecule_t& external_mol);
    
  private :
    std::vector<std::string>        m_component_smarts;
    
  };

This implementation of component grouping accepts groups enclosed in parentheses. If the SMARTS starts with a zero-level parenthesis, then all following groups must also be enclosed in parentheses; that is, a "(" must follow every ".".
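
The grouping rule can be pictured as a split of the SMARTS string at every "." that sits outside all parentheses, followed by a check that each piece is parenthesised once the first one is. The sketch below does exactly that and nothing more. split_components is a hypothetical helper, and nothing here is taken from the actual component_smarts implementation; in particular, how CDL treats a grouped piece that follows an ungrouped first piece is not specified, so the sketch simply accepts it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of CDL: splits a component-level SMARTS at
// zero-level dots and strips the enclosing zero-level parentheses.
std::vector<std::string> split_components(const std::string& smarts) {
  std::vector<std::string> pieces;
  std::string current;
  int depth = 0;  // current parenthesis nesting level
  for (std::size_t i = 0; i < smarts.size(); ++i) {
    const char c = smarts[i];
    if (c == '(') ++depth;
    if (c == ')') --depth;
    if (c == '.' && depth == 0) {  // a dot outside all parentheses
      pieces.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  pieces.push_back(current);

  // Grouping is in force when the string opens with a zero-level "(".
  const bool grouped = !pieces[0].empty() && pieces[0][0] == '(';
  for (std::size_t i = 0; i < pieces.size(); ++i) {
    std::string& p = pieces[i];
    const bool starts_group = !p.empty() && p[0] == '(';
    if (grouped && !starts_group)
      throw std::invalid_argument("a '(' must follow every '.'");
    if (starts_group) {
      if (p[p.size() - 1] != ')')
        throw std::invalid_argument("unterminated group");
      p = p.substr(1, p.size() - 2);  // drop the enclosing parentheses
    }
  }
  return pieces;
}
```

Under these assumptions "(C.C).(N)" splits into the pieces C.C and N, while "(C).N" is rejected because the second piece is not enclosed in parentheses.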

Definition

#include <morpho/cdl/smarts/smarts.hpp>

Complexity

Depends on the number of recursive expressions present in the SMARTS. The Ullmann algorithm is applied once for the pattern itself, and once more for each recursion present.
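
As a rough way to estimate this cost for a given pattern, the recursive expressions can be counted directly in the SMARTS string, since each one is introduced by "$(". The sketch below is only a string-level count. count_recursions and ullmann_passes are hypothetical helpers, not part of CDL, and counting nested recursions individually is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts recursive SMARTS expressions, i.e. the
// occurrences of "$(" in the pattern. Nested recursions are counted too.
std::size_t count_recursions(const std::string& smarts) {
  std::size_t count = 0;
  std::size_t pos = smarts.find("$(");
  while (pos != std::string::npos) {
    ++count;
    pos = smarts.find("$(", pos + 2);
  }
  return count;
}

// One application for the pattern itself plus one per recursion.
std::size_t ullmann_passes(const std::string& smarts) {
  return 1 + count_recursions(smarts);
}
```

For [j15]~*~*~*~[$([NX3+](=O)[O-])], which contains one recursive expression, this gives 2 applications.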

Copyright © Vladimir Sykora & Cyprotex Ltd 2006