Chemical Descriptors Library: Atom Types Fingerprints

Atom Types Fingerprints

The atom types fingerprints method is similar to the topological pharmacophores method. The two differ in the atom types they consider, and in that an atom types fingerprint is always computed relative to a chosen atom of interest.
The method counts the atoms of each considered type that surround the atom of interest, up to a maximum distance.
The method also resembles an atomic radial distribution function (RDF), except that it counts atom types and stores each count at a predefined position of the fingerprint. Each position reflects the distance, measured along the shortest path, between the counted atoms and the atom of interest.
The method is described by Xing and Glen (2002).
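
The counting step described above can be sketched as follows. This is a minimal illustration, not the CDL implementation: the function atom_type_counts, its parameters, and the slot layout (all types at distance 1, then all types at distance 2, and so on) are assumptions made for the example. It takes a precomputed distance matrix and one type index per atom.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the CDL API.
// dist     : all-pairs topological distances (bonds on the shortest path)
// type     : type index of every atom, each in [0, n_types)
// center   : index of the atom of interest
// Returns max_dist * n_types counts. The slot for type t at distance d
// (1 <= d <= max_dist) is (d - 1) * n_types + t; this layout is an assumption.
std::vector<std::size_t> atom_type_counts(
    const std::vector<std::vector<std::size_t> >& dist,
    const std::vector<std::size_t>& type,
    std::size_t center,
    std::size_t n_types,
    std::size_t max_dist)
{
  std::vector<std::size_t> fp(max_dist * n_types, 0);
  for (std::size_t a = 0; a < type.size(); ++a) {
    const std::size_t d = dist[center][a];
    // Skip the atom of interest itself and atoms beyond the maximum distance.
    if (a == center || d == 0 || d > max_dist) continue;
    ++fp[(d - 1) * n_types + type[a]];
  }
  return fp;
}
```

Because the distances are looked up rather than computed, the sketch visits each atom once and applies one type lookup per atom.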

The considered Atom Types are:

Atom Type
sp3 Carbon
sp2 Carbon
sp Carbon
aromatic Carbon
Carbon cation
sp3 Nitrogen
sp2 Nitrogen
sp Nitrogen
aromatic Nitrogen
amide Nitrogen
sp3 Nitrogen positively charged
sp3 Oxygen
sp2 Oxygen
Oxygen in carboxylic and phosphoric acid
sp3 Sulfur
sp2 Sulfur
Sulfoxide Sulfur
Sulfone Sulfur
Hydrogen
Fluorine
Chlorine
Bromine
Iodine
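
One way to give each of the atom types listed above a fixed index is an enumeration, sketched below. The enumerator names and the helper fingerprint_slot are hypothetical and are not CDL identifiers; the distance-major slot layout is likewise an assumption made for illustration.

```cpp
#include <cstddef>

// Hypothetical enumeration mirroring the list above, in the same order.
enum atom_type_index {
  C_sp3, C_sp2, C_sp, C_aromatic, C_cation,
  N_sp3, N_sp2, N_sp, N_aromatic, N_amide, N_sp3_positive,
  O_sp3, O_sp2, O_acid,
  S_sp3, S_sp2, S_sulfoxide, S_sulfone,
  Hydrogen, Fluorine, Chlorine, Bromine, Iodine,
  atom_type_count  // number of atom types listed above
};

// Slot of a (distance, type) pair in a flat fingerprint, assuming all types
// at distance 1 come first, then all types at distance 2, and so on.
// distance must be at least 1.
inline std::size_t fingerprint_slot(std::size_t distance, atom_type_index t) {
  return (distance - 1) * atom_type_count + t;
}
```

With this layout, a fingerprint covering distances 1 to max_length would hold max_length * atom_type_count entries.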

Prototype

The method is implemented as a functor: the constructor takes the molecule, and operator() takes the index of the vertex (atom) of interest.
The fingerprint is stored as a std::vector<size_t> and is retrieved with get_fingerprint().

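  template <class Molecule>
  struct pka_fingerprint {
    pka_fingerprint(const Molecule& m, size_t max_length = 5);

    // computes the fingerprint around vertex v
    bool operator()(size_t v);

    // returns the fingerprint computed by the last call to operator()
    const std::vector<size_t>&
    get_fingerprint();

  };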

Definition

// for the atom types :
#include <morpho/cdl/fingerprints/sybyl_typing.hpp>
// for the pka_fingerprint functor :
#include <morpho/cdl/fingerprints/pka_fp.hpp>

Preconditions

The distance matrix must already be assigned and accessible through MolProperty<distance_matrixS,Molecule>::get().
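
For readers unfamiliar with the term, a topological distance matrix holds the number of bonds on the shortest path between every pair of atoms. The sketch below shows one common way to build such a matrix, a breadth-first search from every atom. It is an illustration only: topological_distances and its adjacency-list input are hypothetical, and this is not how the CDL necessarily computes or stores the property.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper, not the CDL implementation.
// adj[v] lists the neighbours of atom v. Returns d with d[u][v] equal to the
// number of bonds on the shortest path from u to v. Pairs with no connecting
// path keep the sentinel value static_cast<std::size_t>(-1).
std::vector<std::vector<std::size_t> > topological_distances(
    const std::vector<std::vector<std::size_t> >& adj)
{
  const std::size_t n = adj.size();
  const std::size_t unreachable = static_cast<std::size_t>(-1);
  std::vector<std::vector<std::size_t> > d(
      n, std::vector<std::size_t>(n, unreachable));
  for (std::size_t s = 0; s < n; ++s) {
    d[s][s] = 0;
    std::queue<std::size_t> q;
    q.push(s);
    while (!q.empty()) {
      const std::size_t u = q.front();
      q.pop();
      for (std::size_t i = 0; i < adj[u].size(); ++i) {
        const std::size_t v = adj[u][i];
        if (d[s][v] == unreachable) {  // first visit gives the shortest path
          d[s][v] = d[s][u] + 1;
          q.push(v);
        }
      }
    }
  }
  return d;
}
```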

Complexity

Assuming the distance matrix is already assigned (as done by the default constructor of the molecule), the method performs a linear number of applications of the atom type predicate.

Example

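The following program prints the atom types fingerprint of the first nitrogen atom found in each molecule read from standard input. The molecules are in SDF format:
// Standard headers used below. The CDL headers that declare molecule<>,
// nail_juice<> and get_juice_from_stream() are not listed in this document,
// and the CDL names (including tie) are assumed to be in scope.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>
// for the atom types :
#include <morpho/cdl/fingerprints/sybyl_typing.hpp>
// for the pka_fingerprint functor :
#include <morpho/cdl/fingerprints/pka_fp.hpp>

int main() {
  typedef molecule<>    M;
  typedef M::atom_type   atype;
  typedef AtomProperty<atomic_numberS,atype>::result_type     ANum;

  // atomic number of nitrogen
  ANum  N(7);

  std::istream::pos_type  pos;
  while(std::cin.good()) {
    // peek ahead to check that more input is available, then rewind
    pos=std::cin.tellg();
    if ( !morpho::look_ahead(std::cin,4)) break;
    std::cin.seekg(pos);

    // read one SDF record and build the molecule from it
    nail_juice<>  j;
    std::map<std::string, std::vector<std::string> >  n_props;
    get_juice_from_stream(std::cin, j, 0, sdf_formatT(),&n_props);
    M    m(j,true);

    M::vertex_iterator vi,vi_end;
    tie(vi,vi_end) = m.vertices();
    while(vi!=vi_end) {
      if( AtomProperty<atomic_numberS,atype>::get(m.get_atom(*vi))==N ) {
        // compute and print the fingerprint around this nitrogen
        pka_fingerprint<M>    FP(m);
        FP(*vi);
        const std::vector<size_t>&  fp = FP.get_fingerprint();
        std::copy(fp.begin(),fp.end(),std::ostream_iterator<size_t>(std::cout," "));
        std::cout << std::endl;
        break;  // only the first nitrogen of each molecule is reported
      }
      ++vi;
    }

  }

  return 0;
}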

References

Xing, Glen. "Novel Methods for the Prediction of LogP, pKa, and LogD". J. Chem. Inf. Comput. Sci. 2002. Vol 42. pp. 796-805.

Copyright © Vladimir Sykora & Cyprotex Ltd 2006