A Quick Tour of the Chemical Descriptors Library |
CDL main component is the molecular class. It is designed in a way that
decouples graph structure from the atomic and bond properties; that is,
the graph representation and the molecular properties are on different
layers of abstraction: they aren't coupled. This provides large flexibility
since the user is not required to have a particular atomic and bond properties
class with specific interfaces.
The basic component of the molecular class is a graph class, represented
as an adjacency list, and is contained as a member. This graph in turn has
properties attached to its vertices and edges. Vertex properties represent
"atoms" and edge properties "bonds". There's also some properties which
respect specifically to the molecule as a whole; these are called
"molecule properties".
CDL provides interfaces to navigate over the graph representation: vertex,
edge, and other iterators. Since many of the algorithms you'd like to write
require bonds and atoms properties, these interfaces are of no great interest.
What's of outmost importance are the interfaces to navigate over the
properties directly.
Iterators that navigate over atom and bond properties are nothing else that
adaptors from the graph's vertex and edge iterators.
You can convert an iterator from the graph representation to an iterator that navigates their underlying properties, and the other way around. What you can not do is convert an atom property class (structure you get when you derreference and atom iterator) to a vertex; the same is true for the bond and edge. This is a limitation intrinsic of the type of abstraction, but which you won't ever care about because it's possible to fully navigate the graph properties providing an atom or bond.
As the molecular properties (atom, bond and molecule properties) are parameterized, there is the need to access the members of this properties in a generic way. The property selectors were written to address this requirenment. They provide a generic interface between CDL's algorithms and the molecular properties.
There is one property accessor per property of the molecule: three in total:
AtomProperty<Selector,Atom>, BondProperty<Selector,Bond>,
and MolProperty<Selector,Molecule>.
The Selector parameter is a tag type to select the
desired property. Partial specializations of the property accessors (via the
Selector parameter) provides the mechanism to access the property
value and type. All property accessors inherit from std::unary_function<>,
in this way, argument and result type are exposed. The only requirenment that the
CDL places on the accessors, is the name of the accessor itself, and the name
of the selectors. Also they must provide the get() and set() methods.
There are some properties of the atom that never change once the atom is specified (as the
atomic number, or symbol); this properties do not have the requirement to provide
a set() method. Hence, the AtomProperties parameter for the molecule<> class
provides two classes of properties: static (readeable) and read-write. I'll
come back on this later.
For example, atomic number is a basic property of the atom. When the user provides
the AtomProperties parameter for the molecule<> class, CDL
assumes that AtomProperties parameter has a property of arbitrary type which represents the
atomic number. There are no requirements that CDL places on this property, other
than that it must be readable. The details of how the user provides this property
lies on the property accessor: for this reason, property accessors are user-defined.
When CDL requires this property, it uses the accessor with the specific Selector
to access it. For the atomic number case, the Selector is atomic_numberS.
CDL assumes that there is an AtomProperty<atomic_numberS,Atom> specialization,
and that it can be accessed via the get() method. Also, the type of this
property can be accessed via AtomProperty<atomic_numberS,Atom>::result_type.
Assume that you want to write an algorithm to calculate the molweight of
your drug. In a generic programming paradigm (the conceptual fundation of the CDL)
you just need to put your hands on two iterators : they point to some arbitrary
object that somehow knows the value of some atomic mass. This arbitrary object
is the AtomProperties parameter that you provided for the molecule<>
class. Upon construction of the molecule, you need to specify which "atom" each vertex
of the molecular graph is; in this way CDL constructs a conceptual molecular graph:
it is a graph at the end, but with properties attached to each vertex and edge. Also,
CDL knows which "physical" atom each vertex is. In this way:
template <class AtomIterator> typename AtomProperty<atomic_massS, typename AtomIterator::value_type>::result_type mol_weight(AtomIterator first, AtomIterator last) { typedef typename AtomProperty<atomic_massS, typename AtomIterator::value_type>::result_type atomic_mass_type; atomic_mass_type rvo=atomic_mass_type(); while(first!=last) { rvo+= AtomProperty<atomic_massS, typename AtomIterator::value_type>::get(*first++); } return rvo; }To be even more generic, we could write an algorithm to work on all properties, providing a property Iterator, and a UnaryOperation, in this way:
template <class InputIterator, class UnaryOperation> typename UnaryOperation::result_type accumulate_prop(InputIterator first, InputIterator last, UnaryOperation op) { typedef typename UnaryOperation::result_type prop_type; prop_type rvo=prop_type(); while (first!=last) { rvo+= op(*first++); } return rvo; }With this algorithm, we can use whichever property accessors, being for atoms or bonds. The only hack we have to perform is to convert the member function get() of the property accessor, to a functor. We can do this using std::mem_fun(). I'll come back to this function later.
There are many formats for molecular representations that are commonly used:
smiles, mol, mol2, etc. CDL provides a structure that encapsulates all the
information these formats carry: the 'nail_juice'. Once you have filled the
nail_juice with information about the molecule, you can construct a CDL's
molecule class.
The information nail_juice provide is:
With this information you can construct a CDL's molecule.
The following example illustrates how to use the tags to produce different molecular formats. You can find it in examples/streaming_mols.cpp
// --- std inclusions : #include <fstream> // --- morpho inclusions : #include <morpho/cdl/molecule/molecule.hpp> #include <morpho/cdl/parsers/parsers.cpp> using namespace morpho::cdl; int main() { // Suppose that we want to stream-in a molecule on sdf // format from the standard input, and output it as // smile to a file. Also, we want to get a molecule // on smiles format from a file named "bioiso.3d.smi" // and output it on sdf format to the standard output. // lets process all the stream : while( !std::cin.eof() ) { // first check that the stream has a valid sdf : std::basic_istream<char, std::char_traits<char> >::pos_type pos; pos=std::cin.tellg(); if ( !morpho::look_ahead(std::cin,4)) break; std::cin.seekg(pos); // So, lets get the first molecule from standard input: // We need to get the information contained in the // character string format (this case sdf) using the // juice. Declare a juice variable: nail_juice<> j_from_sdf; // Now use get_juice_from_stream() function to get the // molecule from a stream : // The first argument is the stream, the second the // juice, the third a counting variable to track error // structures, and the last argument we specify the // format we want: get_juice_from_stream(std::cin, j_from_sdf, 0, sdf_formatT()); // Now we construct the molecule: molecule<> mol_from_sdf(j_from_sdf, false, true); // Now lets output it to a file : // First we declare the stream : std::ofstream out_stream("structures.smi"); // Now stream it out as a smile : out_stream << write_nail( mol_from_sdf , smiles_formatT() ) << std::endl; } // while() // Now get a smile from a file named "bioiso.3d.smi" and // output it as sdf to standard output : // Declare the input stream : std::ifstream in_stream("bioiso.3d.smi"); // check if its good : if(!in_stream.good()) { std::cerr << "You must give bioiso.3d.smi" << std::endl; exit(-1); } // let's go through all stream : while( !in_stream.eof() ) { // get the juice : nail_juice<> j_from_smi; get_juice_from_stream(in_stream, j_from_smi, 0, smiles_formatT()); // the molecule : molecule<> mol_from_smi(j_from_smi, false, true); // Now streamed to the standard output as sdf file : std::cout << write_nail( mol_from_smi , sdf_formatT() ) << std::endl; } // while() return 0; }You have to take care of implicit hydrogens. The constructor of the molecule takes two flags: if to add implicit hydrogens, and if to initialize the properties. Both flags are defaulted: add implicit hydrogens if defaulted to false, and initialize is defaulted to true. If you add implicit hydrogens, then you have to remove them before canonizing a smile. You can remove the hydrogens calling the free function remove_cdl_implicit_hydrogens(molecule). With the other formats, handling hydrogens is totally left to the user. All hydrogens that come from any format are considered explicit!.
You can find the following example in examples/accessing_props.cpp
// Suppose that we are streaming in a smiles: // i.e. from the command line: "accessing_props < smile.smi" // --- std inclusions : #include <iostream> // --- morpho inclusions : #include <morpho/cdl/molecule/molecule.hpp> #include <morpho/cdl/parsers/parsers.cpp> using namespace morpho::cdl; using namespace std; int main() { typedef molecule<> molecule_t; // let go through all stream : while( !std::cin.eof() ) { // get the molecule : nail_juice<> juice; get_juice_from_stream(std::cin, juice, 0, smiles_formatT()); // Now we construct the molecule: molecule_t mol(juice, false, true); // First access the vertex Set : // declare the iterators : molecule_t::vertex_iterator vi, vi_end; // get the iteretors of the molecule : tie(vi,vi_end)=mol.vertices(); while(vi,vi_end) { // print the vertex descriptor : cout << "==========================\n"; cout << " Vertex : " << *vi << endl; cout << "==========================\n"; // access the adjacent vertices : molecule_t::adjacency_iterator adjacent_i, adjacent_i_end; tie(adjacent_i,adjacent_i_end)=mol.adjacent_vertices(*vi); cout << "-- adjacent vertices --\n"; while(adjacent_i!=adjacent_i_end) { cout << " " << *adjacent_i << " "; ++adjacent_i; } cout << endl; // access the incident edges : molecule_t::out_edge_iterator inc_e_i, inc_e_i_end; tie(inc_e_i,inc_e_i_end)= mol.incident_edges(*vi); cout << "-- incident edges --\n"; while(inc_e_i!=inc_e_i_end) { cout << " source : " << mol.source(*inc_e_i) << ", target: " << mol.target(*inc_e_i) << endl; ++inc_e_i; } cout << endl; // Now get the properties that this vertex maps to : molecule_t::atom_type& atom = mol.get_atom(*vi); // Now print one of the members : cout << " Atomic property: atomic number : " << AtomProperty<atomic_numberS,molecule_t::atom_type>::get(atom) << endl; cout << endl; ++vi; } // Now let's access the edges : molecule_t::edge_iterator ei, ei_end; tie(ei,ei_end)=mol.edges(); while(ei,ei_end) { cout << "==========================\n"; cout << " Edge : (" << mol.source(*ei) << "," << mol.target(*ei) << ")" << endl; cout << "==========================\n"; // Get the bond that maps to this edge : molecule_t::bond_type& bond = mol.get_bond(*ei); // Print a property of the bond : cout << " is bond type : " << BondProperty<type_of_bondS,molecule_t::bond_type>::get(bond) << endl; ++ei; } cout << endl; // Now let's access the atom set : molecule_t::atom_iterator ai, ai_end; tie(ai,ai_end)=mol.atoms(); while(ai!=ai_end) { cout << "==========================\n"; cout << " Atom of symbol : " << AtomProperty<symbolS,molecule_t::atom_type>::get(*ai) << endl; cout << "==========================\n"; // get the incident bonds : molecule_t::incident_bond_iterator ibi, ibi_end; tie(ibi,ibi_end)=mol.incident_bonds(ai); while(ibi!=ibi_end) { cout << " has incident bond of type : " << BondProperty<type_of_bondS,molecule_t::bond_type>::get(*ibi) << endl; ++ibi; } ++ai; } } // while() return 0; }
//----- cdl #include <morpho/cdl/molecule/molecule.hpp> #include <morpho/cdl/parsers/parsers.cpp> //----- std #include <iostream> #include <functional> // for std::mem_fun() using namespace morpho::cdl; int main(int argc, char* argv[]) { // first get the juice nail_juice<> j; get_juice_from_stream(std::cin, j, 0, sdf_formatT()); // now construct the molecule typedef molecule<> M; M m(j, false, true); // now call the previously written algorithm // first let's use atoms_begin() and atoms_end() functions std::cout << " The molecular weight is : "; std::cout << mol_weight(m.atoms_begin(), m.atoms_end()) << std::endl; // now lets use tie() and atoms() M::atom_iterator ai, ai_end; tie(ai, ai_end)= m.atoms(); std::cout << " The molecular weight using other iterators : "; std::cout << mol_weight(ai, ai_end) << std::endl; // Now lets use the more general algorithm we wrote: std::cout << " The molecular weight using other algorithm : "; std::cout << accumulate_prop(ai,ai_end, std::mem_fun(&AtomProperty<atomic_massS,M::atom_type>::get)) return 0; }
See that to get atom iterators we use the function atoms_begin(), atoms_end() and atoms(). This last one, returns a std::pair<> containing the begin and end of the atom iterators respectively.
// --- std inclusions : #include <ostream> #include <string> // --- morpho inclusions : #include <morpho/cdl/molecule/molecule.hpp> #include <morpho/cdl/parsers/parsers.cpp> #include <morpho/fingerprints/fingerprints.hpp> // --- boost inclusions : #include <boost/dynamic_bitset.hpp> int main() { using namespace std; using namespace morpho::cdl; while( !cin.eof() ) { nail_juice<> j; get_juice_from_stream(cin, j, 0, smiles_formatT()); // check if the mol in non empty : if (!j.good()) continue; // get the molecule : molecule<> m(j, false, true); // parameter to the fingerprint calculation : unsigned int max_path_lenght = 7; unsigned int fp_size = 1024; // calculate the fingerprint : boost::dynamic_bitset<> fingerprint = generate_fingerprint(m,max_path_lenght,fp_size); // now stream the FP out : std::string str_bitset; boost::to_string(fingernail,str_bitset); cout << str_bitset << endl; } return 0; }
template <class InputIterator> void reposition_indices(InputIterator first, InputIterator last, typename InputIterator::value_type pos) { while(first!=last) { if(*first>pos) *first -= 1; ++first; } }To substract 1 to every index that is greater than the index of the vertex that you're removing. Then:
template <class Molecule, class ContOfVertices> void remove_vertices(Molecule& m, ContOfVertices& vert2remove) { typename ContOfVertices::iterator vi = vert2remove.begin(), vi_end = vert2remove.end(); while(vi!=vi_end) { m.remove_atom(*vi); detail::reposition_indices(vert2remove.begin(),vert2remove.end(),*vi); ++vi; } }To get rid of the atoms that form part of the fragment.