A set of molecules can be finite or virtually infinite. In the latter case we often speak of a molecule class or compound class, which has an unbounded number of members, although only a limited number of them are of interest for most practical purposes. A very common class type is a homologous series, which is defined by a root or parent member; each following member is formally generated by inserting a bivalent group, such as a methylene group, between an already present methylene group and another group.
Chemists often sketch such sets or classes by simply drawing one member in a generic manner, using the symbols R, X, and Y (and others) in the same way they use element symbols. The following structure contains the symbol R, representing an arbitrary alkyl group, to define the set of alkyl n-hexanoates. The corresponding CurlySMILES encoding uses the annotation {+R} to formally substitute the H atom of the carboxylic acid group:
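The formal generation of a homologous series can be sketched in a few lines of Python. This is only an illustration operating on plain SMILES strings, not part of CurlySMILES; the function name homologous_series and its parameters are hypothetical. It assumes an unbranched chain in which each successive member gains one methylene group (written as C in SMILES) between a fixed head and a fixed tail group.

```python
def homologous_series(head: str, tail: str, start: int, count: int) -> list[str]:
    """Return SMILES strings of the form head + k methylene groups + tail.

    Hypothetical helper: each successive member contains one more
    methylene group ('C' in SMILES) than the previous one.
    """
    return [head + "C" * k + tail for k in range(start, start + count)]


# n-Alkanols starting from ethanol: methyl head, hydroxyl tail.
alkanols = homologous_series("C", "O", start=1, count=4)
for smiles in alkanols:
    print(smiles)
```

With these arguments the list starts at ethanol (CCO) and continues with 1-propanol, 1-butanol, and 1-pentanol.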
[Image: gif/alkyl_hexanoates.gif]
CCCCCC(=O)O{+R}
Alkyl n-hexanoates
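To make the role of the curly-braced annotation concrete, the following minimal sketch separates a CurlySMILES string into its plain SMILES part and its annotations. It is not the official CurlySMILES parser; the function name split_annotations is hypothetical, and the sketch assumes that annotations are not nested and that curly braces occur only as annotation delimiters.

```python
import re

# Matches one annotation: text between '{' and '}' without nested braces.
ANNOTATION = re.compile(r"\{([^{}]*)\}")


def split_annotations(curly: str) -> tuple[str, list[str]]:
    """Return (plain SMILES, list of annotation bodies) for a CurlySMILES string."""
    annotations = ANNOTATION.findall(curly)
    plain = ANNOTATION.sub("", curly)
    return plain, annotations


plain, annotations = split_annotations("CCCCCC(=O)O{+R}")
print(plain)        # SMILES part without the annotation
print(annotations)  # annotation bodies, here the substituent marker
```

For the alkyl n-hexanoate encoding, the SMILES part is the hexanoic acid skeleton and the single annotation body is +R.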
The CurlySMILES annotation format provides various methods to encode a set of molecules in a more specific or limiting manner. For example, the annotation entry n=2-10 limits the above set to those molecules whose alkyl group has a C-atom count greater than or equal to two and less than or equal to ten:
CCCCCC(=O)O{+Rn=2-10}
Ethyl-to-decyl n-hexanoates
This set can be constrained further by excluding branched alkyl groups with the additional entry bra=0:
CCCCCC(=O)O{+Rbra=0;n=2-10}
Ethyl-to-n-decyl n-hexanoates
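Because the last set is finite and contains only unbranched alkyl groups, its members can be enumerated explicitly. The sketch below is an illustration under stated assumptions, not the CurlySMILES reference implementation: the helper names parse_entries and enumerate_unbranched are hypothetical, and the annotation body is assumed to consist of the marker +R followed by semicolon-separated key=value entries, as in the examples above.

```python
def parse_entries(annotation: str) -> dict[str, str]:
    """Parse an annotation body such as '+Rbra=0;n=2-10' into a dict of entries.

    Assumption: the body starts with the marker '+R', followed by
    semicolon-separated key=value entries.
    """
    if not annotation.startswith("+R"):
        raise ValueError("expected an annotation starting with '+R'")
    entries = {}
    for entry in annotation[2:].split(";"):
        if entry:
            key, _, value = entry.partition("=")
            entries[key] = value
    return entries


def enumerate_unbranched(core: str, annotation: str) -> list[str]:
    """List SMILES of all set members, with R replaced by an n-alkyl chain."""
    entries = parse_entries(annotation)
    if entries.get("bra") != "0":
        raise ValueError("this sketch only handles unbranched groups (bra=0)")
    low, _, high = entries["n"].partition("-")
    return [core + "C" * n for n in range(int(low), int(high) + 1)]


members = enumerate_unbranched("CCCCCC(=O)O", "+Rbra=0;n=2-10")
for smiles in members:
    print(smiles)
```

The enumeration yields nine esters, from ethyl n-hexanoate (CCCCCC(=O)OCC) to n-decyl n-hexanoate. Without the bra=0 entry the set would also include branched isomers, which this simple string concatenation cannot generate.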

References

[1] A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.

