Groups are encoded by using the structural unit annotation format, as shown for terminal groups and multivalent groups. To include hints or details about the group environment, the group environment annotation format is used. The following example demonstrates encoding of a group, the hexanoyloxo group, in which the neighborhood of the O-atom with an open single bond is specified as an alkyl group:
CCCCCC(=O)O{-R}
Hexanoyloxo group of alkyl n-hexanoates
The neighborhood is not considered to be part of the encoded group. If the neighborhood is to be included, the annotation marker {+R} is used instead, which encodes the corresponding set of molecules, alkyl n-hexanoate molecules in this case.
By default, the annotation {-R} refers to both branched and linear alkyl groups. To exclude branched alkyl groups, the entry bra=0 needs to be added:
CCCCCC(=O)O{-Rbra=0}
Hexanoyloxo group of n-alkyl hexanoates
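To make the structure of such annotations concrete, the following Python sketch pulls the annotation marker and its entries out of a notation like the two above. It is only an illustration: parse_flat_annotation is a hypothetical helper, not part of any CurlySMILES software, it handles only annotations without nested curly brackets, and the semicolon as a separator between multiple entries is an assumption.

```python
import re

# Hypothetical helper for illustration; not an official CurlySMILES parser.
# Matches a marker such as -R, +R, -X or -Y followed by optional entries,
# but only when the annotation contains no nested curly brackets.
ANNOTATION = re.compile(r"\{([-+][A-Z])([^{}]*)\}")

def parse_flat_annotation(notation):
    """Return (marker, entries) for the first flat annotation, or None."""
    match = ANNOTATION.search(notation)
    if match is None:
        return None
    marker, body = match.group(1), match.group(2)
    entries = {}
    # Assumption: multiple entries, if present, are separated by semicolons.
    for item in filter(None, body.split(";")):
        key, _, value = item.partition("=")
        entries[key] = value
    return marker, entries

print(parse_flat_annotation("CCCCCC(=O)O{-R}"))
print(parse_flat_annotation("CCCCCC(=O)O{-Rbra=0}"))
```

The open-bond symbol {-} is deliberately not matched, because it carries no marker letter.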
Annotations marked by {-X} encode groups attached to a halogen atom, such as the hexanoyl group in n-hexanoyl halides:
CCCCCC{-X}=O
Hexanoyl group of n-hexanoyl halides
Annotations marked by {-Y} encode groups attached to any other group. In the following example the other group is an oxo group, O{-}{-}:
CCCCCC{-Yc=O{-}{-}}=O
Hexanoyl group attached to an oxo group
Formally, the open bond connects to the first matching open bond of the group notation inside the annotation, reading from left to right.
The group focus can also be flipped:
O{-}{-Yc=C{-}(=O)CCCCC}
Oxo group attached to hexanoyl group
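Because a {-Y} annotation contains a group notation with its own open-bond symbols, the curly brackets nest, and a reader has to match brackets rather than scan for the first closing one. The following Python sketch shows one way to do this and to locate the first open bond inside the annotated group, reading from left to right. All function names are hypothetical, and the sketch assumes that the neighboring group is given by a leading c= entry, as in the examples here.

```python
def split_annotations(notation):
    """Return the bodies of all top-level {...} annotations, honoring nesting."""
    depth = 0
    start = None
    bodies = []
    for i, ch in enumerate(notation):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError(f"unmatched '}}' at index {i}")
            depth -= 1
            if depth == 0:
                bodies.append(notation[start + 1:i])
    if depth != 0:
        raise ValueError("unmatched '{'")
    return bodies

def environment_group(body):
    """For a body such as '-Yc=O{-}{-}', return the group notation after 'c='."""
    prefix = "-Yc="  # assumption: the c entry comes first in a -Y annotation
    return body[len(prefix):] if body.startswith(prefix) else None

def first_open_bond(group):
    """Index of the first '{-}' in the group notation, or -1 if there is none."""
    return group.find("{-}")

for text in ("CCCCCC{-Yc=O{-}{-}}=O", "O{-}{-Yc=C{-}(=O)CCCCC}"):
    for body in split_annotations(text):
        group = environment_group(body)
        if group is not None:
            print(text, "->", group, "open bond at index", first_open_bond(group))
```

A body consisting of a single hyphen is the open-bond symbol {-} of the encoded group itself, so environment_group returns None for it.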
As shown, the group environment annotation format permits group encoding that precisely describes what is part of a given group and what belongs to the group environment. This distinction is critical, for example, in the development and application of group contribution methods (GCMs), in which particular contributions often depend on neighborhood specification to account for intramolecular interactions.

Reference

A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.



