CurlySMILES is a chemical language
for formulating linear notations
that specify materials, chemical compounds, and complex
architectures by encoding their composition
and molecular structure. It is a versatile
language that addresses various applications:
- Documentation and search of chemical information
- Formulation of precise, yet grainable chemical queries
- Computation of material descriptors and molecular descriptors
- Composition- and structure-based property estimation
- Measurement of material similarity and molecular similarity
- Rational material design and molecular design
- Supervised generation of virtual combinatorial libraries
CurlySMILES notation.
A CurlySMILES notation is a string with dot-separated
subnotations. For example, cobalt(II) nitrate hexahydrate,
Co(NO3)2·6H2O,
can be entered as:
[Co+2].[O-]N(=O)=O{2}.O{6}
Here, multipliers 2 and 6, enclosed in curly braces at the end of each species,
are applied to account for the number of nitrate anions and water molecules,
respectively. The corresponding notation based on the original SMILES language
is:
[Co+2].[O-]N(=O)=O.[O-]N(=O)=O.O.O.O.O.O.O
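The relationship between the two notations above can be illustrated with a minimal Python sketch. It is not taken from any CurlySMILES software: the function name `expand_multipliers` is hypothetical, and the sketch assumes that dots appear only as subnotation separators and that a multiplier is a plain integer in curly braces at the very end of a subnotation.

```python
import re

# Hypothetical helper for illustration; not part of any CurlySMILES library.
# Matches a subnotation that ends in an integer multiplier such as "{6}".
_MULTIPLIER = re.compile(r"^(?P<body>.*)\{(?P<count>\d+)\}$")


def expand_multipliers(notation: str) -> str:
    """Expand trailing {n} multipliers into repeated dot-separated subnotations."""
    expanded = []
    # Assumption: dots occur only as separators, never inside curly braces.
    for sub in notation.split("."):
        match = _MULTIPLIER.match(sub)
        if match:
            # Repeat the species as often as the multiplier states.
            expanded.extend([match.group("body")] * int(match.group("count")))
        else:
            expanded.append(sub)
    return ".".join(expanded)


print(expand_multipliers("[Co+2].[O-]N(=O)=O{2}.O{6}"))
# prints [Co+2].[O-]N(=O)=O.[O-]N(=O)=O.O.O.O.O.O.O
```

Other curly-brace content, such as annotations, is left untouched by this sketch because only all-digit brace content is treated as a multiplier.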
The CurlySMILES language introduces further formats that provide encoding shortcuts,
mainly aliases for frequently occurring notations of cations, anions, and other
chemical species.
A string with exactly one subnotation is called a unary CurlySMILES
notation. For example, the aromatic tropylium cation,
C7H7+,
is encoded in CurlySMILES as
c1cccccc1{!re=+}
This example demonstrates the key approach of the CurlySMILES grammar:
an annotation enclosed in curly braces. Here, the annotation consists of
the ring marker !r, indicating that the following entry,
e=+, which specifies a formal charge, applies
to the entire ring. The annotation is formally anchored to the last atomic node
in the notation. In general, an annotation can either be anchored at an atomic
node or attached to a subnotation, to encode details of the respective atom,
the structural environment of that atom, or the whole molecule.
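As a rough illustration of how the tropylium notation decomposes, the following Python sketch separates the SMILES part, the ring marker, and the entry. The names `Annotated` and `split_trailing_annotation` are hypothetical, and the sketch assumes a single annotation at the end of a unary notation; it does not implement the full CurlySMILES annotation grammar.

```python
import re
from typing import NamedTuple, Optional


class Annotated(NamedTuple):
    """Parts of a unary notation with one trailing annotation (illustrative only)."""
    smiles: str
    marker: Optional[str]
    entry: Optional[str]


# Assumption: exactly one annotation, placed at the very end of the notation.
_TRAILING = re.compile(r"^(?P<smiles>[^{}]+)\{(?P<annotation>[^{}]*)\}$")
_RING_MARKER = "!r"  # ring marker as described in the text


def split_trailing_annotation(notation: str) -> Annotated:
    """Split a notation into its SMILES part, optional ring marker, and entry."""
    match = _TRAILING.match(notation)
    if match is None:
        # No trailing annotation: return the notation unchanged.
        return Annotated(notation, None, None)
    annotation = match.group("annotation")
    if annotation.startswith(_RING_MARKER):
        entry = annotation[len(_RING_MARKER):]
        return Annotated(match.group("smiles"), _RING_MARKER, entry)
    return Annotated(match.group("smiles"), None, annotation)


print(split_trailing_annotation("c1cccccc1{!re=+}"))
# prints Annotated(smiles='c1cccccc1', marker='!r', entry='e=+')
```

Annotations anchored at atoms in the middle of a notation are outside the scope of this sketch.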
The CurlySMILES
language includes a rich annotation grammar (see
CurlySMILES: annotated SMILES notations) that
covers a diverse set of structural, substructural and extrastructural aspects
of a molecule. Further, the annotation format is open to customized
code, provided it still adheres to the basic syntax of CurlySMILES.