The CurlySMILES language supports both predefined and custom (user-defined)
keys in annotation dictionaries.
Custom keys start with a $ sign, followed
by a user-chosen name. The predefined keys are listed and explained
below. Note that a predefined (expected) value can also
be overridden by starting the value string with a
$ character whenever a different value
is preferred. Custom keys and values are accepted by the
Python CurlySMILES parser and included in the data objects for a notation,
but they are not interpreted further.
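The $ convention described above can be illustrated with a minimal sketch. It is not the CurlySMILES parser's own code: the function name classify_entry and the returned dictionary layout are hypothetical, and the sketch assumes that each dictionary entry has already been isolated as a single key=value string (as in phn=rutile).

```python
# Predefined keys copied from the table in this section.
PREDEFINED_KEYS = set(
    "a b c e f i j n p r aa cc id all axc box bra cha chc chn cos cot "
    "cpq cps csy ctr dpr emr enz esa exc ful hel ila ilu inc isp mac man "
    "min mps nuc par pdi pep pha phn plm pro psy pMm pMn pMp pMv pMz rcg "
    "sfd sfn slt slv spg srf tmp trd".split()
)


def classify_entry(entry):
    """Split one 'key=value' entry and flag custom keys and values.

    Hypothetical helper: the entry format is an assumption based on
    examples such as 'phn=rutile'.
    """
    key, sep, value = entry.partition("=")
    if not sep or not key:
        raise ValueError(f"not a key=value entry: {entry!r}")
    custom_key = key.startswith("$")      # custom keys start with $
    custom_value = value.startswith("$")  # overridden values start with $
    return {
        "key": key[1:] if custom_key else key,
        "value": value[1:] if custom_value else value,
        "custom_key": custom_key,
        "custom_value": custom_value,
        # A key counts as predefined only if it is listed and carries no $.
        "predefined": (not custom_key) and key in PREDEFINED_KEYS,
    }
```

For instance, classify_entry("csy=$quasi") reports a predefined key with a custom value, whereas classify_entry("$lab=note") reports a custom key; in both cases the payload would simply be stored, not interpreted.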
Key (k_i) |
Meaning and description of associated values (v_i) |
a |
atomic symbol |
b |
bond specification such as da
and dd for a dative bond at an
accepting and donating atom, respectively.
|
c |
CurlySMILES notation |
e |
formal electric charge: ..., -3, -2, -, +, +2, +3, ...
(Note that this format is compatible with SMILES but differs from the IUPAC
standard, in which the charge sign follows the number)
|
f |
fraction: two integers separated by forward slash |
i |
pointer to atomic nodes (see for
details)
|
j |
pointer to atomic nodes at next higher
level (see, for example, the notation of the bidentate
μ-H ligand in a
heterocyclic complex)
|
n |
number or number range to specify size or length of a structural unit
such as the number of C atoms in an alkyl group or
the number of SRUs in an oligomer |
p |
pointer to atom positions as substitution sites within component
(format as for i, but excluding
# appendix) |
r |
ring index in an {!r} annotation
anchored at an ANC with more than one ring digit;
the value of r is the digit
of that ring to which the annotation applies |
aa |
atomic symbols as comma-separated list |
cc |
CurlySMILES notations as comma-separated list |
id |
identifier (may contain letters, digits and parentheses)
associated with the annotated subject |
all |
name of an allotropic modification; can occur in annotations
of chemical element encodings. Alternatively, or in addition, the key
psy can be used |
axc |
stereodescriptor of axial chirality:
R and S
(for Ra and Sa)
or
P and M
(see
http://goldbook.iupac.org/A00547.html)
|
box |
boxdyl,
representing a structural part collapsed into a metaterm (see
ARX201/growth-hormone example)
|
bra |
0: linear (unbranched) groups,
1: both branched and linear groups
(default setting),
2: branched-only groups |
cha |
chemical abbreviation (or acronym) for a
chemical name associated with a structure, species or compound
(see DMSO example)
|
chc |
chemical code for a chemical (see
SCYX-7158 example) |
chn |
chemical name associated with a structure,
species or compound (see
tiglic_acid example)
|
cos |
cosolvent(s) encoded in
ConjCN format
(within aq, dp and
ds annotations)
|
cot |
cosolute(s) encoded in
ConjCN format
(within aq, dp and
ds annotations)
|
cpq |
copolymer qualifier:
a for alternating, b for block, c for co (generic), g for graft,
p for periodic, r for random and s for statistical |
cps |
coordination geometry polyhedral symbol: for example
TBPY-5 for trigonal bipyramid of a
mononuclear complex with coordination number 5
(see Table IR-9-2 on page 176 in Nomenclature of Inorganic Chemistry, IUPAC Recommendations 2005, RSC Publishing) |
csy |
crystal system descriptor:
a (triclinic),
c (cubic),
h (hexagonal),
m (monoclinic),
o (orthorhombic),
r (rhombohedral, trigonal),
t (tetragonal)
|
ctr |
stereochemical description based on the extended
cis/trans formalism:
c, r and
t for cis-, reference-, and
trans-substituent, respectively
|
dpr |
degree-of-polymerization range:
an integer number range or an integer following “gt”
(for example, gt250 when the degree of polymerization is greater than 250) |
emr |
end member and range values
to specify compositional series |
enz |
short name for an enzyme |
esa |
stereochemical description of cyclic systems with stereogenic
centers using the endo/exo, syn/anti formalism:
a for anti,
n for endo,
s for syn, and
x for exo
|
exc |
position integers for atomic nodes that are excluded, for example,
from a structural repeat unit (see
homopolymers)
|
ful |
fullerene notation: C followed by a
stoichiometric integer, optionally followed by a hyphen and
a point group symbol |
hel |
chiral helicity: P for plus and
M for minus (see
http://goldbook.iupac.org/H02763.html) |
ila |
isotopic label notations: specific
isotopes, sets and isotope-based compositions
|
ilu |
index range(s) to specify the node set of
a ladder unit in polymer notation
|
inc |
position integers for atomic nodes that are included, for example,
in a structural repeat unit (see
homopolymers)
|
isp |
isomer due to spin of proton(s): ortho
or para |
mac |
material class name |
man |
material name |
min |
mineral name |
mps |
multiphase system, represented as
a sequence of
slash-separated ConjCN notations |
nuc |
nuclide specification of virtual atoms for which
a two-letter atomic symbol does not (yet) exist: value is in
nuclide encoding format based on either
the atomic number or the temporary
three-letter atomic symbol.
|
par |
partitioning system: comma-separated list of
CurlySMILES notations of the solvent phases between
which an annotated species is distributed |
pdi |
polydispersity index |
pep |
peptide notation based on three- or one-letter codes for amino acids |
pha |
phase specification in a liquid crystal (lc) annotation:
nem, dis,
smA, smB,
and smC for nematic, discotic,
smectic A, smectic B, and smectic C, respectively |
phn |
phase name specifying polymorphs, used, for example, in
{*TiO2}{crphn=rutile},
{*TiO2}{crphn=anatase}, and
{*TiO2}{crphn=brookite} to
encode the titanium dioxide polymorphs rutile, anatase and brookite;
alternatively, or in addition, the key
psy can be used
|
plm |
short name for a polymer product |
pro |
short name for a protein |
psy |
Pearson symbol for phase specification, a three-part
notation: first, a lower-case letter
(a, c,
h, m,
o or t)
designating the crystal system; second, a capital letter
(F, I,
P, R
or S) designating the lattice
setting; third, a number designating the number of atoms
or ions in the unit cell
|
pMm |
mass-average molar mass of polymer |
pMn |
number-average molar mass of polymer |
pMp |
peak molar mass of polymer |
pMv |
viscosity-average molar mass of polymer |
pMz |
z-average molar mass of polymer |
rcg |
relative coordination geometry:
cis, trans,
mer, and fac
(applies to square planar and octahedral mononuclear complexes
with only two kinds of ligands, i.e. donor atoms)
|
sfd |
surface description: Miller indices or other surface notation |
sfn |
stoichiometric formula notation |
slt |
solute(s) encoded in
ConjCN format
(for example, within lq, sd
and IM annotations)
|
slv |
solvent(s) encoded in
ConjCN format
(within dp and ds annotations)
|
spg |
space group symbol (Hermann-Mauguin notation):
one of the 230 space group notations |
srf |
surface notation: stoichiometric formula notation followed by
a crystallographic plane specification (Miller indices) enclosed
in parentheses |
tmp |
template specification; example:
[Au]{nltmp=ss-DNA}
for a gold nanocluster (nl) templated by
single-stranded DNA (ss-DNA) |
trd |
trade name |
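Several of the value formats in the table are regular enough to check mechanically. The following is a minimal sketch for three of them (e, f and psy), written from the descriptions above only. The function names are hypothetical and not part of the CurlySMILES parser. The sketch also makes two assumptions: that the crystal-system letters of psy have the meanings listed for csy, and that a charge such as +1 is tolerated although the table lists only + for it.

```python
import re

# Crystal-system letters as listed for the csy key (assumed to apply to psy).
CRYSTAL_SYSTEMS = {
    "a": "triclinic", "c": "cubic", "h": "hexagonal",
    "m": "monoclinic", "o": "orthorhombic", "t": "tetragonal",
}

_CHARGE = re.compile(r"([+-])(\d*)")              # sign first, then optional number
_FRACTION = re.compile(r"(\d+)/(\d+)")            # two integers around a slash
_PEARSON = re.compile(r"([achmot])([FIPRS])(\d+)")


def parse_charge(value):
    """Value of key e: '-3', '-', '+', '+2', ... returned as a signed int."""
    m = _CHARGE.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid charge: {value!r}")
    sign = 1 if m.group(1) == "+" else -1
    magnitude = int(m.group(2)) if m.group(2) else 1  # bare sign means 1
    return sign * magnitude


def parse_fraction(value):
    """Value of key f: 'p/q' returned as the integer pair (p, q)."""
    m = _FRACTION.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid fraction: {value!r}")
    return int(m.group(1)), int(m.group(2))


def parse_pearson(value):
    """Value of key psy: (crystal system, lattice letter, atoms per cell)."""
    m = _PEARSON.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid Pearson symbol: {value!r}")
    return CRYSTAL_SYSTEMS[m.group(1)], m.group(2), int(m.group(3))
```

With these helpers, parse_charge("+2") gives 2, parse_fraction("1/3") gives (1, 3), and parse_pearson("tP6") gives ("tetragonal", "P", 6). An IUPAC-style charge such as 2+ is rejected because the sign must come first.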