CurlySMILES [1] provides a shortcut formalism to encode
a molecule with multiple occurrences of
structurally equal substituents.
The approach is to encode the unsubstituted molecule and attach
an annotation that encodes the substituent and the
positions in the molecule where substitution occurs.
This approach is illustrated for
sym-pentasubstituted
corannulenes. The page within the frame below presents a
molecular sketch for sym-pentasubstituted
corannulenes and the associated publication [2] provides details
on the synthesis and properties of diverse derivatives.
We start with the comparison of a SMILES and a corresponding
CurlySMILES notation for
sym-pentachlorocorannulene:
SMILES notation:
Clc1cc2c(Cl)cc3c(Cl)cc4c(Cl)cc5c(Cl)cc1c6c2c3c4c56

CurlySMILES notation:
c1cc2ccc3ccc4ccc5c{+Xc=Cl{-};i=1,4,7,10}cc1c6c2c3c4c56

In the CurlySMILES notation, the
OPAM annotation at position 13 encodes
the substituent, Cl{-}, and specifies
positions 1, 4, 7 and 10, at which substitution takes place in
addition to position 13.
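
The way the annotation ties a substituent to atom positions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CurlySMILES reference parser: the function names `split_curly` and `parse_substituent` are hypothetical, atoms are counted with a simplified pattern (bracket atoms, Cl, Br and single-letter organic-subset atoms), the marker after `+` (here `Xc`) is treated as an opaque key, and the braces are assumed to be balanced.

```python
import re

# Simplified atom pattern (an assumption): bracket atoms, Cl, Br,
# and single-letter organic-subset atoms, aliphatic or aromatic.
ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFIbcnops]")


def split_curly(curly):
    """Return (plain SMILES part, [(atom_index, annotation_text), ...]).

    atom_index is the 1-based index of the atom the annotation follows.
    """
    parent, annotations = [], []
    atoms = 0
    i = 0
    while i < len(curly):
        if curly[i] == "{":
            # Find the matching closing brace; annotations may nest, e.g. Cl{-}.
            depth, j = 1, i + 1
            while depth:
                if curly[j] == "{":
                    depth += 1
                elif curly[j] == "}":
                    depth -= 1
                j += 1
            annotations.append((atoms, curly[i + 1:j - 1]))
            i = j
            continue
        match = ATOM.match(curly, i)
        if match:
            atoms += 1
            parent.append(match.group())
            i = match.end()
        else:
            # Ring-closure digits, bonds, branches: copied through unchanged.
            parent.append(curly[i])
            i += 1
    return "".join(parent), annotations


def parse_substituent(annotation):
    """Split '+Key=substituent;i=a,b,c' into (key, substituent, positions)."""
    match = re.fullmatch(r"\+(\w+)=(.+);i=([\d,]+)", annotation)
    if match is None:
        raise ValueError("not a substituent annotation: " + annotation)
    key, substituent, positions = match.groups()
    return key, substituent, [int(n) for n in positions.split(",")]


curly = "c1cc2ccc3ccc4ccc5c{+Xc=Cl{-};i=1,4,7,10}cc1c6c2c3c4c56"
parent, annotations = split_curly(curly)
anchor, text = annotations[0]
key, substituent, extra_positions = parse_substituent(text)
all_sites = sorted(extra_positions + [anchor])
print(parent, substituent, all_sites)
```

For the example string, the annotation is found after the 13th atom, so the complete set of substitution sites is the anchor position together with the positions listed after `i=`.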
Since the substituent contains just one atom, the CurlySMILES notation is longer than the SMILES notation. This changes rapidly with increasing substituent size, for example with the trimethylsilylethynyl substituent:
sym-pentakis(trimethylsilylacetylene)corannulene (compound 11 in [2]):
c1cc2ccc3ccc4ccc5c{+Yc=C[Si](C)(C)C#C{-};i=1,4,7,10}cc1c6c2c3c4c56

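
The length argument can be checked with a little arithmetic. The sketch below compares the two strings given for the chloro derivative and then estimates the plain SMILES length for a larger substituent. The helper `estimated_smiles_length` is hypothetical and rests on an assumption: the substituent is written once as a prefix and once per remaining site as a parenthesized branch, as in the pentachloro SMILES shown earlier.

```python
def estimated_smiles_length(parent_len, substituent_len, n_sites):
    """Estimate the plain SMILES length for n_sites identical substituents.

    Assumption: one copy is written as a prefix, the others as branches,
    each branch costing the substituent plus two parentheses.
    """
    return parent_len + substituent_len + (n_sites - 1) * (substituent_len + 2)


parent = "c1cc2ccc3ccc4ccc5ccc1c6c2c3c4c56"

# Chloro derivative: both strings are taken from the text.
smiles_cl = "Clc1cc2c(Cl)cc3c(Cl)cc4c(Cl)cc5c(Cl)cc1c6c2c3c4c56"
curly_cl = "c1cc2ccc3ccc4ccc5c{+Xc=Cl{-};i=1,4,7,10}cc1c6c2c3c4c56"
print(len(smiles_cl), len(curly_cl))

# Trimethylsilylethynyl derivative: CurlySMILES from the text,
# plain SMILES length estimated rather than written out.
curly_tms = "c1cc2ccc3ccc4ccc5c{+Yc=C[Si](C)(C)C#C{-};i=1,4,7,10}cc1c6c2c3c4c56"
estimate_tms = estimated_smiles_length(len(parent), len("C[Si](C)(C)C#C"), 5)
print(estimate_tms, len(curly_tms))
```

The annotation stores the substituent once, so its length grows with one copy of the substituent, whereas the plain SMILES grows with five.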
The annotation-based notation not only shortens the string,
but also clearly separates
parent and substituent
structure, which significantly simplifies algorithms that
perform queries involving parent/derivative screening and
filtering.
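
A parent/derivative filter can be sketched as follows. This is a minimal illustration, not an established screening algorithm: `strip_annotations` and `derivatives_of` are hypothetical names, the benzene entry is added only as a non-matching example, and parents are compared by plain string equality, which assumes all entries write the parent skeleton identically (a real workflow would canonicalize the parent first).

```python
def strip_annotations(curly):
    """Drop every {...} annotation, including nested ones, and keep the parent."""
    kept, depth = [], 0
    for ch in curly:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def derivatives_of(parent, library):
    """Return the names of library entries whose annotation-free part equals parent."""
    return [name for name, curly in library.items()
            if strip_annotations(curly) == parent]


library = {
    "pentachloro": "c1cc2ccc3ccc4ccc5c{+Xc=Cl{-};i=1,4,7,10}cc1c6c2c3c4c56",
    "pentakis-TMS-ethynyl":
        "c1cc2ccc3ccc4ccc5c{+Yc=C[Si](C)(C)C#C{-};i=1,4,7,10}cc1c6c2c3c4c56",
    "benzene": "c1ccccc1",  # hypothetical non-matching entry
}

corannulene = "c1cc2ccc3ccc4ccc5ccc1c6c2c3c4c56"
print(derivatives_of(corannulene, library))
```

Because the substituent lives entirely inside the annotation, the filter never has to search the string for substituent fragments.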