2007/12/12

CDK, Jython and chemical structure format conversion

These days I'm learning how to work with some new (to me) Java cheminformatics toolkits. Google, Jython 2.2.1, and VoodooPad are helping me get up to speed:


  • Google to find the documentation for the Java APIs

  • Jython to learn API nuances without an edit/compile/test cycle

  • VoodooPad as a scratchpad for sample scripts -- it's easy to save them in pages that sit alongside my worklog



During initial exploration I just bang out the script in VoodooPad, then repeatedly select all (Cmd-A), copy (Cmd-C), switch to a Terminal session running Jython (Cmd-Tab), and paste (Cmd-V).

The "learnings" are going into proper Jython scripts maintained with TextMate.

It's nice to be able to deploy using Jython instead of pure Java. Since almost all of the heavy lifting is being done inside the cheminformatics jars, I don't need to worry about the overhead of running a Python interpreter inside a JVM...

Here's a simple example that uses CDK to transform an SD string into a SMILES string:
#! /usr/bin/env jython
# encoding: utf-8
# You must have set your CLASSPATH to pick up the CDK jar.

import java.io
from org.openscience import cdk

def getCDKMol(sdf):
    # Parse an MDL molfile/SD record held in a string.
    reader = cdk.io.MDLReader(java.io.StringReader(sdf))
    result = cdk.Molecule()
    result = reader.read(result)
    return result

def getSmiles(cdkMol):
    # Write the molecule as SMILES into an in-memory buffer.
    result = java.io.StringWriter()
    writer = cdk.io.SMILESWriter(result)
    writer.write(cdkMol)
    return str(result)

def sdfToSmiles(sdf):
    return getSmiles(getCDKMol(sdf))

# Example:
print sdfToSmiles("""1-7


 42 44  0  0  0  0            999 V2000
   -0.8349    0.9908    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0502    1.2457    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0502    2.0707    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8349    2.3257    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3198    1.6582    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6172    0.7608    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0898    0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5310   -0.0597    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1984   -0.5446    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9521   -0.2090    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0383    0.6114    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3709    1.0964    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8968    0.0346    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1517   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5997   -1.3631    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7927   -1.1915    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5378   -0.4069    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1448    1.6582    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6195   -0.6940    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    3.1044   -0.0265    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.1346   -1.3614    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.2869   -1.1789    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8546   -2.1477    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6172    2.5557    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0898    3.1103    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2227   -0.3952    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1122   -1.3651    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.7920    0.9470    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4571    1.9168    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4488    0.6477    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9587   -0.9215    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2407   -1.8046    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1632   -1.1420    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1448    0.8332    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1448    2.4832    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9698    1.6582    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7719   -0.5114    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.8020   -1.8463    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.9544   -1.6638    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0700   -2.4026    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6392   -1.8928    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1096   -2.9323    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  5  1  0  0  0  0
  1  7  1  0  0  0  0
  2  3  2  0  0  0  0
  2  6  1  0  0  0  0
  3  4  1  0  0  0  0
  3 24  1  0  0  0  0
  4  5  2  0  0  0  0
  4 25  1  0  0  0  0
  5 18  1  0  0  0  0
  6  8  2  0  0  0  0
  6 12  1  0  0  0  0
  7 13  2  0  0  0  0
  7 17  1  0  0  0  0
  8  9  1  0  0  0  0
  8 26  1  0  0  0  0
  9 10  2  0  0  0  0
  9 27  1  0  0  0  0
 10 11  1  0  0  0  0
 10 19  1  0  0  0  0
 11 12  2  0  0  0  0
 11 28  1  0  0  0  0
 12 29  1  0  0  0  0
 13 14  1  0  0  0  0
 13 30  1  0  0  0  0
 14 15  2  0  0  0  0
 14 31  1  0  0  0  0
 15 16  1  0  0  0  0
 15 23  1  0  0  0  0
 16 17  2  0  0  0  0
 16 32  1  0  0  0  0
 17 33  1  0  0  0  0
 18 34  1  0  0  0  0
 18 35  1  0  0  0  0
 18 36  1  0  0  0  0
 19 20  2  0  0  0  0
 19 21  2  0  0  0  0
 19 22  1  0  0  0  0
 22 37  1  0  0  0  0
 22 38  1  0  0  0  0
 22 39  1  0  0  0  0
 23 40  1  0  0  0  0
 23 41  1  0  0  0  0
 23 42  1  0  0  0  0
M  END
$$$$
"""
)
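
The example above passes a single record, but an SD string can hold several, each terminated by a line containing only `$$$$`. Here's a minimal sketch of splitting such a string into single-record strings before conversion. `splitSDRecords` is a hypothetical helper name of my own, not part of CDK, and it is plain Python that makes no CDK calls:

```python
def splitSDRecords(sdf):
    """Split an SD string into single-record strings.

    Each returned record keeps its terminating $$$$ line.
    Hypothetical helper for illustration; not part of CDK.
    """
    records = []
    current = []
    for line in sdf.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            # End of one record: emit it and start collecting the next.
            records.append("\n".join(current) + "\n")
            current = []
    # Keep trailing text that lacks a $$$$ terminator, unless it is blank.
    if "".join(current).strip():
        records.append("\n".join(current) + "\n")
    return records
```

Each record could then be handed to `sdfToSmiles` in turn, for instance with `[sdfToSmiles(r) for r in splitSDRecords(sdf)]`.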
