OrChem: information for developers

Introduction

This page contains pointers, some of them detailed, for people doing development on OrChem.

Broadly speaking, the main development for OrChem is done in Java. The CDK is the core library; OrChem Java classes wrap it and are deployed as Java stored procedures, each with a public static entry point. These entry points are invoked through a mandatory PL/SQL wrapper.

To see how that works, take a look at the query
  select orchem_calculate.charge('[O-]C([N+])C([N+])C','SMILES') C from dual

This is a SQL statement that calls a PL/SQL function called 'charge' in package 'orchem_calculate'. When you look inside that package, you see this:
  FUNCTION charge(molecule clob,input_type varchar2)
  RETURN number
  IS LANGUAGE JAVA NAME
  'uk.ac.ebi.orchem.calc.ChemicalProperties.calculateCharge(oracle.sql.CLOB , java.lang.String) return oracle.sql.NUMBER';

So, the PL/SQL wraps a function in a Java class called ChemicalProperties. That class has a public static method 'calculateCharge', which in the end makes a call to the CDK, in this case 'AtomContainerManipulator.getTotalFormalCharge(mol)'.

This is generally the way things are (and should stay) structured in OrChem. The PL/SQL tends to be a thin layer, the OrChem Java classes do database-specific things, and any chemical functionality should come from the CDK, not from OrChem directly.
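
The layering can be illustrated with a minimal sketch. It is not OrChem code: the class names ChargeEntryPointSketch and ToyChemistry are hypothetical, plain Java types stand in for oracle.sql.CLOB and oracle.sql.NUMBER, and a toy bracket-charge counter stands in for the CDK call, so that the example runs without a database or the CDK.

```java
import java.math.BigDecimal;

/** Hypothetical entry-point class; String and BigDecimal stand in for the Oracle types. */
final class ChargeEntryPointSketch {

    /** Public static, so that a PL/SQL call specification could bind to it. */
    public static BigDecimal calculateCharge(String molecule, String inputType) {
        // Database-facing concerns (argument checks, type conversion) live in this layer.
        if (!"SMILES".equals(inputType)) {
            throw new IllegalArgumentException("Unsupported input type: " + inputType);
        }
        // The chemistry itself is delegated. OrChem delegates to the CDK;
        // this sketch delegates to a toy stand-in instead.
        return BigDecimal.valueOf((long) ToyChemistry.totalFormalCharge(molecule));
    }

    public static void main(String[] args) {
        System.out.println(calculateCharge("[O-]C([N+])C([N+])C", "SMILES"));
    }
}

/** Toy stand-in for the chemistry library. NOT a real SMILES parser. */
final class ToyChemistry {

    /** Sums '+' and '-' signs found inside square brackets, with an optional single digit. */
    static int totalFormalCharge(String smiles) {
        int total = 0;
        boolean inBracket = false;
        for (int i = 0; i < smiles.length(); i++) {
            char c = smiles.charAt(i);
            if (c == '[') {
                inBracket = true;
            } else if (c == ']') {
                inBracket = false;
            } else if (inBracket && (c == '+' || c == '-')) {
                int sign = (c == '+') ? 1 : -1;
                int magnitude = 1;
                // Optional digit after the sign, as in [Fe+2].
                if (i + 1 < smiles.length() && Character.isDigit(smiles.charAt(i + 1))) {
                    magnitude = smiles.charAt(i + 1) - '0';
                    i++;
                }
                total += sign * magnitude;
            }
        }
        return total;
    }
}
```

The point of the split is that the entry point never contains chemistry of its own: swapping the toy class for a real library call would leave the entry point's signature, and therefore the PL/SQL wrapper, unchanged.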

Requirements and software

Developing OrChem requires knowledge of Java, PL/SQL, SQL and JDBC. Oracle 11G runs Java version 5, so any Java development (and compilation) needs to be Java 5 compliant.

Requirements for development environment

- CVS client software, to access the SourceForge repository
- Java (version 5 (!) because the Oracle 11G internal JVM is on release 5)
- An Oracle client installation (for binaries like loadjava, dropjava, sqlplus). Oracle client software can be downloaded from OTN.
- Perl, for unit testing
- Ant, for building
- An Oracle 11G schema at your disposal for which you have developer privileges (like Oracle's RESOURCE role), so you can wipe it clean and rebuild it for testing.

What's where?

The OrChem source folders contain a mix of Java, PL/SQL and SQL. There is also a Web folder containing a web application which allows you to enter a search (drawn or by text) and admire the search results. The web application is not meant as a core deliverable; it is intended for use during development.

The OrChem folder structure is shown below:


build.xml                                 : ant build file                                 
doc                                       : contains the documentation you are reading now
orchem.jdev11.jpr/orchem.jdev11.jws       : JDeveloper project definition files
properties                                : properties you have to override for running the test or web application
public_html                               : Demo web application to visualize searching
 ├- WEB-INF                               : 
      ├- lib                              : any jar file needed by OrChem (to do: put non-web jar files in separate lib folder)

sh                                        : Unix shell script(s)
sql                                       : DDL for building OrChem objects 
 ├- calculate.plsql                       :   PL/SQL for chemical calculation packages
 ├- convert.plsql                         :   PL/SQL for file format conversion
 ├- createfingerprints.plsql              :   PL/SQL for fingerprint creation
 ├- ....                                  :   etc ... 
 ├- tables.sql                            :   Tables
 ├- types.sql                             :   Types 
 ├- ....                                  :   etc ...

src                                       : Java source folder
  uk/ac/ebi/orchem                        :
              ├- bean                     :  Beans
              ├- calc                     :  Calculation of chemical properties
              ├- codegen                  :  Generator for QSAR descriptor classes
              ├- convert                  :  Conversion between chemical file formats
              ├- fingerprint              :  Fingerprinting of compounds
              ├- isomorphism              :  Isomorphism/Vf2 used for substructure searching
              ├- load                     :  Persisting fingerprints
              ├- qsar                     :  Qsar calculations
              ├- search                   :  Search algorithms
              ├- shared                   :  Utility classes
              ├- tautomers                :  Tautomer generation 
              ├- test                     :  JUnit test classes
              ├- web                      :  Struts action classes for demo web app

test                                      : Unit testing
 ├- main.pl                               : Main test API, little interactive Perl program 
 ├- step1_compoundload.bash               : Linux bash script to load a test set of compounds into the test schema  
 ├- step1_compoundload.bat                : Windows equivalent
 ├- ...                                   : etc ..

Building

The build file has the following options/targets:

- clean : delete all build directories, as well as /build files
- compile : compile core classes
- war : clean, compile, and create WAR file (tested with Tomcat 6)
- dist : make a tar.gz for OrChem releases

The build process is mostly useful to make a release. Only the core DDL, jar files and classes that are necessary to install and run OrChem are assembled into an archive and zipped up. Make sure you use Java 5.
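
One way to check that compiled classes respect the Java 5 constraint is to read the class file header. The sketch below relies on the class file format, in which the magic number 0xCAFEBABE is followed by a minor and a major version, and on Java 5 compilers emitting major version 49. The class name and the command-line usage are assumptions for illustration; nothing like this is part of the OrChem build.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reports whether a .class file targets Java 5 or older. */
final class ClassVersionCheckSketch {

    /** Class file major version emitted for Java 5. */
    static final int JAVA_5_MAJOR = 49;

    /** Reads the major version from the first eight bytes of a class file. */
    static int majorVersion(byte[] classFile) {
        if (classFile.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing class file magic number");
        }
        buf.getShort();                  // minor version, not needed here
        return buf.getShort() & 0xFFFF;  // major version as an unsigned value
    }

    static boolean targetsJava5OrOlder(byte[] classFile) {
        return majorVersion(classFile) <= JAVA_5_MAJOR;
    }

    /** Usage: pass the path of one compiled .class file. */
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": major version " + majorVersion(bytes)
                + (targetsJava5OrOlder(bytes) ? " (Java 5 compatible)" : " (too new for Java 5)"));
    }
}
```

The checker itself would be run on a workstation JVM rather than inside Oracle, so it does not need to be Java 5 compliant.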

Unit testing

OrChem consists of Oracle and Java components; both need to be part of the unit test process.

A test suite has been set up to test everything from setting up a new OrChem schema to similarity and substructure searching. You can run the test suite from the command line. Both Linux and Windows are supported by the framework.
To use the framework, you must have Perl and Ant installed (download and install if necessary before continuing).

In CVS directory "OrChem/test" you can find the test framework. The idea is that after making changes to OrChem, you execute the complete set of test steps. This way, you'll be testing not just Java but also Oracle components and the dependencies between Java and the database.

Setting up the unit test

- Create a blank Oracle schema in a test database. This schema is for testing only; it will be populated with test data later on.
- Navigate to directory OrChem/properties and copy file "unittest.properties" into subdirectory "my_local_copies". The unit test framework uses this copy for its database connections. Edit the copy to provide the correct database details, and do not add it to CVS.
- Navigate to directory "OrChem/test" and type perl main.pl
You'll see the screen below.
```
________________________________________________________________

 OrChem unit testing
________________________________________________________________

Reading file unittest.properties

   username : testuser
   password : testpw
   url      : jdbc:oracle:thin:@172.22.69.139:1521:marx
   tns name : marx

BEFORE YOU PROCEED: realise that step 1 of the test suite DROPS all OrChem objects in this schema!
Do you want to test with these settings ?

Enter y or n:
```
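
The connection details echoed on this screen come from the copied unittest.properties file. As a rough sketch of how such settings could be read and sanity-checked from Java: the key names dbUser, dbPass and dbUrl are assumptions (check your own copy of the file for the real ones), and the URL is assumed to have the thin-driver form host:port:SID seen above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper for inspecting unit test connection settings. */
final class UnitTestSettingsSketch {

    private static final String THIN_PREFIX = "jdbc:oracle:thin:@";

    /** Parses properties text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties text", e);
        }
        return props;
    }

    /** Splits jdbc:oracle:thin:@host:port:sid into {host, port, sid}. */
    static String[] splitThinUrl(String url) {
        if (!url.startsWith(THIN_PREFIX)) {
            throw new IllegalArgumentException("Not a thin-driver URL: " + url);
        }
        String[] parts = url.substring(THIN_PREFIX.length()).split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host:port:sid in " + url);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Inline sample text; the key names are assumed, not taken from OrChem.
        String text = "dbUser=testuser\n"
                + "dbPass=testpw\n"
                + "dbUrl=jdbc:oracle:thin:@127.0.0.1:1521:orcl\n";
        Properties props = parse(text);
        String[] parts = splitThinUrl(props.getProperty("dbUrl"));
        System.out.println("user " + props.getProperty("dbUser")
                + " on host " + parts[0] + ", port " + parts[1] + ", SID " + parts[2]);
    }
}
```
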
Check that the database connection details are correct. If so, type y. You will then get to the main menu below.
The menu options are listed in logical order. If you run the test for the first time, start with Option 1, then Option 2 etc.
```
________________________________________________________________

 OrChem unit testing
________________________________________________________________


   Oper.syst: linux
   username : TESTUSER
   password : testpw
   url      : jdbc:oracle:thin:@127.0.0.1:1521:orcl
   tns name : orcl
   work dir : /tmp/

 - Option 1:  - drop ALL(!!) user database objects
               create fresh schema
               load (query) compounds
 - Option 2:  - loadjava
 - Option 3:  - fingerprinting
 - Option 4:  - JUnit test similarity search
 - Option 5:  - JUnit test substructure search
 - Option 6:  - JUnit test convert
 - Option 7:  - JUnit test QSAR descriptors
 - Option 8:  - JUnit test SMARTS search
 - Option 9:  - JUnit test RGroup Query substructure search
 - Option 10: - JUnit test property calculation


 - Option 0: - EXIT

________________________________________________________________

Please make a choice from the menu above.
Enter menu option :
```
- Option 1 drops OrChem objects in the test schema and creates a schema from scratch using the scripts in ./OrChem/sql. After that some sample data is loaded into the empty test schema, only a few hundred compounds in total.
- Option 2 compiles the OrChem classes and selects a number of these to be bundled into an OrChem jar file. This OrChem jar, together with other, more static jars (CDK jar, log4j etc.), is then loaded into the test schema.
- Option 3 runs the OrChem fingerprinter against the sample data. Tables ORCHEM_FINGPRINT_SIMSEARCH and ORCHEM_FINGPRINT_SUBSEARCH get populated in this process.
- Option 4 is a JUnit test, testing expected similarity search results against the sample data. So whenever you change the fingerprinter, or add more compounds to the sample, you are likely to have to change the JUnit expected outcome.
- Option 5 is the same as Option 4, but tests substructure searching.
- Option 6 offers JUnit tests for various file conversions.
- Option 7 lets you test a few CDK QSAR calculations.
- Option 8 tests a few SMARTS searches (using ambit).
- Option 9 runs a few R-group queries. These are scaffold type queries with a skeleton and variable attachment groups.
- Option 10 is to test calculation of properties by the CDK, such as mass and formula.
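
The note under Option 4, that expected results need updating when the fingerprinter or the sample data changes, can be made concrete with a toy example. The sketch below assumes a Tanimoto-style score over bit fingerprints, which is a common choice for similarity searching but is not confirmed here as the measure OrChem uses. The compound IDs, bit positions and cutoff are invented for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy similarity search over bit fingerprints; all data here is made up. */
final class SimilarityExpectationSketch {

    /** Builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet b = new BitSet();
        for (int p : positions) {
            b.set(p);
        }
        return b;
    }

    /** Tanimoto score: bits in common divided by bits set in either fingerprint. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        return either.isEmpty() ? 0.0 : (double) common.cardinality() / either.cardinality();
    }

    /** Returns the IDs whose score against the query reaches the cutoff, in insertion order. */
    static List<String> search(Map<String, BitSet> compounds, BitSet query, double cutoff) {
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, BitSet> e : compounds.entrySet()) {
            if (tanimoto(e.getValue(), query) >= cutoff) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, BitSet> compounds = new LinkedHashMap<String, BitSet>();
        compounds.put("A", bits(1, 2, 3));
        compounds.put("B", bits(2, 3, 4));
        compounds.put("C", bits(7, 8));
        // A test would compare this list with a hard-coded expected list.
        System.out.println(search(compounds, bits(1, 2, 3), 0.5));
    }
}
```

If the fingerprinter changed so that compound B no longer set bit 2, its score against the query would drop below the cutoff and a hard-coded expected hit list containing B would have to be revised, even though the search code itself is untouched.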

Creating a CDK jar file that loads into Oracle

The CDK jar file loaded into Oracle is built with one or two hacks, which are described in the steps below. The steps are necessary because Aurora (Oracle's internal JVM) does not handle the various "@Test.." annotations used throughout the CDK, so these source code lines must be stripped out. In fact, any Java 6 or 7 feature must be stripped out. Below is a way to do this on Linux (using a shell script).

- Locate the script file sh/noAnn.sh in the OrChem source directory, open it and make sure step1 is uncommented and step2 is commented.
- Clone the CDK repository: git clone git://cdk.git.sourceforge.net/gitroot/cdk/cdk
- Navigate to the CDK's src/main folder.
- Call noAnn.sh from within src/main.
- Let the script run.
- Once finished, you have stripped out the "@Test" annotations.
- Set your Java environment to Java 5 (releases above 5 are not supported in Oracle 11G's JVM).
- Navigate to the CDK top folder where file "build.xml" can be found.
- Modify the CDK build file so that javac 1.5 is accepted: the relevant lines to change from "1.6" to "1.5" are in the build file for targets "runDoclet" and "compile-module".
- Run 'ant clean dist-all'. If there are minor compiler errors (bad stripping), locate the error and fix the problem (should be easy), then run 'ant dist-all' again (and again..) until the compilation succeeds.
- When compiled, move into the dist/jar directory.
- Open OrChem's noAnn.sh file, comment step1 and uncomment step2, and save the file.
- Call noAnn.sh from within the dist/jar folder.
- Let the script run.
- You now have a cdk.jar file you can load into Oracle.
Side note: the JChempaint ('JCP') jar file loaded into the database (jcp.jar) is even trickier to make (as of July 2011). JCP is used to make pictures in the convert package.
The difficulty is that the CDK in its current releases does not contain the JChempaint classes. However, as soon as JCP is distributed as part of the CDK again (packages controller and renderer), the integration should become easy enough again.
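
To give an idea of what stripping the "@Test.." annotation lines involves, here is a minimal sketch in Java. It is not a port of noAnn.sh, whose contents are not reproduced on this page; the class name and the regular expression are assumptions. It only removes lines that consist of nothing but an annotation whose name starts with "Test", with an optional argument list on the same line.

```java
import java.util.regex.Pattern;

/** Hypothetical line-based stripper for annotations whose name starts with "Test". */
final class AnnotationStripSketch {

    // A whole line holding only such an annotation, optionally with single-line arguments.
    private static final Pattern TEST_ANNOTATION_LINE =
            Pattern.compile("^\\s*@Test\\w*(\\s*\\(.*\\))?\\s*$");

    /** Returns the source text with matching annotation lines removed. */
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        // Limit -1 keeps trailing empty strings, so a final newline survives.
        for (String line : source.split("\n", -1)) {
            if (!TEST_ANNOTATION_LINE.matcher(line).matches()) {
                out.append(line).append('\n');
            }
        }
        // Drop the one newline added after the last kept line.
        if (out.length() > 0) {
            out.setLength(out.length() - 1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "@TestClass(\"x.YTest\")\n"
                + "public class Y {\n"
                + "    @TestMethod(\"testFoo\")\n"
                + "    void foo() {}\n"
                + "}\n";
        System.out.print(strip(source));
    }
}
```

A line-based approach like this misses annotations whose arguments span several lines or that share a line with other code, which is the kind of "bad stripping" that then shows up as a compiler error and has to be fixed by hand.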

Creating the InChI jar file

Background: the CDK uses JNI-InChI to generate InChI descriptors, but this approach does not work for OrChem. The problem is in the dynamic loading of libraries, something that is not allowed with the Oracle JVM.

Instead, a direct conversion from the InChI C++ source files to a Java class is used. This conversion is done by us with NestedVM, a clever tool capable of compiling C++ into a Java class.
Below follow the steps necessary to create the jar file from the C++ source code.

- Download NestedVM by making a Git clone as described on the NestedVM website, or take the snapshot provided (nestedvm-2009-08-09.tgz is used in this example). For more info on NestedVM, consult this Wiki page.
- Install NestedVM on your computer. We installed it on Windows7/Cygwin after the conversion failed with CentOS Linux. Before installing NestedVM you'll need darcs set up. Darcs can be built or downloaded as a binary; we did the latter. So the conversion was done with Windows7+Cygwin, nestedvm-2009-08-09.tgz, darcs 2.5, gcc 4.3.4.
- To install NestedVM, run the make command. Then run make upstream/tasks/build_gcc_step2 (I had to correct a header file in the source during that make).
- The NestedVM installation should give you binaries in .\upstream\install\bin\, like "mips-unknown-elf-gcc" and "mips-unknown-elf-g++". Make sure that your O/S path contains the location of the NestedVM binaries.
- With NestedVM installed, download the zipped InChI source code (INCHI-1-API.zip) from the IUPAC download site. This example uses InChI software 1.03.
- Unzip the InChI download and locate the makefile in folder INCHI-1-API\INCHI\gcc\inchi-1

In this makefile, change the references to gcc and g++ as below, and change the target executable name:

...
ifndef C_COMPILER
  C_COMPILER   = mips-unknown-elf-gcc
endif

ifndef CPP_COMPILER
  CPP_COMPILER = mips-unknown-elf-g++
endif

ifndef LINKER
  LINKER = mips-unknown-elf-g++
endif
....
  INCHI_EXECUTABLE = inchi-1.mips
...

Run the make command, resulting in a file called inchi-1.mips

Run NestedVM to create a .class file from the .mips file. An example:

java -classpath C:\src\inchi\nestedvm\build\;C:\src\inchi\nestedvm\upstream\build\classgen\build\ org.ibex.nestedvm.Compiler -o lessConstants -outfile org/iupac/StdInchi103.class org.iupac.StdInchi103 C:\src\inchi\INCHI-1-API\INCHI\gcc\inchi-1\inchi-1.mips

You can now test the class file in your IDE. You need a classpath reference to the runtime NestedVM classes found in the NestedVM directory build/org/ibex/nestedvm/, such as Runtime.class.
These runtime classes are also part of the jar file found in the OrChem lib directory.

Example in Java using the InChI class:

import org.iupac.StdInchi103;

public class Test {
    public static void main(String[] args) {
        //Prepare an array to shove into the Inchi generator
        args = new String[6];
        args[0] = "";
        args[1] = "/home/markr/molecules/ChEBI_16704.mol";
        args[2] = "/tmp/inchi.out";
        args[3] = "/tmp/inchi.log";
        args[4] = "/tmp/inchi.log2";
        args[5] = "-Key";
        //Call the actual Inchi
        StdInchi103 i = new StdInchi103();
        i.run(args);
    }
}

Demo web app

The OrChem folder structure contains a web application for test purposes, see some screenshots below.
With the web application you can visualize your searches, rather than doing them from the database command line. To make things work, you first of all need to copy file 'properties/webapp.properties' into folder 'properties/my_local_copies' and edit this file with settings that work for your local test database.
We managed to get the demo web application working in JDeveloper, but setting this up was complicated. An easier option is to run the ant task 'war', deploy the resulting war file to a Tomcat server and run it there.

