Overview / architecture

The picture below shows the main components of OrChem in yellow. On the left, in gray, is the user's table of chemical compounds. This table (or several tables unified in a view) is presumed to have a unique identifier column (character or numeric) and a VARCHAR2 or CLOB column holding the molecular data (MolFiles and SMILES are accepted). Both the identifier column and the molecular data column are mandatory for OrChem to work. During OrChem installation the table ORCHEM_PARAMETERS is populated; this one-row table stores the name of the user's compound table and its primary key column.

Before OrChem can be used to search the compound table, a Java stored procedure (package "orchem_fingerprinting") must be run to populate the tables that support similarity and substructure searching.
The fingerprinting procedure reads the entire content of the user's compound table and, for each compound, creates a CDK molecule and fingerprints it. How long the procedure takes depends on the size of the compound table and the complexity of the compounds. The procedure can, and ideally should, be parallelized, for instance by using DBMS_JOB.
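
One way to picture the parallelization is to split the compound table's key range into contiguous slices and give each slice to its own job. The sketch below only illustrates that planning step. It assumes a numeric identifier column, and the class and method names (FingerprintJobPlanner, split) are hypothetical, not part of OrChem. The actual arguments accepted by the fingerprinting package are not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OrChem): plans ID slices for parallel jobs. */
final class FingerprintJobPlanner {

    /**
     * Splits the inclusive range [minId, maxId] into at most 'jobs' contiguous
     * slices of near-equal size. Each element is {fromId, toId}, both inclusive.
     * Assumes the identifier column is numeric.
     */
    static List<long[]> split(long minId, long maxId, int jobs) {
        if (jobs < 1 || maxId < minId) {
            throw new IllegalArgumentException("need jobs >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long base = total / jobs;   // minimum number of IDs per slice
        long extra = total % jobs;  // the first 'extra' slices take one more ID

        List<long[]> slices = new ArrayList<>();
        long start = minId;
        for (int i = 0; i < jobs && start <= maxId; i++) {
            long size = base + (i < extra ? 1 : 0);
            slices.add(new long[] {start, start + size - 1});
            start += size;
        }
        return slices;
    }
}
```

Each slice would then be submitted as a separate database job (for example through DBMS_JOB, as mentioned above). If the identifiers are sparse or unevenly distributed, slicing by row count rather than by ID value may balance the work better.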

The two tables shown, ORCHEM_FINGPRINT_SIMSEARCH and ORCHEM_FINGPRINT_SUBSEARCH, are populated by the fingerprinting procedure. OrChem substructure searching is generally more expensive and slower than similarity searching because it includes the costly graph matching step. To boost performance, the substructure search has been implemented with Oracle's parallel pipelined function feature. The user can choose between parallelized and non-parallelized substructure searches (see the two 'subsearch' packages in the picture). The parallel search requires an extra step, making the query a two-step process. Although this adds overhead, the benefit can be a substantial performance gain, depending on the query and the number of cores/processors on your database server.
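
The cost difference between the two search types can be illustrated with plain bit sets. The sketch below is generic and not OrChem's actual code. It assumes fingerprints are bit sets, uses the Tanimoto coefficient as a common similarity measure (the text above does not say which measure OrChem uses), and shows the usual fingerprint prescreen that precedes graph matching in a substructure search. The class and method names are hypothetical.

```java
import java.util.BitSet;

/** Generic fingerprint arithmetic; a sketch, not OrChem's implementation. */
final class FingerprintSketch {

    /** Tanimoto coefficient: bits set in both, divided by bits set in either. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    /**
     * Substructure prescreen: a candidate can only contain the query if every
     * bit of the query fingerprint is also set in the candidate fingerprint.
     * Candidates that pass still need the expensive graph matching step.
     */
    static boolean mayContain(BitSet candidate, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate);  // query bits absent from the candidate
        return missing.isEmpty();
    }

    /** Convenience constructor for examples: sets the given bit positions. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }
}
```

A similarity search can be answered from the fingerprints alone, whereas a substructure search uses the prescreen only to discard impossible candidates and must still run graph matching on the rest. That remaining per-candidate work is what benefits from being spread over several processes.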


You can also opt out of OrChem's search implementations and install only the Java libraries in a database schema, which puts the CDK at your disposal inside Oracle. A number of chemical format conversion functions have been set up for your convenience, and you can build additional Java stored procedure wrappers around the CDK class methods that interest you. This is a relatively straightforward process, and once the wrappers are in place the rich functionality of the CDK is available to all SQL and PL/SQL in your database applications.