
Cheminee

Cheminee is a chemical structure search engine. Index chemical structures with arbitrary metadata, then search by substructure, superstructure, exact match, similarity, or descriptor queries. Built on Tantivy and rdkit-rs.

Your callers don’t need RDKit — just talk to the REST API.

Key Features

  • Structure search — Substructure, superstructure, identity (exact match), and Tanimoto similarity search
  • Descriptor search — Query by any RDKit descriptor (exactmw, NumAtoms, etc.) or custom metadata
  • SMILES standardization — Fragment parent, uncharger, and canonicalization in bulk
  • Format conversion — SMILES to molblock and molblock to SMILES
  • Neural similarity search — Similarity queries use a neural network encoder (cheminee-similarity-model) to embed Morgan fingerprints into a latent space, then search ranked clusters instead of brute-forcing every compound
  • REST API — OpenAPI-documented endpoints with built-in Swagger UI
  • CLI — Batch index SDF files, run queries from the terminal
  • Docker image: ghcr.io/rdkit-rs/cheminee
  • Ruby client: cheminee-ruby gem for programmatic access
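
For readers unfamiliar with the Tanimoto similarity mentioned in the feature list, the standard definition for two binary fingerprints is given below. This is the textbook formula, not a description of Cheminee's internals; how the score is combined with the neural encoder's cluster ranking is not covered here. The symbols a, b, and c are notation introduced for this example: a and b are the numbers of bits set in fingerprints A and B, and c is the number of bits set in both.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c}
% Worked example: a = 6, b = 5, c = 4
% T = \frac{4}{6 + 5 - 4} = \frac{4}{7} \approx 0.571
```

A score of 1 means the fingerprints are identical, and 0 means they share no set bits.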

API Endpoints

Method  Path                                       Description
POST    /v1/standardize                            Standardize a list of SMILES
POST    /v1/convert/mol_block_to_smiles            Convert molblocks to SMILES
POST    /v1/convert/smiles_to_mol_block            Convert SMILES to molblocks
GET     /v1/schemas                                List available index schemas
GET     /v1/indexes                                List indexes
GET     /v1/indexes/{index}                        Get index details
POST    /v1/indexes/{index}                        Create an index
DELETE  /v1/indexes/{index}                        Delete an index
POST    /v1/indexes/{index}/merge                  Merge index segments
POST    /v1/indexes/{index}/bulk_index             Index SMILES with metadata
DELETE  /v1/indexes/{index}/bulk_delete            Delete compounds by SMILES
GET     /v1/indexes/{index}/search/basic           Basic descriptor/metadata search
GET     /v1/indexes/{index}/search/substructure    Substructure search
GET     /v1/indexes/{index}/search/superstructure  Superstructure search
GET     /v1/indexes/{index}/search/identity        Exact structure match
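
The endpoints in the table can be exercised with curl. The following is a minimal sketch under stated assumptions: the server is reachable at http://localhost:4001 (the port used in the Docker quick start), and /v1/standardize accepts a JSON array of objects with a "smiles" key. The request body shape is an assumption, not taken from this page; the Swagger UI is the authoritative reference. The helper names cheminee_get and cheminee_standardize and the CHEMINEE_URL variable are illustrative only.

```shell
# Assumption: the server from the Docker quick start is listening here.
CHEMINEE_URL="${CHEMINEE_URL:-http://localhost:4001}"

# GET helper for the read-only endpoints in the table (paths taken verbatim).
cheminee_get() {
  curl -sS --fail "${CHEMINEE_URL}$1"
}

# POST /v1/standardize.
# ASSUMPTION: the body is a JSON array of objects with a "smiles" key;
# confirm the real schema in the Swagger UI before relying on it.
cheminee_standardize() {
  curl -sS --fail -X POST "${CHEMINEE_URL}/v1/standardize" \
    -H 'Content-Type: application/json' \
    -d "$1"
}

# Usage, once a server is running:
#   cheminee_get /v1/schemas
#   cheminee_get /v1/indexes
#   cheminee_standardize '[{"smiles": "CC(=O)[O-].[Na+]"}]'
```

The --fail flag makes curl exit non-zero on HTTP error responses, so the helpers can be used in scripts that check exit codes.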

Quick Start with Docker

Run Cheminee:

docker run --rm --name cheminee -p 4001:4001 ghcr.io/rdkit-rs/cheminee:latest

Visit http://localhost:4001 in a browser for the Swagger UI.
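
When the container is started from a script, it can help to wait until the REST API answers before sending requests. The following is a minimal sketch that polls GET /v1/schemas, an endpoint from the table above that takes no parameters. The function name wait_for_cheminee, the default URL, and the default of 30 attempts are illustrative choices, not part of Cheminee.

```shell
# Poll the REST API until it answers, or give up after a number of attempts.
# $1: base URL (defaults to the quick-start address)
# $2: maximum attempts, one per second (defaults to 30)
wait_for_cheminee() {
  url="${1:-http://localhost:4001}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail turns HTTP error responses into a non-zero exit status.
    if curl -sS --fail -o /dev/null "$url/v1/schemas" 2>/dev/null; then
      echo "cheminee is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "cheminee did not respond at $url after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_cheminee && echo "ready for requests"
```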

Index Some Data

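From a second terminal, open a shell in the running container (the command below assumes it is named cheminee), then fetch PubChem SDF files and index them: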

docker exec -it cheminee bash

mkdir -p /tmp/sdfs
cheminee fetch-pubchem -d /tmp/sdfs
cheminee create-index -i /tmp/cheminee/index0 -n descriptor_v1 -s exactmw
cheminee index-sdf -s /tmp/sdfs/Compound_000000001_000500000.sdf.gz -i /tmp/cheminee/index0
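
The commands above index a single SDF archive. If the download directory holds several .sdf.gz files, a small loop can index them all into the same index. This is a sketch that uses only the index-sdf flags shown above; the function name index_all_sdfs and the CHEMINEE_BIN override (handy for a dry run with echo) are hypothetical conveniences, not part of the Cheminee CLI.

```shell
# Index every .sdf.gz archive in a directory into one index.
# $1: directory containing the downloaded SDF archives
# $2: path of an index already created with `cheminee create-index`
# CHEMINEE_BIN is a hypothetical override; it defaults to the real binary.
index_all_sdfs() {
  sdf_dir="$1"
  index_path="$2"
  for sdf in "$sdf_dir"/*.sdf.gz; do
    # Skip the unexpanded glob when the directory has no matching files.
    [ -e "$sdf" ] || continue
    "${CHEMINEE_BIN:-cheminee}" index-sdf -s "$sdf" -i "$index_path" || return 1
  done
}

# Usage, with SDF_DIR and INDEX_PATH set to the paths used in the steps above:
#   index_all_sdfs "$SDF_DIR" "$INDEX_PATH"
# Dry run that only prints the commands:
#   CHEMINEE_BIN=echo; index_all_sdfs "$SDF_DIR" "$INDEX_PATH"
```

The loop stops at the first archive that fails to index, so a corrupt download does not go unnoticed.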

CLI Examples

Basic search — query by descriptor ranges:

cheminee basic-search -i /tmp/cheminee/index0 \
  -q "exactmw: [10 TO 10000] AND NumAtoms: [8 TO 100]" -l 10
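
The same query can in principle be sent to the GET /v1/indexes/{index}/search/basic endpoint. Because the query contains spaces and brackets, it has to be URL-encoded, which curl does with --get and --data-urlencode. The sketch below rests on several assumptions that this page does not confirm: the query-string parameter names "query" and "limit", the index name passed by the caller, and the base URL from the Docker quick start. Check the Swagger UI for the actual parameter names. The function name basic_search is illustrative.

```shell
# REST counterpart of `cheminee basic-search` (sketch).
# ASSUMPTIONS: parameter names "query" and "limit" are guesses; the base URL
# defaults to the quick-start address unless CHEMINEE_URL is set.
# $1: index name, $2: query string, $3: result limit (defaults to 10)
basic_search() {
  index="$1"
  query="$2"
  limit="${3:-10}"
  # --get sends the --data-urlencode values as a URL-encoded query string.
  curl -sS --fail --get \
    "${CHEMINEE_URL:-http://localhost:4001}/v1/indexes/${index}/search/basic" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=${limit}"
}

# Usage:
#   basic_search index0 "exactmw: [10 TO 10000] AND NumAtoms: [8 TO 100]" 10
```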

Substructure search:

cheminee substructure-search -i /tmp/cheminee/index0 \
  -s CCC -r 10 -t 10 -u true -e "exactmw: [20 TO 200]"

Similarity search:

cheminee similarity-search -i /tmp/cheminee/index0 \
  -s c1ccccc1CC -r 10 -t 10 -p 0.1 -m 0.4

Building from Source

Build and start the REST API server with Cargo:

cargo run --release --package cheminee --bin cheminee -- rest-api-server