Open Content Platform (OCP) has stack dependencies on Python 3.x, Python libraries, Kafka, and Postgres.
So the first step is to install and configure those dependencies:
Kafka: increase the max message size, enable automatic creation of topics, and increase the partition count. Sample numbers follow for ./kafka/config/server.properties:
And for ./kafka/config/consumer.properties:
Note: To simplify the setup for a Proof of Concept (POC), you can put everything in the same runtime (single server/VM/container/etc), using localhost, without redundancy or TLS certs. We may create a container like this in the future, to reduce the effort of newcomers taking OCP for a test drive.
Examples below are illustrated via PowerShell commands on a Windows server:
- Download OCP from GitHub
- Drop in a local directory (e.g. D:\Software\openContentPlatform)
- Upgrade the python pip installer:
> python -m pip install ‑‑upgrade pip
- Install dependent Python modules, using the packaged requirements file:
> pip install -r D:\Software\openContentPlatform\conf\requirements.txt ‑‑user
Note: You need to resolve any errors here. For example, if wheel files aren’t available for all required dependencies, the installer will attempt to build them locally (meaning you need developer tools installed). If there are conflicts, you can either jump into the weeds and fix them, or take a Python step back.
As an example, as of June 2020, you couldn’t use Python 3.8 on Windows since confluent-kafka wheels were unavailable (https://github.com/confluentinc/confluent-kafka-python/issues/753). But it works if you drop back and install the latest 3.7 version (3.7.7).
- Configure this to run in your stack:
> python D:\Software\openContentPlatform\framework\lib\platformConfig.py
Note: Your base API user and key should be given all access levels (write/delete/admin). Make note of the user/key to use later.
Clone the directory for a few clients:
> md “D:\Software\ocpClient1”
> md “D:\Software\ocpClient2”
> md “D:\Software\ocpClient3”
> Copy-Item -Path “D:\Software\openContentPlatform*” -Destination “D:\Software\ocpClient1” -recurse
> Copy-Item -Path “D:\Software\openContentPlatform*” -Destination “D:\Software\ocpClient2” -recurse
> Copy-Item -Path “D:\Software\openContentPlatform*” -Destination “D:\Software\ocpClient3” -recurse
Start the OCP server from the command line (you can also use a service/daemon):
> python D:\Software\openContentPlatform\framework\openContentPlatform.py
Start some clients:
> python D:\Software\ocpClient1\framework\openContentClient.py resultProcessing
> python D:\Software\ocpClient2\framework\openContentClient.py contentGathering
> python D:\Software\ocpClient3\framework\openContentClient.py universalJob
To confirm it’s up and running, you can use the REST API or walk through the runtime\logs for the server and clients.
To test via the API, take your REST utility of choice (e.g. Insomnia) and add the apiUser and apiKey into the header section (generated previously by running platformConfig). Then issue a GET on one of the API resources. If you configured the platform to use http, localhost, and haven’t changed the port or context root… to see connected clients, you can issue a GET on http://localhost:52707/ocp/config/client. If it shows the clients running on ‘MyDesktop’, then you can request runtime details from the active clients, one level deeper with a GET on http://localhost:52707/ocp/config/client/MyDesktop.
Some points of consideration for the infrastructure footprint follow. Since this is intended as an enterprise product, production deployments would obviously have a larger footprint as you plan for resilience.
For the stack dependencies, you would want multiple Kafka instances. The same goes for ZooKeeper, used by Kafka. For the database, you can accomplish this multiple ways (3rd-party or database-specific clustering, active syncs, virtualization-level mechanisms, storage-level mechanisms).
For OCP, you would have a single server running at any time that you can monitor for problems (API interaction, log monitors, process monitors), and have it fail-over if the active service encounters a problem. This is a classic DR scenario. Regarding the clients, any number can run, of any type… spun up/down at any point in time. They are built to horizontally scale on demand. If you need more performance/capacity/throughput on a certain client type, spin another one up. It will subscribe to the appropriate service and start the work.
The clients would normally run in separate runtime environments than the server. You can have tooling automate the spin up and down of new clients. To enable that, copy/clone the directory structure for an “image” to deploy/execute on demand. Multiple clients (of the same type or different), may be spun up inside the same execution environment – as long as they are deployed in their own sub-directory. This avoids file conflicts, mainly from logging.