Rather than being confined to a single runtime environment, the internal services were designed to scale horizontally. Clients (workers) can run wherever desired: simply start another process, and it subscribes to its service, drops into the resource pool, and begins working.
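The scale-out model can be sketched in a few lines. This is a minimal illustration with hypothetical names (ResourcePool, "contentGathering"), not OCP's actual API: each new worker process subscribes to a service and becomes eligible for work, with no central reconfiguration.

```python
class ResourcePool:
    """Hypothetical sketch of a service's pool of subscribed workers."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.workers = []

    def subscribe(self, worker_id):
        # A newly started process registers itself with the service;
        # scaling out is just starting another process anywhere.
        self.workers.append(worker_id)
        return len(self.workers)

pool = ResourcePool("contentGathering")
pool.subscribe("worker-1")           # first worker process
active = pool.subscribe("worker-2")  # start another process to scale out
print(active)  # → 2
```

The point is that capacity grows by adding processes, not by resizing one runtime.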
No single product will meet all requirements in a solution space. But if the product is flexible and extensible, users have the option either to build what they need themselves or to hire someone to build it for them. And since this is open source, you aren't limited to whatever the proprietary side chose to expose for extensibility (e.g. a new database class or content gathering job); you can drop in a completely new service. Maybe you want to create an event management subsystem and use this as a monitoring product. Or perhaps you want to create a new GUI service.
Designed for developers
The Core Framework is designed for developers to add new functionality. The language (Python) was selected by analyzing its acceptance among system administrators, along with its presence over the prior decade in university computer science programs. Simply put, most system administrators and developers will have experience with Python and will be able to manage this platform, as well as modify it or drop in new content. You can build a support team that only needs to know Python scripting and REST APIs, which are standard entry-level skills for IT graduates.
Reduced product complexity
OCP leverages components from the open-source space to greatly reduce product complexity. On the infrastructure side, it's built on Python, uses Kafka for the bus, and PostgreSQL for the RDBMS. Among its Python components, OCP uses Twisted for client/server networking and the web server, SQLAlchemy for the database ORM, and Hug for REST API extensions.
Native support for data lakes
Like standard IT discovery solutions, OCP leverages a backend RDBMS for storage. The database structure is mainly inheritance-based (a.k.a. hierarchical), and the ORM helps simplify complex queries. Where OCP stands apart is that it organically enables data flows into big data solutions. All content gathering jobs send data through a bus (Kafka), either back to the default topic (channel/queue) or to a custom named topic. Data sent to the default topic ends up in the RDBMS (PostgreSQL); data sent to a different topic can be streamed into your data lake.
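The routing decision can be sketched as follows. The topic names here ("ocp.results", "dataLake.software") are assumptions for illustration, not OCP's actual topic names, and the Kafka send itself is shown as a comment since it needs a running broker:

```python
import json

# Assumed default topic name; in OCP this would be the topic that the
# result-processing service consumes and writes into PostgreSQL.
DEFAULT_TOPIC = "ocp.results"

def route_result(payload, topic=None):
    """Serialize a job's result and pick its destination topic."""
    destination = topic or DEFAULT_TOPIC
    message = json.dumps(payload).encode("utf-8")
    # With kafka-python this would be: producer.send(destination, message)
    return destination, message

# Default topic → lands in the RDBMS via the result-processing service.
dest, msg = route_result({"software": "nginx", "version": "1.24"})
# Custom named topic → streamed into the data lake instead.
lake_dest, _ = route_result({"software": "nginx"}, topic="dataLake.software")
```

The job code is identical either way; only the topic argument decides where the data flows.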
This methodology allows sending different data sets to multiple tools from the same job. For example, in a job that discovers running software on servers, the CMS may only wish to retain a subset of that software, whereas you may want to send all discovered software into your data lake. This is different from an external job pulling CMS data into your data lake afterward; by that time you only have access to the structured data sets the CMS chose to retain.
Consider a content gathering job executing certain commands, like the one mentioned above for running software. You may have three separate teams discovering the same data for their own purposes, using disparate agent-based or agentless mechanisms: team 1 is looking at software for license compliance or financial reasons, team 2 is looking for software modeled by the company's defined Services, and team 3 is doing it for security reasons. You may be able to eliminate multiple tools issuing the same commands. Since data flows through a distributed bus, it's trivial to send data to multiple topics; technically, it would take a single new line in a script to send data (structured or unstructured) to a new destination. And you could spin that around if another tool is already gathering the data and can send to a bus or endpoint.
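The three-team scenario can be sketched as a simple fan-out. The topic names below ("license.sw", "serviceModel.sw", "security.sw") are hypothetical, and the actual Kafka send is shown as a comment; the point is that each additional consumer costs one more send on the bus, not another discovery tool:

```python
import json

def fan_out(payload, topics):
    """Send one job's result to every interested team's topic."""
    message = json.dumps(payload).encode("utf-8")
    sent = {}
    for topic in topics:
        # With kafka-python this would be: producer.send(topic, message)
        sent[topic] = message
    return sent

result = {"host": "web01", "software": "openssl", "version": "3.0.13"}
# One gathering job feeds the license, service-modeling, and security teams.
deliveries = fan_out(result, ["license.sw", "serviceModel.sw", "security.sw"])
```

One set of commands runs on the endpoint; the bus handles distribution to every team.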
Screen-captured walkthrough