What is InsightEngine?

I am interested in learning more about VAST InsightEngine and its capabilities.

InsightEngine seems very promising for future AI engineering, particularly in areas like LLMs (Large Language Models) and Generative AI. I would like to start using it as soon as it becomes available.

I have two specific questions about InsightEngine:

  1. System Architecture:
    Could you provide details about the system architecture of the C-Box when used with InsightEngine?
    I understand that InsightEngine utilizes NIM or NeMo, and my current understanding is that GPUs are required for running them. Does the new C-Box include GPUs to support this functionality?

  2. Role in AI Engineering:
    What role does InsightEngine play in AI engineering workflows? Specifically, how does VAST Data, in combination with InsightEngine, compare to traditional vector databases like Milvus or Chroma? Could VAST Data potentially replace them in certain use cases?

Does anyone know anything about this?

Hi kodai, great questions.

1 (C-Box): during our beta phase there are no HW changes; existing Ice Lake and AMD EPYC based (CPU-only) systems will be used. For services during beta which require GPUs, partners and customers will allocate/provision one or more of the following:

  a. GPU systems which can be added to a k8s cluster. VAST is deploying a series of services on k8s systems that allow bi-directional communication with a VAST cluster, so that the VAST control plane can monitor and manage certain types of services (this is evolving as we iterate on our codebase).
  b. GPU systems which are separate from k8s and are 100% customer managed. Interaction with models deployed on those GPU servers will occur via configuration on the pipelines which customers define on their VAST cluster. For example, if an NVIDIA NIM/model is required for inference and the model is hosted on an existing, non-managed GPU server, a customer could set an ENVIRONMENT_VARIABLE on their VAST-managed pipeline to send inference calls to a defined model endpoint (e.g., https://mygpu.client.com/v1/...); a minimal sketch of this call pattern follows below.
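
To make option (b) concrete, here is a minimal sketch of such a call. The variable name MODEL_ENDPOINT and the model ID are illustrative assumptions, not VAST-defined names; the request shape follows NIM's OpenAI-compatible /v1 API:

```python
import os
import requests

# Hypothetical variable name; the real setting is whatever the customer
# defines on their VAST-managed pipeline.
endpoint = os.environ.get("MODEL_ENDPOINT", "https://mygpu.client.com/v1")

# NIM serves an OpenAI-compatible chat completions API under /v1.
resp = requests.post(
    f"{endpoint}/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # whichever model the NIM hosts
        "messages": [{"role": "user", "content": "Summarize this document."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```
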
  2. AI engineering → It seems like your question is more related to vector DBs than to the broader scope of ‘AI engineering’. VAST has already implemented a large-scale database platform. What’s missing in current GA code is support for the types of data structures and query/search optimizations typically associated with searching for vector embeddings. We are in the process of creating these as extensions to our existing database, and we will launch initial support for using VAST as a native vector store later this year.

The ‘short’ answer is “yes, VAST could potentially replace Milvus, Chroma, etc.”…once we complete our R&D effort 🙂
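
For context, the kind of top-k similarity search those engines serve today (and which our vector-store work targets) looks like this minimal FAISS sketch, with random vectors standing in for real embeddings:

```python
import faiss
import numpy as np

d = 768            # embedding dimensionality (model-dependent)
nb, k = 10_000, 5  # corpus size and number of neighbors

# Random vectors stand in for real document embeddings.
xb = np.random.rand(nb, d).astype("float32")
faiss.normalize_L2(xb)

# Inner product over L2-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(d)
index.add(xb)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k)  # top-k most similar documents
print(ids[0], scores[0])
```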

and @kodai, I suppose some follow-ups:

  1. what kinds of pipelines are you deploying (test or prod) today?
  2. are you using milvus/chroma/weaviate/etc?
  3. do you use k8s?

we are definitely looking for feedback and use cases so we can align our strategy to what people are actually doing…

Thank you for your prompt reply.
I now have a clear understanding of the matter.

I am a member of a distribution company that handles VAST Data.
As one of the distributors, we are preparing to use InsightEngine for demonstration purposes.
This demonstration has the potential to strongly appeal to our customers who are developing or using AI technologies.

Therefore, we would appreciate receiving detailed information about InsightEngine as soon as possible.

Best regards.

I realize I have not yet answered your questions:

  1. what kinds of pipelines are you deploying (test or prod) today?
    → LLM and RAG using NIM and NeMo
  2. are you using milvus/chroma/weaviate/etc?
    → Milvus, Chroma, and FAISS
  3. do you use k8s?
    → Sorry, no.

If no k8s, please describe your GPU setup (e.g., system config, system topology including network, and current SW deployment, i.e., GPU-aware container instances running on …?)

I’m so sorry for the delayed response; I just noticed your message. I will check with the engineers shortly.

I apologize for replying late.
I asked our application engineering team.

They are using Run:AI to deploy multiple NIM containers and verify the RAG pipeline on GPUs.

The GPUs are located in our company’s data center, and for the RAG pipeline, they are using three H100s and one A40.
The H100s handle NIM’s LLM and embeddings, while the A40 is used for vector search (FAISS) and application deployment (Gradio).
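
Roughly, the flow they verify looks like the sketch below; the host names and model IDs here are placeholders rather than our actual config, the request shapes follow NIM's OpenAI-compatible API, and input_type is NVIDIA's retrieval-NIM extension:

```python
import faiss
import numpy as np
import requests

# Placeholder endpoints; in this setup they would be the NIM containers
# scheduled on the H100s via Run:AI.
EMBED_URL = "http://h100-embed.internal:8000/v1/embeddings"
LLM_URL = "http://h100-llm.internal:8000/v1/chat/completions"

def embed(texts, input_type):
    # NVIDIA retrieval NIMs expose an OpenAI-style embeddings API,
    # extended with an input_type field ("query" or "passage").
    r = requests.post(EMBED_URL, json={
        "model": "nvidia/nv-embedqa-e5-v5",  # assumed embedding NIM
        "input": texts,
        "input_type": input_type,
    }, timeout=60)
    r.raise_for_status()
    return np.array([d["embedding"] for d in r.json()["data"]], dtype="float32")

# Index the corpus once (the FAISS step that runs on the A40).
docs = ["VAST is a data platform.", "NIM serves models over HTTP."]
vecs = embed(docs, "passage")
faiss.normalize_L2(vecs)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

# Answer a question: retrieve the top passage, then ask the LLM NIM.
q = "What does NIM do?"
qv = embed([q], "query")
faiss.normalize_L2(qv)
_, ids = index.search(qv, 1)
context = docs[ids[0][0]]

r = requests.post(LLM_URL, json={
    "model": "meta/llama-3.1-70b-instruct",  # assumed LLM NIM
    "messages": [{"role": "user",
                  "content": f"Context: {context}\n\nQuestion: {q}"}],
}, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```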

Additionally, they would like to confirm whether they can use the same type of NIM as the one used in InsightEngine.
This is because, while their current NIM setup requires GPUs, InsightEngine does not.
They suspect that some aspects of NIM, such as TensorRT, might have been modified.

Best Regards,