I am interested in learning more about VAST InsightEngine and its capabilities.
InsightEngine seems very promising for future AI engineering, particularly in areas like LLMs (Large Language Models) and Generative AI. I would like to start using it as soon as it becomes available.
I have two specific questions about InsightEngine:
System Architecture:
Could you provide details about the system architecture of the C-Box when used with InsightEngine?
I understand that InsightEngine utilizes NIM or Nemo. My current understanding is that GPUs are required for running NIM or Nemo. Does the new C-Box include GPUs to support this functionality?
Role in AI Engineering:
What role does InsightEngine play in AI engineering workflows? Specifically, how does VAST Data, in combination with InsightEngine, compare to traditional vector databases like Milvus or Chroma? Could VAST Data potentially replace them in certain use cases?
1a (C-Box): During our beta phase, there are no HW changes. Existing Ice Lake and AMD EPYC based (CPU-only) systems will be used. For services during beta which require GPUs, partners and customers will allocate/provision one or more of the following:
a. GPU systems which can be added to a k8s cluster. VAST is deploying a series of services on k8s systems which allow bi-directional communication with a VAST Cluster, such that the VAST control plane can monitor and manage certain types of services (this is evolving as we iterate on our codebase).
b. GPU systems which are separate from k8s and are 100% customer managed. Interaction with models deployed on those GPU servers will occur via configuration on the pipelines which customers define on their VAST cluster. For example, if an NVIDIA NIM/model is required for inference, and the model is hosted on an existing, non-managed GPU server, a customer could set an ENVIRONMENT_VARIABLE on their VAST-managed pipeline to send inference calls to a defined model endpoint (e.g., https://mygpu.client.com/v1/...)
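As a rough sketch of option (b): NIM services generally expose an OpenAI-compatible API under `/v1`, so a pipeline step could read the endpoint from an environment variable and direct inference calls there. The variable name `NIM_ENDPOINT`, the helper function, and the model name below are illustrative assumptions, not actual VAST configuration keys.

```python
import os

def build_inference_request(endpoint: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat-completion request for a NIM endpoint.

    Hypothetical helper for illustration; a real pipeline step would POST
    this payload to the URL with an HTTP client.
    """
    url = endpoint.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

# The endpoint would come from a variable set on the customer-defined pipeline;
# the default here reuses the example host from the answer above.
endpoint = os.environ.get("NIM_ENDPOINT", "https://mygpu.client.com/v1")
url, payload = build_inference_request(endpoint, "meta/llama-3.1-8b-instruct", "Hello")
print(url)
```

The point is only the wiring: the pipeline stays VAST-managed while the GPU-hosted model is addressed purely by URL.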
AI engineering: It seems like your question is more related to vector DBs than the broader scope of "AI engineering". VAST has already implemented a large-scale database platform. What's missing in current GA code is support for the types of data structures and query/search optimizations typically associated with searching for vector embeddings. We are in the process of creating these as extensions to our existing Database, and will be launching initial support for using VAST as a native vector store later this year.
The "short" answer is "yes, VAST could potentially replace Milvus, Chroma, etc." ... once we complete our R&D effort.
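For context on what a native vector store has to optimize, the core operation is top-k similarity search over embeddings. This is a minimal brute-force NumPy sketch of that operation, not VAST's implementation (which would push the search into the database engine with appropriate indexing):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 3):
    """Return indices of the k embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                      # cosine similarity against every row
    return np.argsort(-sims)[:k]      # indices of the k highest scores

rng = np.random.default_rng(0)
docs = rng.standard_normal((100, 8))            # toy 8-dim document embeddings
query = docs[42] + 0.01 * rng.standard_normal(8)  # query near document 42
print(top_k_cosine(query, docs, k=1))
```

Dedicated vector DBs (and, per the answer above, the planned VAST extensions) replace this linear scan with index structures so the search scales past what brute force allows.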
Thank you for your prompt reply.
I now have a clear understanding of the matter.
I am a member of a distribution company that handles VAST Data.
As one of the distributors, we are preparing to utilize InsightEngine for demonstration purposes.
This demonstration has the potential to greatly appeal to our customers who are developing or utilizing AI technologies.
Therefore, we would appreciate receiving detailed information about InsightEngine as soon as possible.
If no k8s, please describe your GPU setup (e.g., system config, system topology including network, and current SW deployment, i.e., GPU-aware container instances running on ... ?)
I apologize for replying late.
I asked our application engineering team.
They are using Run:AI to deploy multiple NIM containers and verify the RAG pipeline on GPUs.
The GPUs are located in our company's data center, and for the RAG pipeline, they are using three H100s and one A40.
The H100s handle NIM's LLM and embeddings, while the A40 is used for vector search (FAISS) and application deployment (Gradio).
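The flow described above can be sketched end to end. Everything below is a hypothetical stand-in for illustration: `embed` and `generate` represent calls to the embedding and LLM NIMs on the H100s, and `search` represents the FAISS nearest-neighbour lookup on the A40; none of it is real NIM or FAISS API.

```python
def embed(text: str) -> list[float]:
    # Stand-in for a call to the embedding NIM; returns a toy 2-dim vector.
    return [float(len(text) % 7), float(text.count(" "))]

def search(query_vec: list[float], index: list) -> str:
    # Stand-in for a FAISS nearest-neighbour lookup over (vector, doc) pairs.
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query_vec))
    return min(index, key=lambda pair: dist(pair[0]))[1]

def generate(prompt: str) -> str:
    # Stand-in for a chat-completion call to the LLM NIM.
    return f"Answer based on: {prompt}"

docs = ["VAST overview", "InsightEngine beta notes"]
index = [(embed(d), d) for d in docs]   # build the toy vector index

def rag(question: str) -> str:
    # Retrieve the closest document, then ground the LLM prompt on it.
    context = search(embed(question), index)
    return generate(f"context: {context}\nquestion: {question}")

print(rag("What is InsightEngine?"))
```

A Gradio front end would simply wrap `rag()` as the handler for a text input.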
Additionally, they would like to confirm whether they can use the same type of NIM as the one used in InsightEngine.
This is because, while their current NIM setup requires GPUs, InsightEngine does not.
They suspect that some aspects of NIM, such as TensorRT, might have been modified.