I apologize for replying late.
I asked our application engineering team.
They are using Run:AI to deploy multiple NIM containers and verify the RAG pipeline on GPUs.
The GPUs are located in our company’s data center, and the RAG pipeline runs on three H100s and one A40.
The H100s run the NIM LLM and embedding services, while the A40 handles vector search (FAISS) and the application front end (Gradio).
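For reference, their split looks roughly like the sketch below. This is my own illustration rather than their actual code; the endpoint URLs and model names are placeholder assumptions, based on NIM services exposing an OpenAI-compatible API.

    import faiss                      # vector search on the A40 host
    import gradio as gr               # application front end on the A40 host
    import numpy as np
    from openai import OpenAI         # NIM exposes an OpenAI-compatible API

    # Hypothetical NIM LLM and embedding services running on the H100 nodes
    llm = OpenAI(base_url="http://h100-llm:8000/v1", api_key="none")
    emb = OpenAI(base_url="http://h100-embed:8000/v1", api_key="none")
    LLM_MODEL = "meta/llama-3.1-8b-instruct"      # placeholder model name
    EMBED_MODEL = "nvidia/nv-embedqa-e5-v5"       # placeholder model name

    def embed(texts):
        resp = emb.embeddings.create(model=EMBED_MODEL, input=texts)
        return np.array([d.embedding for d in resp.data], dtype="float32")

    # Small in-memory FAISS index built from the document chunks
    docs = ["chunk one ...", "chunk two ...", "chunk three ..."]
    vectors = embed(docs)
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

    def answer(question):
        _, ids = index.search(embed([question]), k=2)
        context = "\n".join(docs[i] for i in ids[0])
        resp = llm.chat.completions.create(
            model=LLM_MODEL,
            messages=[{"role": "user",
                       "content": f"Context:\n{context}\n\nQuestion: {question}"}],
        )
        return resp.choices[0].message.content

    # Gradio front end served from the A40 host
    gr.Interface(fn=answer, inputs="text", outputs="text").launch()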
Additionally, they would like to confirm whether they can use the same type of NIM as the one used in InsightEngine.
This is because their current NIM setup requires GPUs, whereas InsightEngine does not.
They suspect that some components of NIM, such as the TensorRT backend, may have been modified.
Best regards,