In the quest to innovate and enhance user experiences, developers are increasingly turning to generative AI to accelerate content creation pipelines and unlock new interactive possibilities. A significant focus area is the development of digital humans for applications in customer service, virtual factories, virtual presence, and gaming. Historically, the creation of realistic digital humans has presented myriad challenges. Achieving lifelike realism involves meticulously capturing the nuances of human anatomy, facial expressions, and movement. Animating these digital personas to exhibit natural behaviors requires advanced motion capture technologies. Furthermore, ensuring that these animations run smoothly in real-time is a computationally intense endeavor. To surmount these hurdles, development teams are leveraging generative AI to pioneer innovative methods for crafting interactive digital humans.

Introducing NVIDIA ACE: Revolutionizing Digital Human Creation

NVIDIA ACE (Avatar Cloud Engine) is a comprehensive suite of technologies designed to empower developers to breathe life into digital humans. The suite includes an array of microservices known as NVIDIA NIM™, which are easy to deploy and optimized for high performance. They run seamlessly on the NVIDIA Graphics Delivery Network (GDN)—a global GPU network providing low-latency digital human processing across 100 countries—or on NVIDIA RTX™ AI PCs. Developers and system integrators can pick individual NIM microservices from the ACE suite and integrate them into their custom platforms.

[Image: NVIDIA ACE digital human customer service demo. Image credit: NVIDIA]

Live Demo: https://build.nvidia.com/nvidia/digital-humans-for-customer-service

Users can interact with a 3D digital avatar powered by ACE—a system capable of connecting with people through emotions, humor, and more.

Meet James: A Real-Time Virtual Assistant

Built atop NIM microservices, James is a virtual assistant that provides contextually accurate responses. You can interact with James in real time at ai.nvidia.com.

Leveraging AI for Real-Time Language Understanding, Speech, Animation, and Graphics

NVIDIA ACE incorporates cutting-edge graphics and simulation technologies for every aspect of digital human creation: speech and translation, vision, intelligence, realistic animation and behavior, and lifelike appearance.

NVIDIA Riva: This technology enables digital humans to comprehend human language, translate responses across up to 32 languages, and deliver natural replies.

NVIDIA Nemotron: A suite of large language models (LLMs) and small language models (SLMs) endows digital humans with intelligence, facilitating contextually aware and humanlike conversations.

NVIDIA Audio2Face™: With dynamic facial animation and precise lip-syncing capabilities, this technology allows 2D or 3D avatars to animate realistically from just an audio input.

NVIDIA RTX™: A collection of rendering technologies that provides real-time, path-traced subsurface scattering to mimic the way light penetrates the skin and hair, enhancing the realism of digital humans.

Building the Future of Interactive Avatars

Developers have the flexibility to create bespoke solutions using NVIDIA’s digital human technologies or to implement NVIDIA’s domain-specific AI workflows. These workflows are tailored for next-generation interactive avatars in customer service, humanoid robots in virtual factories, virtual presence applications, or AI-powered non-player characters in gaming. Given that generative AI models are both compute- and memory-intensive, especially when running concurrent AI and graphics processes, a robust GPU equipped with dedicated AI hardware is essential. ACE’s flexible architecture allows models to be executed across cloud and PC environments, depending on local GPU capabilities, ensuring an optimal user experience.

By harnessing cutting-edge AI and graphics technologies, developers can significantly enhance the realism and interactivity of digital humans, opening the door to more immersive and engaging digital experiences.

NIM Agent Blueprint: Digital Human for Customer Service

[Image: Digital Human for Customer Service blueprint architecture. Image credit: NVIDIA]

GitHub Code: https://github.com/NVIDIA-NIM-Agent-Blueprints/digital-human

Create a digital human for customer service by integrating NVIDIA NIM, ACE Microservices, Omniverse RTX rendering, and NeMo Retriever.

This blueprint repository is designed to help developers demonstrate how a Large Language Model (LLM) or a Retrieval-Augmented Generation (RAG) application can be easily integrated into a digital human pipeline. The digital human and RAG applications are deployed independently: the RAG application generates the text content for interactions, while the Tokkio customer service workflow enables live avatar interaction. These two components are separate but communicate via a REST API, as sketched below. Developers can customize and optimize the application according to their specific needs, and the workflow includes steps to set up and connect both components of the customer service pipeline.
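
To make the split concrete, the avatar pipeline can query the RAG service with a plain HTTP request for each user utterance. The endpoint, port, and JSON schema below are illustrative placeholders, not the blueprint's actual API; consult the repository for the real interface.

    # Hypothetical sketch: ask the independently deployed RAG service for an
    # answer over REST. Host, port, path, and payload shape are assumptions.
    curl -s -X POST http://localhost:8081/generate \
        -H "Content-Type: application/json" \
        -d '{"question": "What is your return policy?"}'

The returned text would then be handed to the avatar side of the pipeline for speech synthesis and facial animation.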

Get Started

  • Prerequisites
    • Ensure that system requirements are fulfilled.
    • Set up the NVIDIA GPU Cloud (NGC) API key, Cloud Service Provider, and SSH key pair.
  • Deploy
    • Start by launching the Digital Human Pipeline Deployment to interact with the digital human.
    • Next, deploy the RAG pipeline to connect the digital human to a knowledge base.
  • Customize
    • Connect your Digital Human Pipeline to domain-adapted RAG.
    • (Optional) Further customize with Parameter Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA).
  • Evaluate
    • Assess RAG and PEFT using metrics like ROUGE, BLEU, Ragas, and LLM-as-a-judge.

Prerequisites

1. System Requirements:

  • Access to an Ubuntu 20.04 or 22.04 machine (VM or workstation) with sudo privileges to run the automated deployment scripts.
  • Python version 3.10.12 or later

1.1 Docker Installation

Install Docker Engine and Docker Compose.
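
On Ubuntu, one common route is Docker's convenience script, which also configures Docker's apt repository; the following is a sketch, so adapt it to your distribution and security policies.

    # Install Docker Engine using Docker's convenience script
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh

    # Install the Compose plugin if the script did not already include it
    sudo apt-get update && sudo apt-get install -y docker-compose-plugin

    # Verify both installations
    docker --version
    docker compose version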

1.2 NVIDIA GPU Driver Version

Verify NVIDIA GPU driver version 535 or later is installed.

    $ nvidia-smi --query-gpu=driver_version --format=csv,noheader
    535.129.03

    $ nvidia-smi -q -d compute

    ==============NVSMI LOG==============

    Timestamp                                 : Sun Nov 26 21:17:25 2023
    Driver Version                            : 535.129.03
    CUDA Version                              : 12.2

    Attached GPUs                             : 1
    GPU 00000000:CA:00.0
        Compute Mode                          : Default

Refer to the NVIDIA Linux driver installation instructions for more information.

1.3 NVIDIA Container Toolkit

Install the NVIDIA Container Toolkit.
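
On Ubuntu, with NVIDIA's apt repository already configured per the toolkit's install guide, installation and Docker runtime configuration typically look like this sketch:

    # Install the toolkit (assumes NVIDIA's apt repository is configured)
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

    # Register nvidia as a Docker runtime, make it the default,
    # and restart Docker to apply the change
    sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
    sudo systemctl restart docker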

Verify the toolkit is installed and configured as the default container runtime.

    $ cat /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }

    $ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
    GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)

1.4 GPU Requirements

For minimum GPU requirements, refer to: https://build.nvidia.com/nvidia/digital-humans-for-customer-service/blueprintcard

2. NVIDIA GPU Cloud (NGC) API key:

An NGC API key is required to access resources within this repository.

  1. Navigate to https://build.nvidia.com/nvidia/digital-humans-for-customer-service and click “Download Blueprint”
  2. Login / Sign up if needed and “Generate your API Key”
  3. Use this API Key as credentials for “NGC_API_KEY”
  4. Log in to the NVIDIA container registry using the following command:

         docker login nvcr.io

     Once prompted, you can use $oauthtoken as the username and your NGC_API_KEY as the password. Then, export the NGC_API_KEY:

         export NGC_API_KEY=<ngc-api-key>

Refer to Accessing And Pulling an NGC Container Image via the Docker CLI for more information.

3. Cloud Service Provider Setup:

The Digital Human for Customer Service blueprint includes easy deployment scripts for major cloud service providers, so it is recommended to have your CSP secrets handy before deploying the digital human application. The RAG application can be deployed locally with Docker Compose using the provided customization and deployment scripts.

We will cover the digital human application setup and deployment steps for AWS; the setup for other CSPs can be found here.

We will use the one-click AWS deployment script, which automates and abstracts away the complexities of AWS instance provisioning, setup, and application deployment.

3.1 Digital Human Application – AWS Setup:

Follow the AWS CSP Setup Guide to configure your AWS environment for the Tokkio application.

After going through the provisioning steps, you should have the following credentials:

  • AWS Access Keys for IAM user: This provides the Access key ID and Secret access key credentials (see the sketch after this list). Refer to the AWS Documentation for detailed instructions.
  • S3 Bucket: Private S3 bucket to store the references to the resources the one-click deploy script will spin up.
  • DynamoDB Table: To manage access to the deployment state.
  • Domain and Route53 hosted zone: To deploy the application under.
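
With the IAM access keys in hand, a minimal way to expose them to the AWS CLI and the deployment scripts is through the standard AWS environment variables; the variable names are the AWS CLI's own, while the region value below is just an example.

    # Export IAM user credentials for the AWS CLI / deployment scripts
    export AWS_ACCESS_KEY_ID=<access-key-id>
    export AWS_SECRET_ACCESS_KEY=<secret-access-key>
    export AWS_DEFAULT_REGION=us-east-1   # example region; use your own

    # Sanity-check that the credentials resolve to your IAM user
    aws sts get-caller-identity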

4. SSH Key Pair:

An SSH key pair is needed to access the instances we are going to set up. On your local Ubuntu machine, you may use an existing SSH key pair or create a new one:

    ssh-keygen -t rsa -b 4096

This generates a key pair in your home folder: the public key at .ssh/id_rsa.pub and the private key at .ssh/id_rsa. These keys will be needed to set up your one-click deployment of the Digital Human Pipeline.

