
Converting Speech and Text into Real Material Objects

With the rapid development of Generative AI, speech prompts could be enough to produce objects

Is It Possible to Transform Speech or Text to Real Material Objects?

Demonstration of turning a verbal request into a stool with the help of AI

Transforming speech into objects is still an elusive concept, but with the advent of GenAI it is gradually becoming a reality. The idea can be realized by combining several types of AI:

  • Speech recognition. Captures the spoken request so that it can be passed to the generative model.
  • Text processing. A Large Language Model interprets the request and extracts its semantic content.
  • 3D modelling. A Generative Engineering Model (GEM) converts the natural language input into a detailed design suitable for printing.
  • 3D printing. The concluding stage, at which the object is created from the generated blueprint.
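
The four stages above can be read as a chain in which each stage consumes the output of the previous one. The following is a minimal sketch of that chaining only. Every function in it is a hypothetical placeholder that stands in for a real model or device, and none of the names come from the systems described in this article.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[object], object]


def run_pipeline(stages: List[Stage], request: object) -> object:
    """Feed the output of each stage into the next one."""
    data = request
    for stage in stages:
        data = stage.run(data)
    return data


# Placeholder stages (assumptions): each stands in for a real model or device.
def recognize_speech(audio: str) -> str:
    # Stand-in for a speech recognition model; the "audio" is already text here.
    return audio.strip().lower()


def parse_request(text: str) -> dict:
    # Stand-in for an LLM: naively treat the last word as the requested object.
    return {"object": text.split()[-1], "prompt": text}


def generate_design(spec: dict) -> dict:
    # Stand-in for a text-to-3D model: return a named design with no geometry.
    return {"name": spec["object"], "triangles": []}


def print_object(design: dict) -> str:
    # Stand-in for slicing and printing.
    return f"printed {design['name']}"


PIPELINE = [
    Stage("speech recognition", recognize_speech),
    Stage("text processing", parse_request),
    Stage("3D modelling", generate_design),
    Stage("3D printing", print_object),
]

result = run_pipeline(PIPELINE, "  Make me a stool ")
```

Keeping the stages behind one uniform interface means any of them can be swapped for a different model without touching the others.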

Although the concept may seem futuristic, several AI models can already process text to create object designs. Based on Transformer or Diffusion models, they can be enhanced with techniques such as iterative sampling, estimation of distribution, and hierarchical token sequences.

Speech-to-reality framework

The Concept of Industry 6.0

Overview of the Industry 6.0 integration

Industry 6.0 is a futuristic scenario in which the production cycle almost entirely excludes human involvement. The only exception is the starting point, when a request is given to the robotic system.

Industry 6.0 assumes that GenAI can develop solid decision-making capability and succeed even without human guidance. Operations will be handled by a swarm of robots, from autonomous machines to drones, each equipped with its own intelligence.

The production pipeline starts with a 2D signed distance function (SDF) that outlines the object's geometry. The outline is then converted into a 3D stereolithography (STL) file, which in turn is translated into G-code to drive the 3D printer. According to the authors' estimate, the proposed system outperforms human developers by a factor of 4.4.
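
To make the SDF, STL, and G-code chain more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The circle shape, the grid constants, and the `E_PER_MM` extrusion factor are illustrative values. The G-code is also generated straight from the sampled grid, as a toy stand-in for a real slicer that would process the STL file.

```python
import math

CELL = 1.0        # grid resolution in mm (assumed)
ORIGIN = -6.0     # lower-left corner of the sampling grid in mm (assumed)
N = 12            # cells per axis (assumed)
HEIGHT = 2.0      # extrusion height in mm (assumed)
LAYER = 0.5       # layer height in mm (assumed)
E_PER_MM = 0.05   # filament fed per mm of travel (placeholder value)

# Box faces: corner indices counter-clockwise seen from outside, plus outward normal.
QUADS = [
    ((0, 2, 3, 1), (0, 0, -1)), ((4, 5, 7, 6), (0, 0, 1)),
    ((0, 1, 5, 4), (0, -1, 0)), ((2, 6, 7, 3), (0, 1, 0)),
    ((0, 4, 6, 2), (-1, 0, 0)), ((1, 3, 7, 5), (1, 0, 0)),
]


def sdf_circle(x, y, radius=5.0):
    """2D signed distance to a circle at the origin: negative inside."""
    return math.hypot(x, y) - radius


def inside_cells(sdf):
    """Grid indices (i, j) of cells whose centre lies inside the shape."""
    return [(i, j) for j in range(N) for i in range(N)
            if sdf(ORIGIN + (i + 0.5) * CELL, ORIGIN + (j + 0.5) * CELL) <= 0]


def cells_to_stl(cells, name="part"):
    """Extrude every cell into a box and write the boxes as ASCII STL."""
    lines = [f"solid {name}"]
    for i, j in cells:
        x0, y0 = ORIGIN + i * CELL, ORIGIN + j * CELL
        # Corner k uses bit 0 for x, bit 1 for y, bit 2 for z.
        corners = [(x0 + (k & 1) * CELL, y0 + (k >> 1 & 1) * CELL, (k >> 2) * HEIGHT)
                   for k in range(8)]
        for (a, b, c, d), normal in QUADS:
            for tri in ((a, b, c), (a, c, d)):
                lines.append("  facet normal {} {} {}".format(*normal))
                lines.append("    outer loop")
                lines += ["      vertex {:.3f} {:.3f} {:.3f}".format(*corners[v]) for v in tri]
                lines.append("    endloop")
                lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines)


def cells_to_gcode(cells):
    """Toy slicer: one straight extrusion per run of adjacent cells, per layer."""
    rows = {}
    for i, j in cells:
        rows.setdefault(j, []).append(i)
    out = ["G21 ; units in millimetres", "G90 ; absolute coordinates", "M82 ; absolute extrusion"]
    extruded = 0.0
    for layer in range(1, round(HEIGHT / LAYER) + 1):
        out.append(f"G1 Z{layer * LAYER:.2f} F600")
        for j in sorted(rows):
            y = ORIGIN + (j + 0.5) * CELL
            cols = sorted(rows[j])
            start = prev = cols[0]
            for i in cols[1:] + [None]:      # None flushes the last run
                if i is not None and i == prev + 1:
                    prev = i
                    continue
                x_a, x_b = ORIGIN + start * CELL, ORIGIN + (prev + 1) * CELL
                extruded += (x_b - x_a) * E_PER_MM
                out.append(f"G0 X{x_a:.2f} Y{y:.2f}")
                out.append(f"G1 X{x_b:.2f} Y{y:.2f} E{extruded:.3f} F1200")
                if i is not None:
                    start = prev = i
    return "\n".join(out)


cells = inside_cells(sdf_circle)
stl_text = cells_to_stl(cells)
gcode_text = cells_to_gcode(cells)
```

The per-cell boxes keep the sketch short, but they leave internal faces inside the solid. A production pipeline would extract only the outer surface before slicing.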

Transforming Speech to Real Material Objects

Another concept shows how verbal commands can be turned into real objects. It consists of:

  1. System Framework

The core of the system includes:

  • Speech recognition and language-processing models.
  • A generative model that creates a mesh.
  • A voxelization component that turns the mesh into building blocks (voxels).
  • An assembly phase, in which the voxels are placed at their coordinates.

Each stage requires a separate GenAI solution. 
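
The voxelization and assembly steps can be illustrated with a short sketch. It makes two simplifying assumptions that are not taken from the original system. The shape is given as an implicit inside/outside test (a hypothetical `stool` function) rather than as a generated mesh, and the assembly order is a plain bottom-up sort rather than a planner that checks reachability for a robotic arm.

```python
from typing import Callable, List, Tuple

Voxel = Tuple[int, int, int]


def voxelize(inside: Callable[[float, float, float], bool],
             size: int, pitch: float) -> List[Voxel]:
    """Keep the grid cells whose centre falls inside the solid."""
    voxels = []
    for z in range(size):
        for y in range(size):
            for x in range(size):
                centre = ((x + 0.5) * pitch, (y + 0.5) * pitch, (z + 0.5) * pitch)
                if inside(*centre):
                    voxels.append((x, y, z))
    return voxels


def assembly_order(voxels: List[Voxel]) -> List[Voxel]:
    """Sort bottom layer first, then row by row within each layer."""
    return sorted(voxels, key=lambda v: (v[2], v[1], v[0]))


def stool(x: float, y: float, z: float) -> bool:
    """Hypothetical shape: four corner legs under a one-voxel-thick seat."""
    seat = 2.0 <= z < 3.0
    leg = z < 2.0 and (x < 1.0 or x >= 3.0) and (y < 1.0 or y >= 3.0)
    return seat or leg


voxels = voxelize(stool, size=4, pitch=1.0)
order = assembly_order(voxels)
```

Sorting by height first means that each voxel of the legs is placed before the seat that rests on it.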

  2. System Hardware

The physical part of the system includes cuboctahedron-shaped voxels that can be assembled from any direction, a 6-axis robotic arm equipped with indexers for better alignment, and a conveyor belt. 

Voxel geometry

  3. System Implementation

The key implementation challenge is balancing speed against failure prevention. It is suggested that the voxel dispenser be set up first to achieve alignment and avoid collisions. The speed of the robotic arm's movement should then be precisely calibrated.

Converting Speech to Virtual Object Control

Several speech-based techniques for manipulating virtual objects have also been proposed.

  1. Object Selection in Virtual Reality

Picture: The object manipulation challenge in the VR setting with three levels of perplexity

The proposal is that objects in a simulated environment can be moved with verbal commands. For that purpose, a training set is prepared that contains:

  • Utterances. These refer to the sizes, shapes, colors, and features of objects.
  • Intents. These denote the manipulation commands.

An Azure speech-to-text tool is used to interpret spoken requests.
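
The split between utterances and intents can be illustrated with a small sketch that operates on a request once it is available as text. It uses plain keyword matching as a stand-in for a trained language-understanding model and does not call any Azure API. The vocabularies, the `SCENE` contents, and the function names are all hypothetical.

```python
# Hypothetical vocabularies: verbs mapped to intents, words mapped to object attributes.
INTENT_KEYWORDS = {
    "select": {"pick", "grab", "select"},
    "move": {"move", "push", "put"},
    "rotate": {"turn", "spin", "rotate"},
}
ATTRIBUTES = {
    "size": {"small", "big"},
    "color": {"red", "green", "blue"},
    "shape": {"cube", "sphere", "cylinder"},
}

# Hypothetical VR scene.
SCENE = [
    {"id": 1, "size": "small", "color": "red", "shape": "cube"},
    {"id": 2, "size": "big", "color": "red", "shape": "cube"},
    {"id": 3, "size": "small", "color": "blue", "shape": "sphere"},
]


def parse_command(utterance):
    """Split an utterance into an intent and the attributes of the target object."""
    words = utterance.lower().split()
    intent = next((name for name, keys in INTENT_KEYWORDS.items()
                   if keys & set(words)), None)
    target = {attr: word for word in words
              for attr, vocab in ATTRIBUTES.items() if word in vocab}
    return {"intent": intent, "target": target}


def resolve(target, scene=SCENE):
    """Return the ids of all scene objects that match every named attribute."""
    return [obj["id"] for obj in scene
            if all(obj.get(key) == value for key, value in target.items())]


command = parse_command("Spin the small red cube")
matches = resolve(command["target"])           # one object fits the description
ambiguous = resolve(parse_command("grab the red cube")["target"])  # two objects fit
```

When more than one object matches, as in the second request, a real system would need to ask for clarification or fall back on another cue such as gaze.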

Example of the assembly sequence

  2. Text Selection in Virtual Reality

An experiment compared three hands-free text selection methods in VR: Blink, which uses blinking as the command signal; Dwell, which uses a sustained gaze; and Voice, which processes spoken commands. In a test with 24 participants, Blink outperformed the other methods in both precision and speed.
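
As an illustration of how a gaze-based method such as Dwell can work, the sketch below triggers a selection once the gaze has rested on the same target for a fixed time. The one-second threshold, the sample format, and the function name are assumptions made for this example and are not parameters reported by the experiment.

```python
def dwell_select(samples, dwell_time=1.0):
    """Return (timestamp, target) pairs for every completed dwell.

    samples: iterable of (timestamp_seconds, target_id or None), in time order.
    A target is selected once, when the gaze has stayed on it for dwell_time.
    """
    selections = []
    current, since, fired = None, None, False
    for timestamp, target in samples:
        if target != current:
            # Gaze moved to a new target (or to empty space): restart the timer.
            current, since, fired = target, timestamp, False
        elif target is not None and not fired and timestamp - since >= dwell_time:
            selections.append((timestamp, target))
            fired = True  # do not re-select until the gaze leaves and returns
    return selections


gaze = [(0.0, "A"), (0.4, "A"), (0.8, "B"), (1.2, "B"),
        (1.9, "B"), (2.5, "B"), (2.6, None), (3.0, "A")]
selected = dwell_select(gaze)
```

The threshold embodies the trade-off such methods face: a short dwell time is fast but selects things the user merely glanced at, while a long one is safer but slower.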
