This page was heavily inspired by UIUC's Gen AI guide, especially the textual explanations.
Artificial Intelligence (AI) is a way to train computers to carry out a range of complex tasks and to learn from these tasks. It uses a process called machine learning to identify patterns in images, texts, and other materials. This process can involve human input or be fully automated. Generative Artificial Intelligence (genAI) is a specific kind of AI that is designed to generate new text, images, video, or audio based on the content that it has studied.
In order for the computer to generate new material, it needs to: (1) study a large collection of existing material, (2) identify patterns in that material, and (3) use those patterns to create new combinations.
Below, you can see how this works for text, images, and sound.
Tools like ChatGPT are GPTs, or generative pretrained transformers.
They’re generative in that they’re designed to generate new text, which is awesome if you’re trying to write a song about your dog's dinnertime, but not so great if you’re trying to quote and cite published scholarly articles.
These tools are pre-trained. By the time you interact with an AI tool, it has already done all of its homework. It has studied large amounts of text authored by humans in order to understand the relationships between individual words. It knows that the words "Chesapeake" and "Bay" often appear together, referring to a place, but that "Bay" also shows up in "Old Bay," referring to the spice blend.
The computer’s understanding of words and their relationships is called a language model. These models are trained on enormous collections of text, which is why we refer to them as large language models, or LLMs.
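To make the idea of "learning word relationships" concrete, here is a deliberately tiny sketch. Real LLMs use neural networks trained on tokens from enormous corpora, not word counts, but counting which word follows which (a bigram model) captures the basic intuition. The four-sentence corpus below is invented for illustration.

```python
from collections import defaultdict
import random

# Toy corpus; a real LLM trains on vastly more text than this.
corpus = (
    "the chesapeake bay is wide . "
    "old bay seasons the crab . "
    "the chesapeake bay has crabs . "
    "old bay is a spice ."
).split()

# Record which words have been seen following each word (a bigram model).
follows = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1].append(w2)

def generate(start, length=6, seed=0):
    """Generate text by repeatedly picking a word seen after the current one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)

print(generate("chesapeake"))  # always continues with "bay" first
```

Notice that the model has learned that "chesapeake" is always followed by "bay" in its training data, so generation from "chesapeake" reproduces that relationship, even though the full sentence it produces never appeared in the corpus.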
Instead of looking for relationships between words, computers “see” an image by looking at pixels, the basic building blocks of digital images. The computer compares each pixel to the ones that surround it, paying close attention to colors, outlines, and texture. During the training process, the computer learns how to identify the parts of an image that are important (feature recognition) and then to categorize them (classification). Tools like Adobe Acrobat and ABBYY FineReader use optical character recognition to identify the shapes of letters and create transcriptions of text.
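The pixel-comparison idea can be sketched in a few lines. Real feature recognition uses learned convolutional filters over millions of images; this toy version, on a made-up 5x5 grayscale grid, just compares each pixel to its right-hand neighbor to spot a sharp change in brightness, i.e., an outline.

```python
# A tiny 5x5 grayscale "image": 0 = black, 255 = white.
# The left half is dark and the right half is bright,
# so there is a vertical edge down the middle.
image = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]

def horizontal_edges(img, threshold=128):
    """Mark pixels whose right-hand neighbor differs sharply in brightness."""
    edges = []
    for row in img:
        edges.append([
            1 if abs(row[x] - row[x + 1]) > threshold else 0
            for x in range(len(row) - 1)
        ])
    return edges

for row in horizontal_edges(image):
    print(row)  # the 1 marks where dark meets bright
```

Every row comes out as [0, 0, 1, 0]: the computer has "found" the vertical outline where the dark region meets the bright one, which is the seed of feature recognition.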
If you have ever done a CAPTCHA, you’ve helped a computer identify features and classify them. We refer to human intervention in the training process as “supervised learning,” but computers can also engage in “unsupervised learning” by checking their work without having to ask a human. A classic example of something that might trip up a computer is an image grid that mixes Chihuahua faces with blueberry muffins, which look surprisingly alike.
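Supervised learning boils down to learning from human-labeled examples. The sketch below is a miniature, hypothetical version: the two features ("ear pointiness" and "surface bumpiness") and their values are invented for illustration, and the classifier is a simple 1-nearest-neighbor rule rather than the neural networks real systems use.

```python
import math

# Hypothetical hand-made features for human-labeled training images:
# (ear_pointiness, surface_bumpiness) -> label supplied by a person.
training = [
    ((0.9, 0.2), "chihuahua"),
    ((0.8, 0.3), "chihuahua"),
    ((0.1, 0.9), "muffin"),
    ((0.2, 0.8), "muffin"),
]

def classify(features):
    """Supervised learning in miniature: label a new image with the label
    of the closest human-labeled example (1-nearest-neighbor)."""
    return min(training, key=lambda ex: math.dist(features, ex[0]))[1]

print(classify((0.85, 0.25)))  # near the chihuahua examples -> "chihuahua"
print(classify((0.15, 0.85)))  # near the muffin examples -> "muffin"
```

The human labels do the heavy lifting here; an unsupervised approach would instead have to discover on its own that the examples fall into two clusters.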
Surprisingly, AI uses image recognition techniques to identify and recreate sounds. The two aspects of sound that are most important for generative AI are frequency (is this a high-pitched screech or a low growl?) and amplitude (is this a whisper or a shout?). The computer visualizes these two aspects of sound as a spectrogram: frequencies run vertically along the y-axis, and a sliding scale of colors represents the intensity of the amplitude. From there, the computer uses visual pattern recognition to identify important features and then classify them. This technique allows the computer to isolate the rumble of a plane flying overhead from an umpire’s whistle, because these sounds form different patterns on the spectrogram. Once it recognizes these patterns, it can use the information about frequency and amplitude to create sounds in new combinations.
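The frequency analysis behind a spectrogram can be sketched with a plain discrete Fourier transform. Real tools use the much faster FFT over many short, overlapping windows; this toy version, with a made-up sample rate and two synthetic tones, just finds the dominant frequency in a single window, which is one column of a spectrogram.

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (toy value for illustration)

def dominant_frequency(samples):
    """Find the strongest frequency in one window via a plain discrete
    Fourier transform. A spectrogram repeats this over many short windows
    to show how the frequency content of a sound changes over time."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin and the mirrored half
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * SAMPLE_RATE / n  # convert bin index to Hz

# One window of a low "growl" (50 Hz) and one of a high "screech" (200 Hz).
low = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE) for t in range(80)]
high = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(80)]

print(dominant_frequency(low))   # 50.0
print(dominant_frequency(high))  # 200.0
```

Because the growl and the screech light up different frequency bins, they form visibly different patterns once many of these windows are stacked side by side, which is exactly what lets the computer tell the two sounds apart.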