Are you tired of the confusing terms surrounding artificial intelligence? In the past year, AI-powered products and services have become widely available, offering a variety of features that are often wrapped in hard-to-distinguish jargon.
Agent
An agent, in the context of artificial intelligence, is a model or program that can perform a task independently. Examples of agents range from smart home devices that control temperature and lighting, to sensors in robotic vacuums and self-driving cars, to chatbots like ChatGPT that learn and respond to user instructions. Independent agents performing complex tasks are often cited as examples of what the next leap in AI might look like.
AGI (Artificial General Intelligence)
AGI is a type of program or model that possesses the full intellectual capabilities of a human, that is, general intelligence. AGI has capabilities such as reasoning, common sense, abstract knowledge, and creativity. Essentially, it can perform tasks independently without human instruction. There is no true AGI yet, but experts believe it could be achieved in the near future (though opinions differ on when exactly). Companies like OpenAI, DeepMind, and Anthropic are committed to attempting to create AGI. See also: strong AI
Algorithm
An algorithm is a set of rules or instructions for a computer program to follow. In the context of artificial intelligence, algorithms are the building blocks of AI. Think of a process inside the human brain being broken down into a series of steps. Algorithms mimic this process by building a sequence of if-then statements.
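To make that concrete, here is a minimal sketch (my own illustration, not from the article) of an algorithm written in Python as a short sequence of if-then rules – in this case, a toy spam check:

```python
# A toy "algorithm": a fixed sequence of if-then steps that decides
# whether an email subject line looks like spam.
def looks_like_spam(subject: str) -> bool:
    subject = subject.lower()
    if "free money" in subject:      # rule 1: suspicious phrase
        return True
    if subject.count("!") > 3:       # rule 2: too many exclamation marks
        return True
    if "meeting" in subject:         # rule 3: probably legitimate
        return False
    return False                     # default: assume it is not spam

print(looks_like_spam("FREE MONEY!!!! Claim now"))  # True
print(looks_like_spam("Team meeting at 3pm"))       # False
```

Modern AI systems generally learn their rules from data rather than having them hand-written like this, which is where machine learning comes in.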
Alignment
Alignment refers to how well an AI system’s behavior matches goals that are not explicitly spelled out in a prompt or request, such as accuracy, safety, and harm prevention. If an AI is not aligned, it diverges from its intended uses and applications, producing incorrect or inappropriate answers. Alignment is a significant part of the ethical conversation because a poorly aligned model has the potential to spread misinformation, create security threats, and share harmful or dangerous information.
Artificial Intelligence
Artificial intelligence is the overarching term for technology that can automate or execute certain tasks designed by humans. Recently, when people talk about AI (“AI will destroy humanity” or “AI will replace our jobs”), they are usually referring to AGI and generative AI. But artificial intelligence is a huge concept that encompasses many technologies that we have been using for years, such as algorithms that recommend content or products, self-driving cars, or voice assistants.
Black Box
Certain AI models are sometimes referred to as black boxes, meaning that users are unable to see or understand the internal workings of the technology. The black box problem has become particularly relevant in the generative AI conversation, as companies like OpenAI and Google are secretive about how they operate. Moreover, since generative AI operates somewhat independently, even the developers themselves do not fully understand how the algorithm generates results. With ethicists and policymakers calling for more accountability and transparency from AI companies, the importance of opening black boxes has increased.
Chatbot
A chatbot is a type of program or model that can chat with a human, like ChatGPT, but the term can also refer to customer service bots that provide alternatives to speaking with a customer service representative via phone or text. Chatbots like ChatGPT, Bard, Bing, and Character.AI have garnered attention for their ability to engage in sophisticated human-like conversations with users, but chatbots have been around for quite some time. ELIZA is considered the first chatbot, developed in 1966 by computer scientist Joseph Weizenbaum at the Massachusetts Institute of Technology.
Deep Learning
Deep learning is a type of machine learning that mimics the way humans learn. It uses neural networks which incorporate multiple layers of algorithms to understand complex and abstract concepts, such as conversations and images. Applications of deep learning include facial recognition technology, chatbots like ChatGPT, and self-driving cars.
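Here is a rough sketch (my own illustration with made-up numbers, not a real model) of what “multiple layers” means: each layer transforms the output of the previous one, which is what lets deeper networks capture more abstract patterns:

```python
# A tiny forward pass through a three-layer network. Weights are random
# here; in practice they are learned during training.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)  # a common activation function

x = rng.normal(size=(1, 4))                      # a single 4-feature input
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # layer 1
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)    # layer 2
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)    # output layer

h1 = relu(x @ W1 + b1)   # first hidden layer
h2 = relu(h1 @ W2 + b2)  # second hidden layer
y = h2 @ W3 + b3         # final output
print(y.shape)           # (1, 1)
```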
Diffusion Model
A diffusion model is a machine learning model that can generate new data, such as images, similar to the data it was trained on. Technically, it is a Markov chain trained using variational (approximate) inference; those are mathematical terms for modeling step-by-step sequences and for estimating quantities within vast amounts of data. What you really need to know is that diffusion models are what make AI image generation possible. Stable Diffusion, DALL-E from OpenAI, and Midjourney are examples of products using diffusion models.
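As a very rough sketch (my own illustration with an invented noise schedule, not the article’s math), the “Markov chain” can be pictured as repeatedly adding a little noise to an image, step by step; the trained model learns to run that chain in reverse, turning pure noise back into a new image:

```python
# Forward (noising) process of a toy diffusion chain: each step depends
# only on the previous one, which is what makes it a Markov chain.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))               # stand-in for a tiny 8x8 "image"
betas = np.linspace(1e-4, 0.2, 50)   # noise schedule (invented for the demo)

for beta in betas:
    noise = rng.normal(size=x.shape)
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise  # one noising step

print(x.std())  # after many steps the image is close to pure Gaussian noise
```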
Generative AI
With the launch of OpenAI’s ChatGPT, generative AI has entered the mainstream. Generative AI is a type of artificial intelligence that can create text, images, video, audio, and code based on user prompts. It typically takes the form of a chat interface, like ChatGPT, Bing, and Bard, allowing it to engage in back-and-forth conversations with users. The launch of ChatGPT sparked a lot of excitement because it gave people an easy, accessible way to understand and use the capabilities of generative AI. Despite its benefits, the widespread use of generative AI carries risks, since it tends to hallucinate, or confidently make up things that are incorrect.
Graphics Processing Unit (GPU)
A graphics processing unit is a powerful chip or graphics card capable of processing multiple complex computations. GPUs were initially developed for image and graphics processing, as the name suggests, but have been adapted for AI due to their ability to handle the immense computational power required in machine learning. It is estimated that ChatGPT uses 20,000 GPUs and will eventually need 30,000 graphics cards to run its model.
Hallucination
Generative AI, especially text-based chatbots, tends to make things up. This is described as “hallucination,” because generative AI sometimes goes completely off track, speaking confidently about something that is simply wrong. For example, an AI chatbot might hallucinate by claiming that Steve Jobs was a magician famous in Las Vegas during the Rat Pack era. But more commonly (and more concerningly), AI chatbots hallucinate subtly by blending fact with fiction. For instance, one might state that Steve Jobs was the co-founder of Apple (true), who led the launch of the iPhone (true), and who was named Time magazine’s Person of the Year (false).
Chatbot Jailbreaking
Chatbot jailbreaking refers to making a chatbot do something outside its intended uses. With a carefully crafted prompt, jailbreaking can allow a user to bypass the model’s rules or guardrails, essentially tricking it into doing something it is not supposed to do under the model’s alignment. Jailbreaking can range from making a chatbot say offensive or inappropriate things just for fun, to getting it to share dangerous and actionable information, such as how to make napalm.
Large Language Model (LLM)
A large language model is an AI program trained on vast amounts of data to understand and generate text. Large language models string sentences together by predicting the next word based on probability. Because they are trained on so much data – essentially everything on the internet – they are very good at generating human-like text this way. OpenAI’s GPT models, Google’s PaLM models, and Meta’s Llama models are examples of large language models. GPT-3.5 and GPT-4 power ChatGPT, while PaLM 2 powers Bard.
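As a toy sketch of “predicting the next word based on probability” (my own example, nowhere near the scale or sophistication of a real LLM), you can build a crude next-word predictor from simple word-pair counts:

```python
# A toy "next word" predictor built from bigram counts. Real LLMs use
# neural networks over tokens, but the prediction idea is similar.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the sofa".split()

# Count which word tends to follow which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word: str) -> str:
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))   # "cat" -- the most frequent follower of "the"
print(predict_next("sat"))   # "on"
```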
Licensed Data
Licensed data refers to information from the web that a company or organization has purchased or been granted access to for the purpose of training artificial intelligence. You may hear companies claim they trained their models using licensed data, meaning the data was obtained legally. The issue of licensed data has come up frequently of late because of the massive amounts of data required to train AI models like ChatGPT. It becomes legally murky because of debates over what counts as the public domain, what the original creator intended, and whether and how companies should be allowed to use that data.
Machine Learning
Machine learning is a method within artificial intelligence where a model is trained on data so that it learns and improves over time. Machine learning models use data to recognize patterns, classify information, and make predictions. Examples include spam filtering (classification), using housing data to predict house prices (regression), or identifying images of dogs (deep learning). The terms artificial intelligence and machine learning are often used interchangeably, but machine learning is a subset of artificial intelligence, distinguished by the fact that it builds its intelligence by being trained on data.
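Here is a minimal sketch of the house-price example (my own illustration using the scikit-learn library, which the article does not mention): the model learns the pattern from example data rather than from hand-written rules:

```python
# Fit a simple regression model on made-up housing data: the model learns
# the relationship between floor area and price from the examples.
from sklearn.linear_model import LinearRegression

areas = [[50], [80], [100], [120], [150]]               # square meters (toy data)
prices = [150_000, 240_000, 300_000, 360_000, 450_000]  # matching prices

model = LinearRegression()
model.fit(areas, prices)          # "training" on the examples

print(model.predict([[90]]))      # predicted price for a 90 m^2 home
```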
Model
You may have heard this term used frequently in relation to artificial intelligence. A model is a program or algorithm designed for a specific purpose. The term artificial intelligence model is a general term for a program designed to replicate and/or automate certain tasks.
Natural Language Processing (NLP)
The reason ChatGPT’s responses can sound oddly human-like is due to natural language processing. The term refers to training the model on text and speech so that it understands and expresses itself like a human. Natural language processing also involves researching linguistics so that models can grasp the complexities and nuances of language.
Neural Network
Inspired by the way the human brain works, neural networks are algorithms made up of “artificial neurons” or “nodes” that communicate with one another. Each connection between two nodes carries a certain value, or “weight.” These weights scale the inputs a node receives, and the node is “activated” – passing information on to other nodes in the network – once a certain threshold is reached. Neural networks are key to enabling deep learning.
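A single “neuron” can be sketched in a few lines (my own illustration, not from the article): inputs are multiplied by weights, summed, and the node activates only if the total crosses a threshold:

```python
# A single artificial "neuron": weighted inputs are summed and the node
# activates only if the total reaches a threshold.
def neuron(inputs, weights, threshold):
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0   # 1 = activated, 0 = not

# Toy example: both inputs must be "on" for the neuron to fire
# (this reproduces a logical AND).
weights = [0.6, 0.6]
print(neuron([1, 1], weights, threshold=1.0))  # 1 (activated)
print(neuron([1, 0], weights, threshold=1.0))  # 0 (not activated)
```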
Open Source
Open source means that the source code of a software program is open and freely available to the public (whereas closed-source code is kept secret). This allows developers to use it, modify it, and build their own products on top of it. Open-source artificial intelligence models are viewed as a way to democratize AI development, which is often shrouded in secrecy. Unlike Google’s and OpenAI’s closed-source models, Meta recently released Llama 2, an open-source LLM. Other open-source models include Falcon, MPT, and RedPajama.
Parameter
A parameter is a variable in an LLM that can be weighted or tuned during training to shape the model’s output. You may have heard parameters mentioned in relation to the power of an LLM – for example, GPT-4 is reported to have around 1.7 trillion parameters. The more parameters an LLM has, the more complex it is and the more it is capable of learning.
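For a rough sense of where such counts come from, here is a sketch (my own illustration with invented layer sizes) that tallies the weights and biases – the parameters – of a tiny fully connected network:

```python
# Count the parameters (weights + biases) of a tiny fully connected network.
# Layer sizes are invented for the demo; real LLMs have billions of these.
layer_sizes = [512, 1024, 512]   # input -> hidden -> output

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection
    biases = n_out           # one bias per output unit
    total_params += weights + biases

print(total_params)  # 1_050_112 parameters for this toy network
```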
Prompt
A prompt is a request or question that a user sends to a chatbot. There is a whole subculture dedicated to coaxing the best responses out of large language models. Whether it’s generating code, breaking the mold, or simply getting the best answer you’re looking for, good prompts rely on clarity, conciseness, context, and intent.
Prompt Engineer
With the rise of productivity chatbots, there is sudden demand for experts in crafting the right prompts. This is where a prompt engineer comes in. A prompt engineer is someone with deep knowledge of large language models who can craft and optimize prompts for a variety of purposes, whether that means ensuring a chatbot correctly understands a request or probing a model for threats and vulnerabilities.
Prompt Injection
The emergence of large language models has led to a new type of cyberattack: the prompt injection attack. Prompt injection, similar to jailbreaking, uses a carefully crafted prompt to manipulate models like ChatGPT for malicious purposes. By injecting a prompt, an attacker exploits a vulnerability in the chatbot to extract confidential information or bypass the model’s guardrails. Attacks can be direct, through interaction with the chatbot, or indirect, by embedding a prompt in a plugin or webpage to secretly access personal information or payments.
Recommendation Algorithm / Recommendation System
Even before the advent of ChatGPT, artificial intelligence was already a significant part of our lives. One of the most prevalent examples is the recommendation algorithm: a machine learning algorithm that serves up recommendations based on user data and behavior. Examples include recommended shows on Netflix, products on Amazon, videos on YouTube and TikTok, and posts on Instagram.
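As a bare-bones sketch (my own illustration with made-up viewing data, far simpler than anything Netflix or YouTube actually run), a recommender can compare a user’s behavior with that of similar users and suggest what those similar users liked:

```python
# Toy user-based recommendation: find the most similar user (by overlap in
# liked shows) and recommend what they liked that you haven't seen yet.
liked = {
    "ana":  {"Dark", "Severance", "The Office"},
    "ben":  {"Dark", "Severance", "Black Mirror"},
    "cara": {"The Office", "Parks and Recreation"},
}

def recommend_for(user: str) -> set:
    others = {name: shows for name, shows in liked.items() if name != user}
    # Most similar user = largest overlap with this user's liked shows.
    most_similar = max(others, key=lambda name: len(liked[user] & others[name]))
    return others[most_similar] - liked[user]

print(recommend_for("ana"))  # {'Black Mirror'} -- ben is the closest match
```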
Strong AI
Strong AI is another term for AGI or artificial general intelligence. It is a theoretical (so far) form of artificial intelligence that can “think” and act independently like a human.
Token
A token is a unit of information within a large language model. It can be a word, part of a word, a punctuation mark, or a piece of code – any basic unit of text that carries meaning for the model. When you hear that an LLM was trained on a certain number of tokens, or that a pricing plan costs a certain number of cents per 1,000 tokens, this is what it refers to.
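To see tokens in practice, here is a small sketch using OpenAI’s tiktoken library (my own example; the article does not reference any specific tokenizer), which splits text into the token IDs a GPT-style model actually consumes:

```python
# Split a sentence into tokens with the cl100k_base encoding used by
# GPT-3.5/GPT-4 style models. Tokens are often pieces of words.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization is surprisingly granular.")

print(len(ids))                        # number of tokens
print([enc.decode([i]) for i in ids])  # each token rendered back as text
```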
Training
Training is the process of feeding data into a machine learning model. There are two main types of training: supervised learning and unsupervised learning. In supervised learning, the model is trained on data that has been labeled or classified in some way, while unsupervised learning uses unlabeled data, forcing the model to find patterns and connections on its own. Each type of training has its own strengths and weaknesses. LLMs like GPT-4 use a mix of unsupervised and supervised learning.
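A compact sketch of the difference (my own example using the scikit-learn library, which the article does not mention): the supervised model sees labels during training, while the unsupervised model only sees raw points and has to group them on its own:

```python
# Supervised vs. unsupervised training on the same toy 1-D data.
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]]     # six data points
y = ["low", "low", "low", "high", "high", "high"]  # labels (supervised only)

# Supervised: learn from labeled examples, then classify a new point.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1.1]]))      # ['low']

# Unsupervised: no labels -- the model groups the points on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                # two clusters, e.g. [0 0 0 1 1 1]
```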
Training Data
Training data is the data that is used to train an AI model.
Source: https://me.mashable.com/tech/35686/the-ultimate-ai-glossary-to-help-you-navigate-our-changing-world