Article • 8 minute read
Artificial Intelligence (AI) Glossary of Terms and Key Positions
Key terms in AI and data
- Artificial intelligence (AI): The simulation of human intelligence processes by machines or computer systems.
- AI ethics: Refers to the moral principles and guidelines that govern the design, use and impact of AI. It includes considerations of fairness, transparency, privacy, bias and accountability in AI systems.
- AI solutions: A broad term that includes generative AI, machine learning, computer vision, NLP, MLOps, AIOps, IoT, digital twins and more.
- Algorithm: A sequence of rules given to an AI machine to perform a task or solve a problem.
- AI Operations and Machine Learning Operations (AIOps/MLOps): Disciplines spanning an organization's people, processes, culture and technology that give IT teams actionable insights by injecting machine-learning-driven intelligence and cross-domain, unified observability into IT operations.
- Application programming interface (API): Protocols that determine how two software applications interact with each other.
- Bias: The tendency of AI systems to make unfair decisions due to flawed data or algorithms, often reflecting existing societal prejudices.
- Big data: Large datasets studied to reveal patterns and trends for business decisions.
- Chatbot: A software application designed to imitate human conversation through text or voice commands.
- Cognitive computing: Essentially synonymous with AI, cognitive computing focuses on mimicking human thought processes like pattern recognition and learning.
- Computer vision: The ability of machines to capture, process and analyze digital imagery and other high-dimensional data.
- Corpus: Essentially, the training data. A collection of machine-readable text structured as a dataset.
- Data poisoning: A type of cyber attack where an attacker manipulates or injects false data into a data set to corrupt or degrade its quality.
- Deepfake: A synthetic media where a person's likeness is replaced with someone else's using AI. The realistic images, audio and videos can be hard to distinguish from authentic ones.
- Digital twins: A digital representation of a physical system, product or process for testing.
- Few-shot prompt language model: A type of language model designed to generate text or provide responses based on a limited amount of input or context, such as a handful of worked examples embedded directly in the prompt (a short illustrative prompt appears after this list).
- Generative AI (GenAI): Also known as GAI, generative AI creates new text, images and other content by learning the patterns and structures in its training data.
- Generative pre-trained transformer (GPT): A type of LLM trained on a large corpus that uses a transformer neural network to generate text as a response to input.
- Graphics processing unit (GPU): A hardware component designed for fast processing of graphics and other parallelizable tasks. GPUs are crucial for AI because they can perform many simple operations simultaneously, making them highly efficient for training machine learning models and processing large datasets.
- Hallucination: The phenomenon where an AI model generates or interprets data inaccurately, often creating outputs that seem bizarre or nonsensical to humans. Often the result of the model's inherent biases or limitations in its learning algorithms.
- High-performance architectures (HPA): Consisting of high-performance computing (HPC), high-performance networking and high-performance storage components, HPA can handle high processing loads and run optimized software all on a robust network infrastructure that ensures fast and reliable data transmission. Other components of HPA may include load balancers for distributing network or application traffic across many resources, databases designed for quick data retrieval and storage, and caching mechanisms to reduce data access time.
- Inference: Testing the model. Feeding new data to the model to get its responses/predictions.
- Inference-only: A system using a pre-trained model to make predictions on new, unseen data, but not learning from this new data or updating its model parameters. It only infers outcomes based on what it previously learned during the training phase.
- Internet of things (IoT): IoT infrastructure feeds real-time data into AI models while AI models analyze, improve and enhance the output from IoT sensors and devices.
- Jailbreak: In the context of AI, jailbreaking means manipulating chatbots into bypassing restrictions.
- LangChain: An open-source framework for building applications powered by large language models. It provides building blocks for chaining prompts, models, memory and external data sources into workflows such as chatbots and retrieval augmented generation.
- Large language model (LLM): A type of AI algorithm that leverages deep learning techniques to process natural language to understand, summarize, predict and generate content. The largest models are trained on billions of parameters.
- Machine learning (ML): A branch of AI that uses data and algorithms to imitate the way humans learn, gradually improving without explicit human intervention.
- Model theft: The unauthorized extraction or copying of machine learning models.
- Natural language processing (NLP): A machine's ability to process and understand human language, leveraging syntax and semantic analysis techniques such as parsing, segmentation, stemming, recognition, regeneration, disambiguation, etc.
- Neural network: A computing model inspired by the human brain. It consists of interconnected layers of nodes, or "neurons," that process and transmit information to solve complex tasks like pattern recognition and data classification.
- Parameter efficient fine-tuning (PEFT): A technique used to fine-tune LLMs in a resource-efficient manner. It involves modifying a minimal number of parameters while keeping the majority of the model's parameters frozen.
- Parameters: The internal weights or variables of a model, learned during training. For example, GPT-3, the model behind the original ChatGPT, has roughly 175 billion parameters.
- Practical AI: WWT's proven methodology for creating AI solutions that are precisely tuned to specific business needs, accelerating organizations' ability to compare, validate and integrate AI solutions effectively, and to assess the maturity of IT infrastructure needed to support data-intensive AI solutions. This approach rests on three pillars: building AI products; rapid testing and enablement; and scaling AI foundations through high-performance architectures.
- Prompt engineering: The process of designing and optimizing prompts to effectively guide a machine learning model's responses, with the goal of improving model performance and accuracy.
- Prompt injection: When an end user provides specific instructions (prompts) to an AI model that cause it to bypass the developer's intended instructions.
- Reinforcement learning (RL): A feedback-based machine learning paradigm where the model/agent learns to act in an environment to maximize a defined reward.
- Reinforcement learning from human feedback (RLHF): A technique that trains a reward model directly from human feedback and uses the model as a reward function to optimize an agent's policy using RL.
- Responsible AI: Refers to the practice of designing, building and deploying AI in a manner that is transparent, accountable and fair. It emphasizes the need for AI systems to respect user privacy, provide explainability and prevent discriminatory outcomes.
- Retrieval augmented generation (RAG): A method that leverages a large external knowledge source to augment the generation of machine learning models, enhancing their ability to provide detailed and accurate responses (a minimal retrieve-then-prompt sketch appears after this list).
- Role-based access control (RBAC): A system that restricts network access based on the roles of individual users within an organization. It ensures security by only granting the access necessary for users to fulfill their roles.
- Transformer: The neural network architecture behind LLMs. A deep learning model that uses the attention mechanism to learn how much weight, or significance, to give each part of the input data.
- Tokens: Units of input text. A token is the smallest semantic unit defined in a document or corpus (not necessarily a whole word). The original ChatGPT, for example, has a limit of roughly 4,000 tokens, while GPT-4 accepts up to 32,000 tokens in its extended-context variant.
- Vector: A numerical representation, or list of numbers, that captures different aspects of a word or phrase (the final sketch after this list shows how such vectors can be compared).
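
To make the few-shot prompting and prompt engineering entries above more concrete, here is a minimal, illustrative Python sketch. The classification task, example messages and labels are invented for illustration, and the assembled prompt would in practice be sent to an LLM completion endpoint rather than simply printed.

```python
# A minimal sketch of few-shot prompting: the prompt itself carries a handful
# of labeled examples, and the model is asked to continue the pattern.
# The task, messages and labels below are illustrative only.

examples = [
    ("The checkout page keeps timing out.", "bug report"),
    ("Could you add dark mode to the dashboard?", "feature request"),
    ("How do I reset my password?", "support question"),
]

def build_few_shot_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt from the examples above."""
    lines = ["Classify each message as 'bug report', 'feature request' or 'support question'.", ""]
    for text, label in examples:
        lines.append(f"Message: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Message: {query}")
    lines.append("Label:")  # the model is expected to complete this line
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_few_shot_prompt("The export button crashes the app on large files.")
    print(prompt)  # in practice, this string is what gets sent to the model
```

Iterating on the instructions, the choice of examples and the expected output format is, in essence, what prompt engineering involves.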
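The retrieval augmented generation (RAG) entry can be sketched in the same spirit. The in-memory document list and the naive word-overlap retriever below are stand-ins; a production system would typically embed documents as vectors, store them in a vector database and pass the augmented prompt to a real model endpoint.

```python
# A minimal retrieval augmented generation (RAG) sketch: rank a small
# in-memory knowledge source against the user's question, then prepend the
# best-matching passages to the prompt before it goes to an LLM.
# The documents and the retrieval method are deliberately simplistic.

documents = [
    "Our VPN requires multi-factor authentication for all remote employees.",
    "Expense reports must be submitted within 30 days of purchase.",
    "The data retention policy keeps customer records for seven years.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question (illustrative only)."""
    q_words = set(question.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(question: str) -> str:
    """Combine the retrieved context and the question into a single prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

if __name__ == "__main__":
    print(build_rag_prompt("How long do we keep customer records?"))
    # The resulting prompt would then be passed to the LLM of your choice.
```

Because the model answers from retrieved context rather than from memory alone, this pattern tends to produce more grounded answers to organization-specific questions.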
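Finally, a toy sketch ties the token and vector entries together: text is split into tokens, each token is mapped to a vector, and two phrases are compared with cosine similarity. The whitespace tokenizer and hash-seeded vectors here are purely illustrative; real systems use learned subword tokenizers and high-dimensional embeddings produced by a model.

```python
# A toy illustration of tokens, vectors and similarity. Whitespace
# tokenization and pseudo-random embeddings stand in for real subword
# tokenizers and learned embedding models.

import math
import random

def tokenize(text: str) -> list[str]:
    """Split text into tokens; production tokenizers use subword units."""
    return text.lower().split()

def embed(token: str, dim: int = 8) -> list[float]:
    """Map a token to a deterministic pseudo-random vector (illustrative only)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def phrase_vector(text: str) -> list[float]:
    """Average the token vectors to represent the whole phrase as one vector."""
    vectors = [embed(tok) for tok in tokenize(text)]
    return [sum(dim_vals) / len(vectors) for dim_vals in zip(*vectors)]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Measure how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    v1 = phrase_vector("machine learning model")
    v2 = phrase_vector("machine learning system")
    print(f"similarity: {cosine_similarity(v1, v2):.3f}")
```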
Key roles in AI and data
- AI engineers: Build and implement AI models; responsible for maintaining AI infrastructure.
- Automation engineers: Use machine learning algorithms to design and implement systems that learn from data, improve over time and make decisions. Their work enhances efficiency, reduces errors and enables the automation of complex tasks across industries.
- Cloud architects: Manage the cloud computing architecture, an essential component for scalable AI solutions.
- Data analysts: Collect, process and interpret complex data sets to help businesses make informed decisions. They use statistical techniques, create reports and present findings to stakeholders.
- Data engineers: Format raw data for analysis, manage downstream data, and build systems to make large volumes of data available to an organization. They also prepare the "big data" infrastructure for data scientists to analyze.
- Data scientists: Analyze and interpret complex digital data to assist in decision-making processes.
- IT architects: Design the overall structure of high-performance architectures (HPA), including the software, hardware and networks necessary to power AI solutions.
- IT security specialists: Ensure the security of AI systems, protecting sensitive data and systems from cyber threats.
- Machine learning engineers: Design and build machine learning systems, applying predictive models and utilizing natural language processing.
WWT's proprietary, internally developed GPT was used to help create this content.