This page is still very preliminary and will see extensive modifications soon, but here are a few initial notes.
Coarse Taxonomy of AI
This is a very preliminary taxonomy that still has significant gaps that I will fill in the near future.
Traditional Machine Learning
- Supervised or Semi-Supervised Learning
- Linear Models
- Linear and Quadratic Discriminant Analysis
- Kernel Ridge Regression
- Support Vector Machines
- Stochastic Gradient Descent
- Gaussian Processes
- Cross Decomposition
- Naive Bayes
- Decision Trees
- ID3, C4.5, C5.0
- Classification And Regression Tree (CART)
- Multivariate Adaptive Regression Splines (MARS)
- Classification
- Support Vector Machines (SVMs)
- Naive Bayes
- Nearest Neighbors
- Decision Trees
- Random Forests
- Regression (Linear, Logistic, Ordinal, Poisson, Fast Forest Quantile, …)
- Unsupervised Learning
- Self-Supervised Learning
- Reinforcement Learning (own category below)
- Loss Functions
- Metrics (a short code sketch follows the distance list below)
- Classification Metrics
- Precision
- Recall
- Accuracy
- F-Score – information retrieval, machine learning
- F1
- Receiver Operating Characteristic (ROC) Curve
- Area Under Curve (AUC)
- Regression Metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Inlier Ratio Metric
- Computer Vision Metrics
- Pixel Accuracy
- Intersection-Over-Union (IoU, Jaccard Index)
- mean IoU (mIoU) – IoU averaged over multiple classes, e.g. in segmentation
- Peak Signal-to-Noise Ratio (PSNR), e.g. for the reconstruction quality of videos and images under lossy compression
- Structural SIMilarity (SSIM)
- NLP Metrics
- BLEU Score
- Inter-Annotator Agreement (IAA)
- LEPOR (Length Penalty, Precision, n-gram Position difference Penalty and Recall) – translation
- Metric for Evaluation of Translation with Explicit ORdering (METEOR) – translation
- NIST – machine translation
- Perplexity
- Recall-Oriented Understudy for Gisting Evaluation (ROUGE) – summarization, translation
- Word Error Rate (WER) – ASR, translation
- Ranking Metrics
- Cumulative Gain
- Discounted Cumulative Gain (DCG)
- Normalized Discounted Cumulative Gain (NDCG)
- Mean Average Precision (MAP)
- Precision@k
- Mean Reciprocal Rank (MRR)
- Kendall’s tau
- Spearman’s rho
- Statistical Metrics
- Coefficient of Determination
- Pearson Correlation Coefficient
- Spearman’s Rank Coefficient
- p-value
- More
- Sensitivity & Specificity (also TP, FP, TN and FN Rate)
- Logarithmic Loss
- Confusion Matrix (not a metric itself, but a related tool)
- Root Mean Squared Error
- Relative Absolute Error
- Relative Squared Error
- Distances
- Bhattacharyya Distance
- Cosine Similarity
- Earth Mover’s Distance
- f-divergence
- Hellinger Distance
- Kullback-Leibler Divergence
- Pearson χ2-divergence
- Total Variation Distance
- Jeffreys Distance
- Mahalanobis Distance
- Minkowski Distance
- Signal-to-Noise Ratio Distance
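To make a few of the listed metrics and distances concrete, here is a minimal sketch, assuming NumPy and scikit-learn are available; the toy data is purely illustrative.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, mean_absolute_error

# Classification metrics on toy labels.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall

# Regression metric: Mean Absolute Error (MAE).
print(mean_absolute_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))

# Intersection-over-Union (IoU) of two binary masks.
a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(np.logical_and(a, b).sum() / np.logical_or(a, b).sum())

# Kullback-Leibler divergence between two discrete distributions.
p = np.array([0.4, 0.6])
q = np.array([0.5, 0.5])
print(np.sum(p * np.log(p / q)))
```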
Deep Learning
- Topology
- Fully connected
- Convolutional Neural Networks (CNN)
- Recursive Neural Network
- Backprop Through Structure (BPTS)
- Recurrent Neural Networks (RNN)
- Bidirectional RNN
- LSTM (bidirectional or unidirectional)
- Grid LSTM
- Stacked LSTM
- Tree LSTM
- Hopfield Net
- Backprop Through Time (BPTT) – training technique for certain kinds of RNNs
- Stochastic RNNs
- Memory Augmented Neural Network Architectures (MANN)
- Memory Networks, Neural Turing Machines (NTM), Differentiable Neural Computers (DNC)
- Skip connections (residual nets, highway networks, etc.)
- Radial Basis Function (RBF) Network
- Perceptron
- Feed Forward (FF)
- Deep Feed Forward (DFF)
- Auto Encoder (AE)
- Variational AE (VAE)
- Denoising AE (DAE)
- Sparse AE (SAE)
- Graph Neural Network (GNN)
- Recurrent Graph Neural Networks
- Spatial Convolutional Networks
- Spectral Convolutional Networks
- GAN: Vanilla GAN, DCGAN (Deep Convolutional GAN), CGAN (Conditional GAN), LAPGAN (Laplacian Pyramid GAN), SRGAN (Super-Resolution GAN), DiscoGAN, InfoGAN etc. – see the GAN Zoo for a huge list
- Activation Function
- Binary Step
- Exponential Linear Unit
- Linear
- Parametric ReLU (PReLU)
- ReLU
- Leaky ReLU
- Sigmoid
- Softmax
- Swish
- Tanh
- Rectifiers
- Gating Mechanisms
- Gated Recurrent Units (GRU) – gating mechanism in RNNs
- Self-normalizing
- Weight Initialization
- He Initialization
- Random Initialization
- Xavier Initialization
- Zero Initialization
- Regularization
- L1
- L2
- Batch Normalization
- Data Augmentation
- Dropout
- Early Stopping
- Pre-Training
- Modifiers
- Normalization
- Batch norm
- Layer norm
- Instance norm
- Pruning
Reinforcement Learning
- Value Optimization (DQN, DDQN, Rainbow, …)
- Policy Optimization (A3C, DDPG, PG, PPO, …)
- Imitation Learning (Behavioral Cloning (BC), Conditional Imitation Learning)
- Hierarchical RL (HAC, …)
- Memory Types (Hindsight Experience Replay (HER), Prioritized Experience Replay (PER), …)
- Exploration Techniques (ε-greedy, Boltzmann)
- Environments
- Curriculum Learning
- Loss Functions
Natural Language Processing (NLP)
- Symbolic, Statistical and Neural NLP
- Text and Speech Processing
- Automatic Speech Recognition (ASR)
- Speech Segmentation
- Text-to-Speech / Speech Synthesis
- Tokenization / Word Segmentation
- Morphological Analysis
- Lemmatization
- Morphological Segmentation
- POS Tagging (Part of Speech)
- Stemming
- Syntactic Analysis
- Grammar Induction
- Parsing
- Sentence Boundary Disambiguation
- Lexical Semantics
- Distributional Semantics
- Word Embeddings
- Sentence Embeddings
- Named Entity Recognition (NER)
- Sentiment Analysis
- Multi-Modal Sentiment Analysis
- Terminology Extraction
- Word Sense Disambiguation
- Relational Semantics
- Relationship Extraction
- Semantic Parsing
- AMR Parsing
- DRT Parsing
- Semantic Role Labeling
- Discourse
- Argument Mining
- Coreference Resolution
- Discourse Analysis
- Implicit Semantic Role Labeling
- Textual Entailment
- Topic Segmentation and Recognition
- High-Level NLP
- Dialog Management
- Grammatical Error Correction
- Machine Translation
- Neural Machine Translation (NMT)
- Statistical Machine Translation (SMT)
- Natural Language Generation (NLG)
- Natural Language Understanding (NLU)
- Dialog Act Classification
- Slot Filling
- Question Answering
- Closed vs Open Domain
- Factoid
- Knowledge Base QA (KBQA)
- Chunking
- Entity Linking
- Information Retrieval
- Information Extraction
- Topic Modelling
Computer Vision
- Recognition
- Face Recognition
- Object Recognition
- Optical Character Recognition (OCR)
- Scene Recognition
- Object Identification
- Detection
- Viola-Jones Object Detection (usually face detection)
- Tracking
- Single Object Tracking
- BOOSTING
- CSRT
- GOTURN
- Kernelized Correlation Filters (KCF)
- MEDIANFLOW
- Multiple Instance Learning (MIL)
- Minimum Output Sum of Squared Error (MOSSE)
- Tracking, Learning and Detection (TLD)
- Multi-Object Tracking (MOT)
- Image Retrieval
- Pose Estimation
- Morphological Operations
- Segmentation
- DeepLab
- Felzenszwalb
- FCN
- Mask RCNN
- QuickShift
- SLIC
- Watershed
- Combination with Region Adjacency Graphs (RAG)
- Feature Extraction
- FAST
- AGAST
- SIFT
- Edge Detection
- Canny Edge Detection
- Descriptors
- SIFT Descriptors
- Speeded Up Robust Features (SURF)
- Superpixels
- Structure from Motion (SfM)
- Panorama Stitching
- Feature Matching
- Exhaustive Search
- RANdom SAmple Consensus (RANSAC)
Signal Processing & Pattern Recognition
- Analog
- Continuous Time
- Discrete Time
- Digital
- FFT
- Nonlinear
- Statistical
- Wavelets
- Haar Wavelets
- Voice Activity Detection
Optimization
Reasoning & Automated Planning (AP)
- Constraint Solvers
- Theorem Provers
- Logic Programs
- Rule Engines
- Business Rule Management Systems (BRMS)
- Deductive Classifiers
- Case-Based Reasoning
- Procedural Reasoning
Data Mining & Clustering
- Anomaly Detection
- Association Rule Mining
- Clustering
- Classification
- Regression
- Text Mining
- Automatic Summarization
- Extractive vs. Abstractive Summarization
- Document Summarization
- Aided Summarization (human in the loop)
- Keyphrase Extraction
- Supervised
- Unsupervised
Patterns of AI
To be extended.
Freezing Neural Net Layers
Freeze the early layers of a pretrained net, and potentially the layers adjacent to them, to accelerate training, esp. for CNNs; often only the output layer and its neighbors are retrained. The more similar the tasks, the fewer layers need to be retrained. The idea is to leverage the knowledge already present in a net trained on a similar task.
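A minimal PyTorch sketch of this pattern, assuming torchvision ≥ 0.13; the choice of ResNet-18 and the target class count are illustrative:

```python
import torch
import torchvision

num_new_classes = 10  # hypothetical target task

# Load a net pretrained on a similar task (here: ImageNet classification).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Freeze all pretrained parameters so they are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the output layer for the new task; its fresh parameters default to
# requires_grad=True and are the only ones that will be trained.
model.fc = torch.nn.Linear(model.fc.in_features, num_new_classes)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```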
Multi-Head Neural Nets
The idea is to use a backbone network and multiple heads for specific tasks – the backbone is trained to capture the overall information and the heads use it to form answers for their specific subtasks. For instance, in object detection it is often desirable both to localize the object (a regression task) and to recognize or even identify it (a classification task), which can be modeled with two heads.
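A minimal PyTorch sketch with a shared backbone and the two heads from the object detection example; all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Shared backbone that captures the overall information.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.box_head = nn.Linear(32, 4)            # localization (regression)
        self.cls_head = nn.Linear(32, num_classes)  # recognition (classification)

    def forward(self, x):
        features = self.backbone(x)
        return self.box_head(features), self.cls_head(features)

net = TwoHeadNet(num_classes=10)
boxes, logits = net(torch.randn(8, 3, 64, 64))  # batch of 8 RGB images
# The total training loss is typically a weighted sum of the per-head losses.
```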
Agent Actions via Event Sourcing / CQRS
Event Sourcing captures each state change of an application in an event object. It is a well-known cloud technique that, for instance, makes it easy to reconstruct the state of a microservice after it has died. The same technique can be used for AI agents: reinforcement learning has techniques like replay buffers and hindsight experience replay which require resampling of past actions. Automatically captured actions are also useful for debugging, since developers can afterwards step through the agent’s run and belief state. And the same resilience advantage as in the cloud applies, since agents are often trained in cloud environments.
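A minimal sketch of an event-sourced agent log; all names are hypothetical and the “state” is reduced to a cumulative reward for brevity:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    step: int
    action: str
    reward: float

@dataclass
class AgentEventLog:
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        # Append-only: past events are never mutated.
        self.events.append(event)

    def replay_state(self, up_to_step: int) -> float:
        # Rebuild derived state purely from the event history, e.g. after a
        # crash or when stepping through a run for debugging.
        return sum(e.reward for e in self.events if e.step <= up_to_step)

log = AgentEventLog()
log.append(Event(step=0, action="move_left", reward=0.0))
log.append(Event(step=1, action="pick_up", reward=1.0))
print(log.replay_state(up_to_step=1))  # 1.0, reconstructed from the log alone
```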
Mashups / Smart Ensembling
There are many classifiers and cloud services covering the same functionality. For instance, there are many translation APIs and many object recognition neural networks. Ensembling itself is a well-known AI pattern, but here we also assume a mashup enabler which transforms resources to facilitate the combination of their results. For instance, we can align the output formats of the translation APIs from multiple cloud vendors or of multiple neural nets. This can include additional selection, ranking and abstraction steps: we can select only question answering services that are specific to medicine, rerank their answers and abstract their output, e.g. if one system responds H1N1 and the other H3N2, we can still unify them as responses for Influenza A, which is much more informative than discarding them.
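A minimal sketch of the mashup enabler; the three services and their output formats are hypothetical, and the ensemble step is a plain majority vote:

```python
from collections import Counter

# Hypothetical object recognition services with heterogeneous output formats.
def service_a(image): return {"label": "cat", "score": 0.9}
def service_b(image): return ("cat", 0.85)
def service_c(image): return {"prediction": {"class": "dog"}, "confidence": 0.6}

# Adapters align every output to a common (label, confidence) format.
adapters = [
    lambda r: (r["label"], r["score"]),
    lambda r: r,
    lambda r: (r["prediction"]["class"], r["confidence"]),
]

def ensemble(image, services, adapters):
    results = [adapt(svc(image)) for svc, adapt in zip(services, adapters)]
    # Majority vote over the normalized labels; selection, reranking and
    # abstraction steps (e.g. mapping H1N1/H3N2 to Influenza A) would slot in here.
    return Counter(label for label, _ in results).most_common(1)[0][0]

print(ensemble(None, [service_a, service_b, service_c], adapters))  # "cat"
```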
Embeddings
Graph and language embeddings like word2vec, sentence2vec, doc2vec and node2vec, and even more domain-specific embeddings like hotel2vec.
Facebook has published work on applying embedding-based retrieval (EBR) to their products.
Typical vector arithmetic examples exist, like queen – woman + man ≈ king, or that the distances between countries and their capitals, or between words in one language and their translations in another, are approximately the same. How strongly these properties hold in practice seems to vary, but it is interesting that they exist at all.
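A minimal sketch of the analogy arithmetic, assuming gensim and its downloadable pretrained vectors are available:

```python
import gensim.downloader as api

# Pretrained GloVe word vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman should land near "queen"; how well this works varies.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```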
Feature Stores
AWS SageMaker Feature Store, Databricks Feature Store, DoorDash Feature Store, Feast (co-developed by Gojek and Google Cloud), Hopsworks, and more.
Additional Patterns from the Literature
There are already multiple books to draw from, esp. “Deep Learning Patterns and Practices” by Andrew Ferlitsch, “Distributed Machine Learning Patterns” by Yuan Tang and “Machine Learning Design Patterns” by Lakshmanan et al. – the following incomplete list mentions patterns already captured in the literature:
- Hyperparameter Optimization (HPO)
- Checkpoints (Memento in GoF)
- Ensembles
- Transfer Learning
- Model Versioning
- Hashed Features
- Feature Cross
- Multimodal Input
- Reframing
- Multilabel
- Cascade
- Neutral Class
- Rebalancing
- Useful Overfitting
- Distribution Strategy
- Stateless Serving Function
- Batch Serving
- Continued Model Evaluation
- Two-Phase Predictions
- Keyed Predictions
- Transform
- Repeatable Splitting
- Bridged Schema
- Windowed Inference
- Workflow Pipeline
- Heuristic Benchmarks
- Explainable Predictions
- Fairness Lens
- Early Stopping
To be continued.