Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within datasets, allowing them to make predictions on new, similar data without explicit programming for each task.
Applications
This technology finds applications in diverse fields such as image and speech recognition, natural language processing, recommendation systems, fraud detection, portfolio optimization, and automating tasks.
Types of learning
1) Supervised Learning
It involves training a model on a labeled dataset, meaning that each training example is paired with an output label. The model learns to map inputs to the desired outputs based on this labeled data.
Tasks
Classification: Predicting a category label, such as spam detection in emails.
Regression: Predicting a continuous value, such as house prices.
Common Algorithms
- Linear Regression (Regression)
- Logistic Regression (Classification)
- Decision Trees (Classification)
- Support Vector Machines (SVM) (Classification)
- Neural Networks (Classification and Regression)
Examples
- Identifying the zip code from handwritten digits on an envelope
- Determining whether a tumor is benign based on a medical image
- Detecting fraudulent activity in credit card transactions
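To make this concrete, here is a minimal supervised regression sketch with scikit-learn. The house sizes and prices are made-up illustrative numbers; the classification case is covered in full later under "First code".
# Minimal supervised-learning sketch: linear regression on made-up numbers.
# The data points are illustrative, not real housing data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled data: each input (house size in square feet) is paired with a target (price)
X = np.array([[600], [800], [1000], [1200], [1500]])
y = np.array([150000, 200000, 250000, 300000, 375000])

model = LinearRegression()
model.fit(X, y)                      # learn the mapping from inputs to outputs

print(model.predict([[1100]]))       # predict the price of an unseen 1100 sq ft house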
2) Unsupervised Learning
It involves training a model on data that does not have labeled responses, so the model must find patterns and structure in the inputs on its own.
Tasks
Clustering: Grouping similar data points together, such as customer segmentation.
Association: Discovering rules that describe large portions of the data, such as market basket analysis.
Dimensionality Reduction: Reducing the number of random variables under consideration, such as principal component analysis (PCA).
Common Algorithms
- K-Means (Clustering)
- Hierarchical Clustering (Clustering)
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise) (Clustering)
- Principal Component Analysis (PCA) (Dimensionality Reduction)
- Independent Component Analysis (ICA) (Dimensionality Reduction)
Examples
- Identifying topics in a set of blog posts
- Segmenting customers into groups with similar preferences
- Detecting abnormal access patterns to a website
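As a minimal sketch of unsupervised learning, the snippet below clusters synthetic points with K-Means from scikit-learn; the synthetic data and the choice of 3 clusters are illustrative assumptions.
# Minimal unsupervised-learning sketch: K-Means on synthetic 2-D data.
# No labels are provided; the algorithm infers the groups on its own.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate unlabeled 2-D points grouped around 3 centers (illustrative data)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster index assigned to the first 10 points
print(kmeans.cluster_centers_)  # coordinates of the learned cluster centers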
3) Reinforcement Learning
It involves training a model to make sequences of decisions by rewarding desired behaviors and punishing undesired ones.
Examples
- Game Playing: Training an AI to play games like chess or Go.
- Robotics: Training robots to perform tasks, such as walking or grasping objects.
- Self-driving Cars: Training autonomous vehicles to navigate roads safely.
Common Algorithms
- Q-Learning
- Deep Q-Network (DQN)
- Policy Gradient Methods
- Actor-Critic Methods
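Here is a toy-scale sketch of tabular Q-learning; the one-dimensional corridor environment, rewards, and hyperparameters are made up purely for illustration.
# Toy tabular Q-learning on a 1-D corridor: start at state 0, reward at the last state.
# The environment, rewards, and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

rng = np.random.default_rng(0)
for episode in range(500):
    state = 0
    while state != n_states - 1:                    # episode ends at the goal state
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)   # the learned values should favor action 1 (move right) in every state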
Transformer
Transformers are a type of model architecture designed for processing sequential data, like text, using mechanisms such as self-attention. They excel at capturing long-range dependencies and at parallel processing. They are not inherently tied to a specific learning paradigm like supervised, unsupervised, or reinforcement learning; instead, they can be used within any of these paradigms depending on the task they are applied to.
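To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention over a toy sequence. The dimensions and random weight matrices are illustrative assumptions, and multi-head attention, positional encodings, and the rest of the architecture are omitted.
# Minimal scaled dot-product self-attention in NumPy (toy dimensions, random weights).
# Real transformers add multiple heads, positional encodings, feed-forward layers, etc.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                  # 4 tokens, 8-dimensional embeddings (assumed)
X = rng.normal(size=(seq_len, d_model))  # token embeddings

# Learnable projections (random matrices here, purely for illustration)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Attention scores: every token attends to every other token
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)                            # for numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax over each row

output = weights @ V                     # weighted sum of value vectors
print(weights.round(2))                  # each row sums to 1: how much each token attends to the others
print(output.shape)                      # (4, 8): one context-aware vector per token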
Large Language Model (LLM)
Large Language Models are AI models trained to handle various natural language processing (NLP) tasks by learning patterns from large datasets of text. They leverage deep learning techniques, often based on transformer architectures, to understand and generate human-like text.
Examples of LLMs
- GPT-4 (Generative Pre-trained Transformer 4)
- BERT (Bidirectional Encoder Representations from Transformers)
- T5 (Text-To-Text Transfer Transformer)
How LLMs Are Built
Data Collection: LLMs are trained on vast and diverse datasets that include books, articles, websites, and other text sources.
Model Architecture: Most modern LLMs use transformer architectures, which rely on attention mechanisms to process and generate text.
Pre-training:
Objective: The model is trained on large amounts of text data using unsupervised learning techniques. Common objectives include predicting the next word in a sentence or filling in missing words.
Techniques: Examples include masked language modeling (MLM) for BERT and autoregressive modeling for GPT (a short sketch of both objectives follows this list).
Fine-tuning:
Objective: The pre-trained model is further trained on specific tasks or datasets to improve its performance on particular applications, such as translation or sentiment analysis.
Evaluation:
Metrics: The model’s performance is evaluated using metrics such as accuracy, BLEU score (for translation), and F1 score (for classification).
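As a small illustration of the two pre-training objectives mentioned above, the sketch below uses the Hugging Face transformers library with the public bert-base-uncased and gpt2 checkpoints; the prompts are arbitrary, and running it requires downloading the models.
# Sketch of the two pre-training objectives using Hugging Face transformers.
# Model names are common public checkpoints; prompts are arbitrary examples.
from transformers import pipeline

# Masked language modeling (BERT-style): predict the hidden token
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Machine learning is a branch of artificial [MASK]."))

# Autoregressive modeling (GPT-style): predict the next tokens
generator = pipeline("text-generation", model="gpt2")
print(generator("Machine learning is", max_new_tokens=20))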
Evaluation metrics
They are crucial in machine learning and artificial intelligence for assessing the performance of models and algorithms. The choice of metric depends on the type of task (e.g., classification, regression, clustering).
- Classification Metrics:
Accuracy The ratio of correctly predicted observations to the total observations.
Precision The ratio of correctly predicted positive observations to the total predicted positives.
Recall (Sensitivity or True Positive Rate) The ratio of correctly predicted positive observations to the total actual positives.
F1 Score The harmonic mean of precision and recall, balancing both metrics.
ROC-AUC (Receiver Operating Characteristic - Area Under Curve)
It measures the model's ability to distinguish between classes. AUC is the area under the ROC curve, which plots true positive rate vs. false positive rate.
Range: 0 to 1 (1 indicates perfect classification).
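As a quick illustration, all of these metrics can be computed with scikit-learn; the labels, predictions, and scores below are made-up toy values.
# Computing the classification metrics above with scikit-learn.
# The labels, predictions, and scores are made-up toy values.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true   = [1, 0, 1, 1, 0, 1, 0, 0]                  # actual classes
y_pred   = [1, 0, 1, 0, 0, 1, 1, 0]                  # predicted classes
y_scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # predicted probabilities for class 1

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_scores))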
Bias
The tendency of a model to consistently make errors in a particular direction.
High Bias: Leads to underfitting, where the model is too simple to capture the underlying patterns in the data.
Low Bias: Indicates a model that is more flexible and capable of fitting the training data well, but it can still suffer from high variance.
How to measure bias?
A benchmark dataset is a standardized dataset used to evaluate and compare the performance of various algorithms or models within a specific domain. Bias-related benchmarks are specifically designed to help evaluate and understand fairness and bias in machine learning models.
Ex.
- StereoSet (designed to evaluate and address biases in natural language processing models)
- CrowS-Pairs (focuses on the perpetuation of stereotypes)
How to remove bias? (Debiasing)
Ex. AutoDebias (automatically mitigates biases in machine learning models)
Machine Learning Packages
scikit-learn
It is the most prominent Python library for machine learning, containing a number of state-of-the-art machine learning algorithms.
numpy
It is a fundamental package for scientific computing in Python. It contains functionality for multidimensional arrays, high-level mathematical functions such as linear algebra operations and the Fourier transform, and pseudorandom number generators.
matplotlib
It is the primary scientific plotting library in Python. It provides functions for making publication-quality visualizations such as line charts, histograms, and scatter plots.
pandas
It is a Python library for data wrangling and analysis. It is built around the DataFrame, a table-like data structure, and provides functions to load, filter, and transform data.
pip install numpy scipy matplotlib ipython scikit-learn pandas
First code
# Step 1: Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Step 2: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 4: Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 5: Train a K-Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Step 6: Make predictions on the testing set
y_pred = knn.predict(X_test)
# Step 7: Evaluate the classifier
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
Ensemble Model
It is a machine learning technique that combines the predictions of multiple individual models to improve overall performance. The main idea is that by aggregating multiple models, you can achieve better accuracy, robustness, and generalization compared to using any single model alone.
Types of Ensemble Methods
Bagging (Bootstrap Aggregating)
Multiple models (e.g., decision trees) are trained on different bootstrapped subsets of the data. The predictions are aggregated (e.g., by voting for classification or averaging for regression).
It reduces variance and helps in preventing overfitting.
Ex.
Random Forest
Bagged Decision Trees
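A minimal bagging sketch with scikit-learn is shown below; the Iris dataset and hyperparameters are arbitrary illustrative choices.
# Bagging sketch: many decision trees trained on bootstrapped samples, aggregated by voting.
# The dataset and hyperparameters are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
print("Bagged trees accuracy:", cross_val_score(bagging, X, y, cv=5).mean())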
Boosting
Models are trained sequentially, where each new model tries to correct the errors of the previous ones. The predictions are combined, often with a weighted average.
It reduces bias and improves the accuracy of predictions by focusing on the errors of previous models.
Ex. AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost.
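A minimal boosting sketch with scikit-learn's GradientBoostingClassifier; the breast cancer dataset and hyperparameters are illustrative choices.
# Boosting sketch: trees are added sequentially, each focusing on the errors of the current ensemble.
# The dataset and hyperparameters are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
boosting = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
print("Gradient boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())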
Stacking
Different models (base learners) are trained on the same data, and their predictions are used as inputs to a meta-learner, which makes the final prediction.
Combines multiple models to leverage their individual strengths and improve performance.
Ex. Using logistic regression as a meta-learner with decision trees and SVMs as base learners.
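A minimal stacking sketch mirroring that example, with a decision tree and an SVM as base learners and logistic regression as the meta-learner; the Iris dataset and hyperparameters are illustrative choices.
# Stacking sketch: base learners' predictions become inputs to a logistic regression meta-learner.
# The dataset and hyperparameters are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=42)),
                ("svm", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
print("Stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())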
Voting
For classification, majority voting is used to choose the class with the most votes. For regression, the average of predictions is taken.
Types: Hard voting (majority class) and soft voting (average predicted probabilities).
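A minimal voting sketch showing both hard and soft voting; the base models and the Iris dataset are illustrative choices.
# Voting sketch: hard voting picks the majority class, soft voting averages predicted probabilities.
# The dataset and base models are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("knn", KNeighborsClassifier()),
              ("tree", DecisionTreeClassifier(random_state=42))]

hard = VotingClassifier(estimators=estimators, voting="hard")
soft = VotingClassifier(estimators=estimators, voting="soft")
print("Hard voting accuracy:", cross_val_score(hard, X, y, cv=5).mean())
print("Soft voting accuracy:", cross_val_score(soft, X, y, cv=5).mean())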
Random Forest
It is a versatile and powerful machine learning algorithm that's used for both classification and regression tasks.
The algorithm creates multiple decision trees by sampling the training data with replacement (bootstrap sampling). Each tree is trained on a slightly different dataset, which helps in reducing overfitting.
For classification tasks, the final output is determined by majority voting among the individual trees. For regression tasks, the output is the average of the predictions from all trees.
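A minimal Random Forest sketch covering both tasks; the datasets and hyperparameters are illustrative choices.
# Random Forest sketch for classification and regression.
# The datasets and hyperparameters are illustrative choices.
from sklearn.datasets import load_iris, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Classification: the final label is the majority vote across the trees
X_cls, y_cls = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
print("Classification accuracy:", cross_val_score(clf, X_cls, y_cls, cv=5).mean())

# Regression: the final prediction is the average of the trees' predictions
X_reg, y_reg = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
reg = RandomForestRegressor(n_estimators=100, random_state=42)
print("Regression R^2:", cross_val_score(reg, X_reg, y_reg, cv=5).mean())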
Classification
- Decision Trees
- Random Forest
- Naive Bayes
- K-Nearest Neighbors (KNN)
- Logistic Regression
- Neural Networks
- AdaBoost
- Gradient Boosting Machines (GBM)
- Support Vector Machines (SVM)
- Quadratic Discriminant Analysis (QDA)
Regression
- Linear Regression
- Decision Tree Regression
- Random Forest Regression
- Bayesian Regression
- K-Nearest Neighbors (KNN) Regression
- Gradient Boosting Machines (GBM) for Regression
Clustering
- K-Medoids
- K-Means Clustering
- Hierarchical Clustering
- Gaussian Mixture Models (GMM)
- Agglomerative Clustering
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- OPTICS (Ordering Points To Identify the Clustering Structure)
- HDBSCAN (Hierarchical DBSCAN)
Association
- Apriori Algorithm
- Eclat Algorithm
- AIS Algorithm
- FP-Growth (Frequent Pattern Growth)
Why 255?
In standard grayscale and RGB images, pixel values are often represented as 8-bit integers. Since 8 bits can encode 2^8 = 256 distinct values, each pixel value ranges from 0 to 255, where:
0 represents the minimum value (e.g., black in grayscale or full absence of color in RGB).
255 represents the maximum value (e.g., white in grayscale or full intensity of a color in RGB).
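A common preprocessing step that follows from this is scaling pixel values from [0, 255] to [0, 1]; the tiny array below stands in for a grayscale image (real images would be loaded from a file).
# Scale 8-bit pixel values from [0, 255] to [0, 1].
# The array below stands in for a tiny grayscale image.
import numpy as np

image = np.array([[0, 128, 255],
                  [64, 32, 200]], dtype=np.uint8)    # 8-bit pixels: 0 = black, 255 = white

normalized = image.astype(np.float32) / 255.0        # divide by the maximum 8-bit value
print(normalized)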
Stay Connected! If you enjoyed this post, don’t forget to follow me on social media for more updates and insights:
Twitter: madhavganesan
Instagram: madhavganesan
LinkedIn: madhavganesan