Machine Learning with Graphs (3): Traditional Methods
Chanipong
2024. 1. 18. 19:28
Traditional ML Pipeline
All about designing proper features!! (hand-designed)
Assume: nodes/links/graphs have some attributes associated with them; obtain features for all training data
Goal: Train a vectorized ML model (Random Forest, SVM, NN) → given a new node/link/graph, obtain its features and make a prediction by applying the model
Design Choices: Features: d-dimensional vectors; Objects: nodes, edges, sets of nodes, entire graphs
Objective Function: what task are we aiming to solve?
Feature Design
1. Node-level Feature
Importance-based Features: useful for predicting influential nodes in a graph (node degree, node centrality)
Structure-based Features: useful for predicting the particular role a node plays in a graph (node degree, clustering coefficient, graphlets)
Goal: Characterize the structure and position of a node in the network
Node Degree
Node Centrality
Clustering coefficient
Graphlets
Degree $k_v$ of node $v$ is the # of its neighboring nodes; it treats all neighbors equally (nodes with the same degree cannot be distinguished)
$c_v$ takes the node's importance in the graph into account (Eigenvector/Betweenness/Closeness centrality)
Clustering coefficient: the fraction of possible edges among $v$'s neighbors that actually exist. E.g., in case 2, excluding node $v$ there are $4 \choose 2$ = 6 possible edges among the 4 neighbors, and 3 exist, so the coefficient is $3/6 = 0.5$.
Graphlets generalize this idea: they describe the shapes of small subgraphs (for a given number of nodes) that a node participates in
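As a minimal sketch (toy adjacency dict with hypothetical node names), the degree and clustering-coefficient features above can be computed like this:

```python
# Node-level features on a toy undirected graph stored as an adjacency dict.
from itertools import combinations

adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree(v):
    """k_v: number of neighboring nodes."""
    return len(adj[v])

def clustering_coefficient(v):
    """Fraction of the k_v-choose-2 possible neighbor pairs that are connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return links / (k * (k - 1) / 2)

print(degree("A"))                  # 3
print(clustering_coefficient("A"))  # 1 of 3 neighbor pairs connected -> 1/3
```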
Types of Node Centrality
Eigenvector Centrality
Betweenness Centrality
Closeness Centrality
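Closeness centrality is the easiest of the three to sketch: a node is central if its total shortest-path distance to all other nodes is small. A minimal version on a hypothetical path graph A-B-C-D:

```python
# Closeness centrality = 1 / (sum of shortest-path distances to other nodes),
# with distances found by BFS on an unweighted toy graph.
from collections import deque

adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def bfs_distances(src):
    """Shortest-path distances from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def closeness(v):
    d = bfs_distances(v)
    return 1 / sum(d[u] for u in adj if u != v)

print(closeness("B"))  # distances 1, 1, 2 -> 1 / 4 = 0.25
print(closeness("A"))  # distances 1, 2, 3 -> 1 / 6
```

Interior nodes of the path get a higher score than the endpoints, matching the intuition that they are "closer" to everyone.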
Graphlets: rooted, connected, non-isomorphic induced subgraphs; a node's Graphlet Degree Vector (GDV) counts the graphlets it touches
2. Link-level Feature
Link-level Prediction
① Links missing at random: remove a random set of links, then aim to predict them
② Links over time: given $G[t_0, t_0']$, a graph defined by edges up to time $t_0'$, output a ranked list $L$ of edges (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$ (connections develop as time passes)
③ Evaluation: $n = |E_{new}|$ is the # of new edges that actually appear during the test period $[t_1, t_1']$; take the top $n$ elements of $L$ and count the correct edges
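The evaluation step ③ can be sketched in a few lines (the ranked list and edge sets below are hypothetical):

```python
# Evaluate a ranked link-prediction list L: take the top-n entries
# (n = number of edges that actually appeared) and count the hits.
def precision_at_n(ranked_edges, new_edges):
    n = len(new_edges)
    hits = sum(1 for e in ranked_edges[:n] if e in new_edges)
    return hits / n

L = [("A", "B"), ("C", "D"), ("A", "D"), ("B", "C")]  # hypothetical ranking
E_new = {("A", "B"), ("B", "C")}                      # edges that appeared in [t1, t1']

print(precision_at_n(L, E_new))  # top-2 of L contains 1 true edge -> 0.5
```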
2-1) Distance-based feature
Method: simply use the shortest-path distance between the two nodes
Con: cannot capture neighborhood overlap or connection strength (two node pairs at the same distance look identical even if one is connected through 1 path and the other through 2)
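A small sketch of that blind spot, using a hypothetical toy graph where A-B are linked through one intermediate node but C-D through two:

```python
# Shortest-path distance as a link feature, and its limitation:
# A-B (one length-2 path) and C-D (two length-2 paths) get the same value.
from collections import deque

adj = {
    "A": {"X"}, "B": {"X"}, "X": {"A", "B"},
    "C": {"Y", "Z"}, "D": {"Y", "Z"}, "Y": {"C", "D"}, "Z": {"C", "D"},
}

def shortest_path_length(s, t):
    dist = {s: 0}
    q = deque([s])
    while q:
        v = q.popleft()
        if v == t:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return float("inf")  # t unreachable from s

print(shortest_path_length("A", "B"))  # 2
print(shortest_path_length("C", "D"))  # 2 -- same feature, stronger connection
```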
2-2) Local neighborhood overlap
Method: captures the # of neighboring nodes shared between two nodes $v_1$ and $v_2$
Con: always 0 when the two nodes currently share no neighbors, even though they may acquire shared neighbors (and a link) in the future
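Two standard local-overlap scores, common neighbors and the Jaccard coefficient, sketched on a hypothetical toy graph (node E deliberately shares no neighbors with anyone, showing the zero-overlap limitation):

```python
# Local neighborhood overlap for a candidate link (v1, v2).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
    "E": set(),  # isolated node: overlap with every other node is 0
}

def common_neighbors(v1, v2):
    """|N(v1) & N(v2)|"""
    return len(adj[v1] & adj[v2])

def jaccard(v1, v2):
    """|N(v1) & N(v2)| / |N(v1) | N(v2)| (0 if both neighborhoods are empty)."""
    union = adj[v1] | adj[v2]
    return len(adj[v1] & adj[v2]) / len(union) if union else 0.0

print(common_neighbors("A", "B"))  # shared neighbor {C} -> 1
print(jaccard("A", "B"))           # |{C}| / |{A, B, C, D}| = 0.25
print(common_neighbors("A", "E"))  # 0: no signal, even if a link forms later
```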
2-3) Global neighborhood overlap
Goal: To resolve the limitation by considering the entire graph
Katz index: count the walks of all lengths between a given pair of nodes using powers of the graph adjacency matrix (each entry of $A^n$ is the # of length-$n$ walks between that node pair). With a discount factor $0 < \beta < 1$, the matrix of Katz indices has the closed form $S = \sum_{i=1}^{\infty} \beta^i A^i = (I - \beta A)^{-1} - I$.
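A sketch of both facts — walk counting via matrix powers and the closed form — on a hypothetical 4-node adjacency matrix ($\beta$ chosen small enough for the geometric series to converge):

```python
# Katz index via the closed form S = (I - beta*A)^(-1) - I.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
beta = 0.1  # must satisfy beta < 1 / (largest eigenvalue of A)

# (A^n)[u, v] counts length-n walks from u to v
print((A @ A)[0, 1])  # 1.0: one length-2 walk (0 -> 2 -> 1)

S = np.linalg.inv(np.eye(4) - beta * A) - np.eye(4)
print(S[0, 1])  # Katz index of pair (0, 1): discounted sum over all walk lengths
```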
3. Graph-level Feature
Goal: features that characterize the structure of an entire graph
Kernel methods design kernels instead of feature vectors
① Kernel $K(G, G') \in \mathbb{R}$ measures similarity b/w data
② Kernel matrix $\mathbf{K} = (K(G, G'))_{G,G'}$ (similarities b/w all datapoints) must always be positive semi-definite (i.e., has non-negative eigenvalues)
③ There exists a feature representation $\phi(\cdot)$ such that $K(G, G') = \phi(G)^T \phi(G')$
④ Once the kernel is defined, an off-the-shelf ML model, such as kernel SVM, can be used to make predictions
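Points ② and ③ fit together: if the kernel comes from explicit feature vectors, the kernel matrix is a Gram matrix and is automatically positive semi-definite. A sketch with made-up feature vectors:

```python
# With explicit features phi(G), K[i, j] = phi(G_i)^T phi(G_j) is a Gram
# matrix, hence positive semi-definite (all eigenvalues >= 0).
import numpy as np

Phi = np.array([[1.0, 2.0, 0.0],   # phi(G1), hypothetical feature vectors
                [0.0, 1.0, 1.0],   # phi(G2)
                [2.0, 0.0, 1.0]])  # phi(G3)

K = Phi @ Phi.T                    # pairwise similarities K(G, G')
eigvals = np.linalg.eigvalsh(K)    # eigenvalues of the symmetric matrix K

print(K[0, 1])                     # phi(G1) . phi(G2) = 2.0
print(bool((eigvals >= -1e-9).all()))  # True: PSD up to numerical noise
```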
Graph Kernel
Goal: Define graph feature vector 𝜙(G)
Key Idea: Bag-of-Words (BoW) for a graph. But BoW uses only word counts as features for a document (no ordering considered): naively, two graphs with 4 nodes each are represented identically even if their links differ.
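A sketch of the BoW analogy: counting only nodes cannot tell a 4-node path from a 4-node star, but a "bag of node degrees" histogram can (both graphs below are hypothetical examples):

```python
# phi(G) as a bag-of-node-degrees histogram: entry d counts nodes of degree d.
from collections import Counter

def degree_histogram(edges, num_nodes):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [sum(1 for v in range(num_nodes) if deg[v] == d)
            for d in range(num_nodes)]

path = [(0, 1), (1, 2), (2, 3)]  # path graph on 4 nodes
star = [(0, 1), (0, 2), (0, 3)]  # star graph on 4 nodes: same node count!

print(degree_histogram(path, 4))  # [0, 2, 2, 0]: two deg-1, two deg-2 nodes
print(degree_histogram(star, 4))  # [0, 3, 0, 1]: three deg-1, one deg-3 node
```

The two graphs collide under a pure node-count feature but separate under the degree histogram, which is exactly the 1-hop case that the WL kernel below generalizes.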
3-1) Graphlet Kernel
Key Idea: Count the number of different graphlets in a graph.
Note: this differs conceptually from the node-level graphlets above; here, graphlets need not be connected and are not rooted
Limitation: Counting graphlets is very expensive!
Counting size-$k$ graphlets in a graph with $n$ nodes by enumeration takes $O(n^k)$ time
This is unavoidable in the worst-case since subgraph isomorphism test (judging whether a graph is a subgraph of another graph) is NP-hard
If a graph’s node degree is bounded by $𝑑$, an $𝑂(𝑛𝑑^{𝑘−1})$ algorithm exists to count all the graphlets of size $𝑘$.
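The brute-force $O(n^k)$ enumeration is easy to sketch for $k = 3$: classify the induced subgraph on every node triple (toy edge set below is hypothetical; note these graphlets need not be connected):

```python
# Brute-force size-3 graphlet counting: enumerate all n-choose-3 node
# triples and classify each induced subgraph by its edge count.
from itertools import combinations

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}  # toy undirected graph
nodes = {v for e in edges for v in e}

def connected(u, v):
    return (u, v) in edges or (v, u) in edges

counts = {"triangle": 0, "path": 0, "single_edge": 0, "no_edge": 0}
for trio in combinations(sorted(nodes), 3):
    m = sum(connected(u, v) for u, v in combinations(trio, 2))
    # m edges among the triple determine the graphlet type (3, 2, 1, or 0)
    counts[["no_edge", "single_edge", "path", "triangle"][m]] += 1

print(counts)  # {'triangle': 1, 'path': 2, 'single_edge': 1, 'no_edge': 0}
```

The histogram over graphlet types is the graphlet-kernel feature vector; the nested enumeration over triples is what makes the general size-$k$ version so expensive.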
3-2) Weisfeiler-Lehman (WL) Kernel
Goal: Design an efficient graph feature descriptor $𝜙(G)$
Key Idea: Use neighborhood structure to iteratively enrich the node vocabulary (a multi-hop generalization of the bag of node degrees, which captures only 1-hop neighborhood info) ⇒ Color Refinement Algorithm
Color Refinement (Bag-of-Colors) Given: a graph $G$ with a set of nodes $V$
① Assign an initial color $c^{(0)}(v)$ to each node $v$
② Iteratively refine node colors by $c^{(k+1)}(v) = \text{HASH}\left(c^{(k)}(v), \{c^{(k)}(u)\}_{u \in N(v)}\right)$, where HASH maps different inputs to different colors
③ After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood
The WL kernel is computationally efficient! (time complexity is linear in the # of edges)
⇒ When computing a kernel value, only the colors that appear in the two graphs need to be tracked (the # of colors is at most the total # of nodes)
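The refinement loop ①-③ can be sketched for a single toy graph, with the injective HASH simulated by interning each (own color, sorted neighbor colors) signature as a fresh integer:

```python
# WL color refinement (Bag-of-Colors) on one toy graph: a triangle 0-1-2
# with a pendant node 3 attached to node 2.
from collections import Counter

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def wl_colors(adj, K):
    color = {v: 0 for v in adj}  # (1) uniform initial color
    for _ in range(K):
        table = {}               # injective "HASH" for this round
        new = {}
        for v in adj:
            # (2) signature = own color + multiset of neighbor colors
            sig = (color[v], tuple(sorted(color[u] for u in adj[v])))
            new[v] = table.setdefault(sig, len(table))
        color = new
    return color  # (3) c^(K)(v) summarizes the K-hop neighborhood

# After one step, colors coincide with node degrees (up to renaming);
# phi(G) is the color histogram.
print(Counter(wl_colors(adj, 1).values()))
```

Comparing the color histograms of two graphs after $K$ steps (an inner product of the count vectors) gives the WL kernel value itself.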