Machine Learning with Graphs (3): Traditional Methods
Chanipong
2024. 1. 18. 19:28
Traditional ML Pipeline
All about designing proper features!! (hand-designed)
Assume: nodes/links/graphs have some attributes associated with them; obtain features for all training data
Goal: Train a vectorized ML model (Random Forest, SVM, NN) → given a new node/link/graph, obtain its features and make a prediction by applying the model
Design Choices: Features: d-dimensional vectors; Objects: nodes, edges, sets of nodes, entire graphs
Objective Function: what task are we aiming to solve?
Feature Design
1. Node-level Feature
Importance-based Features: useful for predicting influential nodes in a graph (node degree, node centrality)
Structure-based Features: useful for predicting the particular role a node plays in a graph (node degree, clustering coefficient, graphlets)
Goal: Characterize the structure and position of a node in the network
Node Degree
Node Centrality
Clustering coefficient
Graphlets
Degree $k_v$ of node $v$ is the # of its neighboring nodes; it treats all neighbors equally (nodes with the same degree cannot be distinguished)
$c_v$ takes the node's importance in the graph into account (Eigenvector/Betweenness/Closeness centrality)
Clustering coefficient: the fraction of possible edges among $v$'s neighbors that actually exist. E.g., in case 2, excluding node $v$ there are $4 \choose 2$ = 6 possible edges among the 4 neighbors, and 3 exist, so the coefficient is $3/6 = 0.5$.
Graphlets generalize this idea: they describe the shapes of small subgraphs (for a given number of nodes) that a node participates in
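As a minimal sketch (toy adjacency dict with hypothetical node names), the degree and clustering-coefficient features above can be computed like this:

```python
# Node-level features on a toy undirected graph stored as an adjacency dict.
from itertools import combinations

adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree(v):
    """k_v: number of neighboring nodes."""
    return len(adj[v])

def clustering_coefficient(v):
    """Fraction of the k_v-choose-2 possible neighbor pairs that are connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return links / (k * (k - 1) / 2)

print(degree("A"))                  # 3
print(clustering_coefficient("A"))  # 1 of 3 neighbor pairs connected -> 1/3
```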
Types of Node Centrality
Eigenvector Centrality
Betweenness Centrality
Closeness Centrality
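Closeness centrality is the easiest of the three to sketch: a node is central if its total shortest-path distance to all other nodes is small. A minimal version on a hypothetical path graph A-B-C-D:

```python
# Closeness centrality = 1 / (sum of shortest-path distances to other nodes),
# with distances found by BFS on an unweighted toy graph.
from collections import deque

adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def bfs_distances(src):
    """Shortest-path distances from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def closeness(v):
    d = bfs_distances(v)
    return 1 / sum(d[u] for u in adj if u != v)

print(closeness("B"))  # distances 1, 1, 2 -> 1 / 4 = 0.25
print(closeness("A"))  # distances 1, 2, 3 -> 1 / 6
```

Interior nodes of the path get a higher score than the endpoints, matching the intuition that they are "closer" to everyone.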
Graphlets: rooted, connected, non-isomorphic induced subgraphs; a node's Graphlet Degree Vector (GDV) counts the graphlets it touches
2. Link-level Feature
Link-level Prediction
① Links missing at random: remove a random set of links, then aim to predict them
② Links over time: given $G[t_0, t_0']$, a graph defined by edges up to time $t_0'$, output a ranked list $L$ of edges (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$ (connections develop as time passes)
③ Evaluation: $n = |E_{new}|$ is the # of new edges that actually appear during the test period $[t_1, t_1']$; take the top $n$ elements of $L$ and count the correct edges
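The evaluation step ③ can be sketched in a few lines (the ranked list and edge sets below are hypothetical):

```python
# Evaluate a ranked link-prediction list L: take the top-n entries
# (n = number of edges that actually appeared) and count the hits.
def precision_at_n(ranked_edges, new_edges):
    n = len(new_edges)
    hits = sum(1 for e in ranked_edges[:n] if e in new_edges)
    return hits / n

L = [("A", "B"), ("C", "D"), ("A", "D"), ("B", "C")]  # hypothetical ranking
E_new = {("A", "B"), ("B", "C")}                      # edges that appeared in [t1, t1']

print(precision_at_n(L, E_new))  # top-2 of L contains 1 true edge -> 0.5
```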
2-1) Distance-based feature
Method: simply use the shortest-path distance between the two nodes
Con: cannot capture neighborhood overlap or connection strength (two node pairs at the same distance look identical even if one is connected through 1 path and the other through 2)
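A small sketch of that blind spot, using a hypothetical toy graph where A-B are linked through one intermediate node but C-D through two:

```python
# Shortest-path distance as a link feature, and its limitation:
# A-B (one length-2 path) and C-D (two length-2 paths) get the same value.
from collections import deque

adj = {
    "A": {"X"}, "B": {"X"}, "X": {"A", "B"},
    "C": {"Y", "Z"}, "D": {"Y", "Z"}, "Y": {"C", "D"}, "Z": {"C", "D"},
}

def shortest_path_length(s, t):
    dist = {s: 0}
    q = deque([s])
    while q:
        v = q.popleft()
        if v == t:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return float("inf")  # t unreachable from s

print(shortest_path_length("A", "B"))  # 2
print(shortest_path_length("C", "D"))  # 2 -- same feature, stronger connection
```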
2-2) Local neighborhood overlap
Method: captures the # of neighboring nodes shared between two nodes $v_1$ and $v_2$
Con: always 0 when the two nodes currently share no neighbors, even though they may acquire shared neighbors (and a link) in the future
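Two standard local-overlap scores, common neighbors and the Jaccard coefficient, sketched on a hypothetical toy graph (node E deliberately shares no neighbors with anyone, showing the zero-overlap limitation):

```python
# Local neighborhood overlap for a candidate link (v1, v2).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
    "E": set(),  # isolated node: overlap with every other node is 0
}

def common_neighbors(v1, v2):
    """|N(v1) & N(v2)|"""
    return len(adj[v1] & adj[v2])

def jaccard(v1, v2):
    """|N(v1) & N(v2)| / |N(v1) | N(v2)| (0 if both neighborhoods are empty)."""
    union = adj[v1] | adj[v2]
    return len(adj[v1] & adj[v2]) / len(union) if union else 0.0

print(common_neighbors("A", "B"))  # shared neighbor {C} -> 1
print(jaccard("A", "B"))           # |{C}| / |{A, B, C, D}| = 0.25
print(common_neighbors("A", "E"))  # 0: no signal, even if a link forms later
```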
2-3) Global neighborhood overlap
Goal: To resolve the limitation by considering the entire graph
Katz index: count the walks of all lengths between a given pair of nodes using powers of the graph adjacency matrix (each entry of $A^n$ is the # of length-$n$ walks between that node pair). With a discount factor $0 < \beta < 1$, the matrix of Katz indices has the closed form $S = \sum_{i=1}^{\infty} \beta^i A^i = (I - \beta A)^{-1} - I$.
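A sketch of both facts — walk counting via matrix powers and the closed form — on a hypothetical 4-node adjacency matrix ($\beta$ chosen small enough for the geometric series to converge):

```python
# Katz index via the closed form S = (I - beta*A)^(-1) - I.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
beta = 0.1  # must satisfy beta < 1 / (largest eigenvalue of A)

# (A^n)[u, v] counts length-n walks from u to v
print((A @ A)[0, 1])  # 1.0: one length-2 walk (0 -> 2 -> 1)

S = np.linalg.inv(np.eye(4) - beta * A) - np.eye(4)
print(S[0, 1])  # Katz index of pair (0, 1): discounted sum over all walk lengths
```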
3. Graph-level Feature
Goal: features that characterize the structure of an entire graph
Kernel methods design kernels instead of feature vectors
① Kernel $K(G, G') \in \mathbb{R}$ measures similarity b/w data
② Kernel matrix $\mathbf{K} = (K(G, G'))_{G,G'}$ (similarities b/w all datapoints) must always be positive semi-definite (i.e., has non-negative eigenvalues)
③ There exists a feature representation $\phi(\cdot)$ such that $K(G, G') = \phi(G)^T \phi(G')$
④ Once the kernel is defined, an off-the-shelf ML model, such as kernel SVM, can be used to make predictions
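Points ② and ③ fit together: if the kernel comes from explicit feature vectors, the kernel matrix is a Gram matrix and is automatically positive semi-definite. A sketch with made-up feature vectors:

```python
# With explicit features phi(G), K[i, j] = phi(G_i)^T phi(G_j) is a Gram
# matrix, hence positive semi-definite (all eigenvalues >= 0).
import numpy as np

Phi = np.array([[1.0, 2.0, 0.0],   # phi(G1), hypothetical feature vectors
                [0.0, 1.0, 1.0],   # phi(G2)
                [2.0, 0.0, 1.0]])  # phi(G3)

K = Phi @ Phi.T                    # pairwise similarities K(G, G')
eigvals = np.linalg.eigvalsh(K)    # eigenvalues of the symmetric matrix K

print(K[0, 1])                     # phi(G1) . phi(G2) = 2.0
print(bool((eigvals >= -1e-9).all()))  # True: PSD up to numerical noise
```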
Graph Kernel
Goal: Define graph feature vector 𝜙(G)
Key Idea: Bag-of-Words (BoW) for a graph. But BoW uses only word counts as features for a document (no ordering considered): naively, two graphs with 4 nodes each are represented identically even if their links differ.
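A sketch of the BoW analogy: counting only nodes cannot tell a 4-node path from a 4-node star, but a "bag of node degrees" histogram can (both graphs below are hypothetical examples):

```python
# phi(G) as a bag-of-node-degrees histogram: entry d counts nodes of degree d.
from collections import Counter

def degree_histogram(edges, num_nodes):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [sum(1 for v in range(num_nodes) if deg[v] == d)
            for d in range(num_nodes)]

path = [(0, 1), (1, 2), (2, 3)]  # path graph on 4 nodes
star = [(0, 1), (0, 2), (0, 3)]  # star graph on 4 nodes: same node count!

print(degree_histogram(path, 4))  # [0, 2, 2, 0]: two deg-1, two deg-2 nodes
print(degree_histogram(star, 4))  # [0, 3, 0, 1]: three deg-1, one deg-3 node
```

The two graphs collide under a pure node-count feature but separate under the degree histogram, which is exactly the 1-hop case that the WL kernel below generalizes.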
3-1) Graphlet Kernel
Key Idea: Count the number of different graphlets in a graph.
Note: this differs conceptually from the node-level graphlets above; here, graphlets need not be connected and are not rooted
Limitation: Counting graphlets is very expensive!
Counting size-$k$ graphlets in a graph with $n$ nodes by enumeration takes $O(n^k)$ time
This is unavoidable in the worst-case since subgraph isomorphism test (judging whether a graph is a subgraph of another graph) is NP-hard
If a graph’s node degree is bounded by $𝑑$, an $𝑂(𝑛𝑑^{𝑘−1})$ algorithm exists to count all the graphlets of size $𝑘$.
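The brute-force $O(n^k)$ enumeration is easy to sketch for $k = 3$: classify the induced subgraph on every node triple (toy edge set below is hypothetical; note these graphlets need not be connected):

```python
# Brute-force size-3 graphlet counting: enumerate all n-choose-3 node
# triples and classify each induced subgraph by its edge count.
from itertools import combinations

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}  # toy undirected graph
nodes = {v for e in edges for v in e}

def connected(u, v):
    return (u, v) in edges or (v, u) in edges

counts = {"triangle": 0, "path": 0, "single_edge": 0, "no_edge": 0}
for trio in combinations(sorted(nodes), 3):
    m = sum(connected(u, v) for u, v in combinations(trio, 2))
    # m edges among the triple determine the graphlet type (3, 2, 1, or 0)
    counts[["no_edge", "single_edge", "path", "triangle"][m]] += 1

print(counts)  # {'triangle': 1, 'path': 2, 'single_edge': 1, 'no_edge': 0}
```

The histogram over graphlet types is the graphlet-kernel feature vector; the nested enumeration over triples is what makes the general size-$k$ version so expensive.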
3-2) Weisfeiler-Lehman (WL) Kernel
Goal: Design an efficient graph feature descriptor $𝜙(G)$
Key Idea: Use neighborhood structure to iteratively enrich the node vocabulary (a multi-hop generalization of the bag of node degrees, which captures only 1-hop neighborhood info) ⇒ Color Refinement Algorithm
Color Refinement (Bag-of-Colors) Given: a graph $G$ with a set of nodes $V$
① Assign an initial color $c^{(0)}(v)$ to each node $v$
② Iteratively refine node colors by $c^{(k+1)}(v) = \text{HASH}\left(c^{(k)}(v), \{c^{(k)}(u)\}_{u \in N(v)}\right)$, where HASH maps different inputs to different colors
③ After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood
The WL kernel is computationally efficient! (time complexity is linear in the # of edges)
⇒ When computing a kernel value, only the colors that appear in the two graphs need to be tracked (the # of colors is at most the total # of nodes)
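The refinement loop ①-③ can be sketched for a single toy graph, with the injective HASH simulated by interning each (own color, sorted neighbor colors) signature as a fresh integer:

```python
# WL color refinement (Bag-of-Colors) on one toy graph: a triangle 0-1-2
# with a pendant node 3 attached to node 2.
from collections import Counter

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def wl_colors(adj, K):
    color = {v: 0 for v in adj}  # (1) uniform initial color
    for _ in range(K):
        table = {}               # injective "HASH" for this round
        new = {}
        for v in adj:
            # (2) signature = own color + multiset of neighbor colors
            sig = (color[v], tuple(sorted(color[u] for u in adj[v])))
            new[v] = table.setdefault(sig, len(table))
        color = new
    return color  # (3) c^(K)(v) summarizes the K-hop neighborhood

# After one step, colors coincide with node degrees (up to renaming);
# phi(G) is the color histogram.
print(Counter(wl_colors(adj, 1).values()))
```

Comparing the color histograms of two graphs after $K$ steps (an inner product of the count vectors) gives the WL kernel value itself.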