Nonnegative Matrix Factorization (NMF) for Clustering#
Author: Arash Ghazi
Contact: arash.ghazi.2012@gmail.com
Nonnegative Matrix Factorization (NMF) is a powerful technique in data analysis and machine learning, particularly for clustering tasks. It decomposes a nonnegative matrix into the product of two lower-dimensional nonnegative matrices. This decomposition helps uncover hidden patterns and structures within the data, making NMF an effective tool for clustering.
Understanding NMF#
NMF aims to approximate a given nonnegative matrix ( V ) (of size ( m \times n )) as the product of two nonnegative matrices: ( W ) (of size ( m \times r )) and ( H ) (of size ( r \times n )). Mathematically, this can be represented as:
[ V \approx W \times H ]
Here, ( r ) is the rank of the factorization, which is typically much smaller than ( m ) and ( n ). The matrices ( W ) and ( H ) are learned such that their product closely approximates the original matrix ( V ).
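Concretely, ( W ) and ( H ) are usually found by minimizing a reconstruction error under nonnegativity constraints. A common formulation uses the squared Frobenius norm, solved for example with the classic Lee–Seung multiplicative updates (other objectives, such as the Kullback–Leibler divergence, are also used):

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert V - WH \rVert_F^2
\qquad
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H},
\qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

where ( \odot ) and the fractions denote element-wise multiplication and division. These updates keep ( W ) and ( H ) nonnegative as long as they start nonnegative.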
Applications of NMF in Clustering#
Document Clustering: NMF is widely used in text mining for document clustering. By decomposing the term-document matrix, NMF can identify latent topics and group similar documents together based on these topics.
Image Clustering: In image processing, NMF can be applied to cluster images by decomposing the pixel intensity matrix. This helps in identifying common features and patterns across different images.
Gene Expression Data: In bioinformatics, NMF is used to analyze gene expression data. By clustering genes with similar expression patterns, researchers can identify potential gene functions and regulatory mechanisms.
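As an illustration of the document-clustering use case, a minimal pipeline can be sketched with scikit-learn on a toy corpus invented here for illustration. Note that scikit-learn places documents in rows and terms in columns, the transpose of the term-document orientation used in this article:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy corpus with two rough themes (space and baking), invented for illustration
docs = [
    "the rocket launched into orbit around the moon",
    "bake the bread until the crust is golden",
    "astronauts aboard the station study the moon",
    "knead the dough and let the bread rise",
]

# Rows are documents, columns are TF-IDF term weights
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Factorize: W maps documents to topics, H maps topics to terms
model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)
H = model.components_

# Cluster assignment = dominant topic per document
labels = np.argmax(W, axis=1)
print(labels)  # documents 0 and 2 should share a cluster, as should 1 and 3
```

Because the two themes share no content words, the factorization separates them cleanly; on real corpora the topics overlap and the assignments are softer.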
Advantages of NMF#
Interpretability: One of the key advantages of NMF is its interpretability. The nonnegativity constraint ensures that the resulting matrices ( W ) and ( H ) have a clear and meaningful interpretation, making it easier to understand the underlying patterns.
Sparsity: NMF often leads to sparse representations, which can be beneficial for clustering tasks. Sparse representations highlight the most important features, reducing noise and improving clustering performance.
Scalability: NMF can be efficiently applied to large datasets, making it suitable for real-world applications with high-dimensional data.
Challenges and Considerations#
Initialization: The performance of NMF can be sensitive to the initialization of the matrices ( W ) and ( H ), since the objective is non-convex and different starting points can converge to different local minima. Proper initialization, such as multiple random restarts, SVD-based schemes like NNDSVD, or seeding with the results of other clustering methods, can help improve the results.
Rank Selection: Choosing the appropriate rank ( r ) is crucial for the success of NMF. Cross-validation and other model selection techniques can be used to determine the optimal rank.
Computational Complexity: While NMF is scalable, it can still be computationally intensive for very large datasets. Efficient algorithms and parallel computing techniques can help mitigate this issue.
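To illustrate the initialization point, the sketch below compares a random start against the deterministic SVD-based `'nndsvda'` scheme on synthetic data; the resulting reconstruction errors will generally differ, which is why comparing or restarting initializations is worthwhile:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((40, 60))  # synthetic nonnegative data

# Compare two initialization schemes; 'nndsvda' is deterministic (SVD-based)
errors = {}
for init in ('random', 'nndsvda'):
    model = NMF(n_components=5, init=init, max_iter=800, random_state=0)
    model.fit(V)
    errors[init] = model.reconstruction_err_
    print(f"init={init!r}: reconstruction error = {errors[init]:.4f}")
```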
Example: Document Clustering with NMF#
Imagine we have a collection of 10 documents, and we want to cluster them based on their content. Each document can be represented as a vector of word frequencies, resulting in a term-document matrix ( V ). For simplicity, let’s assume our term-document matrix ( V ) is as follows:
| Terms/Docs | Doc1 | Doc2 | Doc3 | Doc4 | Doc5 | Doc6 | Doc7 | Doc8 | Doc9 | Doc10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Term1 | 1 | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| Term2 | 0 | 1 | 0 | 1 | 2 | 1 | 0 | 1 | 0 | 2 |
| Term3 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| Term4 | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 2 | 0 | 1 |
| Term5 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
Our goal is to factorize this matrix ( V ) into two nonnegative matrices ( W ) and ( H ) such that ( V \approx W \times H ). Let’s assume we choose the rank ( r = 2 ), meaning we want to cluster the documents into 2 clusters.
After applying NMF, we obtain the following matrices ( W ) and ( H ):
Matrix ( W ) (Term-Cluster Matrix):
| Terms/Clusters | Cluster1 | Cluster2 |
|---|---|---|
| Term1 | 0.8 | 0.2 |
| Term2 | 0.1 | 0.9 |
| Term3 | 0.6 | 0.4 |
| Term4 | 0.2 | 0.8 |
| Term5 | 0.7 | 0.3 |
Matrix ( H ) (Cluster-Document Matrix):
| Clusters/Docs | Doc1 | Doc2 | Doc3 | Doc4 | Doc5 | Doc6 | Doc7 | Doc8 | Doc9 | Doc10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cluster1 | 0.9 | 0.1 | 0.8 | 0.3 | 0.2 | 0.7 | 0.8 | 0.1 | 0.9 | 0.2 |
| Cluster2 | 0.1 | 0.9 | 0.2 | 0.7 | 0.8 | 0.3 | 0.2 | 0.9 | 0.1 | 0.8 |
From the matrix ( W ), we can see the association of each term with the clusters. For example, Term1 is strongly associated with Cluster1, while Term2 is strongly associated with Cluster2.
From the matrix ( H ), we can see the association of each document with the clusters. For example, Doc1, Doc3, Doc7, and Doc9 are strongly associated with Cluster1, while Doc2, Doc4, Doc5, Doc8, and Doc10 are strongly associated with Cluster2.
By examining these matrices, we can cluster the documents based on their content. Documents with similar content will be grouped together in the same cluster.
This example demonstrates how NMF can be used for clustering tasks, providing interpretable and meaningful results.
```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Sample term-document matrix (V)
V = np.array([
    [1, 0, 2, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 2, 1, 0, 1, 0, 2],
    [1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
    [0, 2, 0, 1, 1, 0, 0, 2, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
])

# Number of clusters
n_clusters = 2

# Apply NMF
nmf_model = NMF(n_components=n_clusters, init='random', random_state=42)
W = nmf_model.fit_transform(V)
H = nmf_model.components_

# Normalize the matrices for better interpretability
W = normalize(W, norm='l1', axis=1)
H = normalize(H, norm='l1', axis=0)

# Display the results
print("Term-Cluster Matrix (W):")
print(W)
print("\nCluster-Document Matrix (H):")
print(H)

# Assign documents to clusters based on the highest value in H
document_clusters = np.argmax(H, axis=0)
print("\nDocument Clusters:")
print(document_clusters)
```

Output:

```
Term-Cluster Matrix (W):
[[1.         0.        ]
 [0.00693408 0.99306592]
 [0.80131066 0.19868934]
 [0.         1.        ]
 [1.         0.        ]]

Cluster-Document Matrix (H):
[[0.99522321 0.10597333 1.         0.23537329 0.         0.50104913
  0.99522321 0.10597333 0.99522321 0.        ]
 [0.00477679 0.89402667 0.         0.76462671 1.         0.49895087
  0.00477679 0.89402667 0.00477679 1.        ]]

Document Clusters:
[0 1 0 1 1 0 0 1 0 1]
```
Explanation:#
Term-Document Matrix (V): This matrix represents the frequency of terms in each document.
Number of Clusters (n_clusters): We set the number of clusters to 2.
Apply NMF: We use the `NMF` class from `sklearn` to perform the factorization; the `fit_transform` method decomposes the matrix ( V ) into the two matrices ( W ) and ( H ).
Normalize the Matrices: We normalize ( W ) and ( H ) (L1 norm over the rows of ( W ) and the columns of ( H )) for better interpretability.
Display the Results: We print the term-cluster matrix ( W ) and the cluster-document matrix ( H ).
Assign Documents to Clusters: We assign each document to the cluster with the highest value in its column of ( H ).
This code provides a basic example of how to use NMF for document clustering. You can adjust the number of clusters and the term-document matrix to fit your specific use case.
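One practical way to adjust the number of clusters is to fit several ranks and watch the reconstruction error, which scikit-learn exposes as the `reconstruction_err_` attribute. On synthetic data whose true nonnegative rank is known, the error drops sharply up to that rank and flattens beyond it. A sketch:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
# Synthetic data with true nonnegative rank 4: product of two random factors
V = rng.random((30, 4)) @ rng.random((4, 50))

errs = {}
for r in (2, 3, 4, 6):
    model = NMF(n_components=r, init='nndsvd', max_iter=2000, random_state=0)
    model.fit(V)
    errs[r] = model.reconstruction_err_
    print(f"r={r}: reconstruction error = {errs[r]:.4f}")
# Expect a sharp drop until r reaches 4, then little further improvement
```

On real data the "elbow" is rarely this clean, and cross-validation or stability-based criteria are often used instead.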