Nonnegative Matrix Factorization (NMF) for Clustering#

Author: Arash Ghazi

Contact: arash.ghazi.2012@gmail.com

Nonnegative Matrix Factorization (NMF) is a powerful technique used in data analysis and machine learning, particularly for clustering tasks. It is a matrix factorization method that decomposes a nonnegative matrix into two lower-dimensional nonnegative matrices. This decomposition helps in uncovering hidden patterns and structures within the data, making it an effective tool for clustering.

Understanding NMF#

NMF aims to approximate a given nonnegative matrix $V$ (of size $m \times n$) as the product of two nonnegative matrices: $W$ (of size $m \times r$) and $H$ (of size $r \times n$). Mathematically, this can be represented as:

$$V \approx W \times H$$

Here, $r$ is the rank of the factorization, which is typically much smaller than $m$ and $n$. The matrices $W$ and $H$ are learned such that their product closely approximates the original matrix $V$.
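To make the dimensions concrete, here is a minimal NumPy sketch (with arbitrary random data, not from the text) that builds an exactly factorizable $V$ from a chosen $W$ and $H$ and checks the reconstruction:

```python
import numpy as np

# Hypothetical sizes for illustration: m = 4, n = 5, rank r = 2.
rng = np.random.default_rng(0)
W = rng.random((4, 2))   # m x r, nonnegative
H = rng.random((2, 5))   # r x n, nonnegative
V = W @ H                # a nonnegative matrix that factorizes exactly

# Factorization quality is usually measured by the Frobenius norm ||V - WH||.
error = np.linalg.norm(V - W @ H)
print(V.shape, error)    # (4, 5) and zero error for this constructed case
```

In practice $V$ is given and $W$, $H$ are learned, so the error is small but nonzero; the point here is only the shapes and the nonnegativity of all three matrices.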

Applications of NMF in Clustering#

  1. Document Clustering: NMF is widely used in text mining for document clustering. By decomposing the term-document matrix, NMF can identify latent topics and group similar documents together based on these topics.

  2. Image Clustering: In image processing, NMF can be applied to cluster images by decomposing the pixel intensity matrix. This helps in identifying common features and patterns across different images.

  3. Gene Expression Data: In bioinformatics, NMF is used to analyze gene expression data. By clustering genes with similar expression patterns, researchers can identify potential gene functions and regulatory mechanisms.
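As a sketch of the document-clustering workflow above, the snippet below runs scikit-learn's NMF on a tiny made-up corpus; the documents, the tf-idf vectorizer choice, and the cluster count are illustrative assumptions, not part of the text:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Hypothetical toy corpus: two documents about factorization, two about genes.
docs = [
    "nmf factorizes a nonnegative matrix",
    "matrix factorization finds latent topics",
    "gene expression data in bioinformatics",
    "clustering genes by expression patterns",
]

X = TfidfVectorizer().fit_transform(docs)        # document-term matrix (docs as rows)
model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)                       # document-topic weights
labels = np.argmax(W, axis=1)                    # cluster = strongest topic per document
print(labels)
```

Note that with documents as rows, it is $W$ (not $H$) that carries the document-cluster weights; the roles of the two factors depend on the orientation of the input matrix.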

Advantages of NMF#

  • Interpretability: One of the key advantages of NMF is its interpretability. The nonnegativity constraint ensures that the resulting matrices $W$ and $H$ have a clear and meaningful interpretation, making it easier to understand the underlying patterns.

  • Sparsity: NMF often leads to sparse representations, which can be beneficial for clustering tasks. Sparse representations highlight the most important features, reducing noise and improving clustering performance.

  • Scalability: NMF can be efficiently applied to large datasets, making it suitable for real-world applications with high-dimensional data.

Challenges and Considerations#

  • Initialization: The performance of NMF can be sensitive to the initialization of the matrices $W$ and $H$. Proper initialization techniques, such as random initialization or using the results of other clustering methods, can help improve the results.

  • Rank Selection: Choosing the appropriate rank $r$ is crucial for the success of NMF. Cross-validation and other model selection techniques can be used to determine the optimal rank.

  • Computational Complexity: While NMF is scalable, it can still be computationally intensive for very large datasets. Efficient algorithms and parallel computing techniques can help mitigate this issue.
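One simple, commonly used heuristic for rank selection is to sweep candidate ranks and look for an elbow in the reconstruction error. The sketch below (with arbitrary random data and an arbitrary rank range) illustrates the idea using scikit-learn's `reconstruction_err_` attribute:

```python
import numpy as np
from sklearn.decomposition import NMF

# Illustrative data: a random nonnegative 20 x 12 matrix, not from the text.
rng = np.random.default_rng(42)
V = rng.random((20, 12))

errors = {}
for r in (1, 2, 3, 4, 5):
    model = NMF(n_components=r, init="nndsvd", random_state=42, max_iter=500)
    model.fit(V)
    errors[r] = model.reconstruction_err_   # Frobenius error ||V - WH||

# Error decreases as r grows; a flattening ("elbow") suggests a reasonable rank.
print(errors)
```

This only measures fit, not cluster quality; on real data one would also check stability across runs or a held-out criterion before settling on $r$.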

Example: Document Clustering with NMF#

Imagine we have a collection of 10 documents, and we want to cluster them based on their content. Each document can be represented as a vector of word frequencies, resulting in a term-document matrix $V$. For simplicity, let’s assume our term-document matrix $V$ is as follows:

| Terms/Docs | Doc1 | Doc2 | Doc3 | Doc4 | Doc5 | Doc6 | Doc7 | Doc8 | Doc9 | Doc10 |
|------------|------|------|------|------|------|------|------|------|------|-------|
| Term1      | 1    | 0    | 2    | 1    | 0    | 0    | 1    | 0    | 1    | 0     |
| Term2      | 0    | 1    | 0    | 1    | 2    | 1    | 0    | 1    | 0    | 2     |
| Term3      | 1    | 1    | 1    | 0    | 0    | 1    | 1    | 1    | 1    | 0     |
| Term4      | 0    | 2    | 0    | 1    | 1    | 0    | 0    | 2    | 0    | 1     |
| Term5      | 1    | 0    | 1    | 0    | 0    | 1    | 1    | 0    | 1    | 0     |

Our goal is to factorize this matrix $V$ into two nonnegative matrices $W$ and $H$ such that $V \approx W \times H$. Let’s assume we choose the rank $r = 2$, meaning we want to cluster the documents into 2 clusters.

Suppose that applying NMF yields the following (illustrative) matrices $W$ and $H$:

Matrix $W$ (Term-Cluster Matrix):

| Terms/Clusters | Cluster1 | Cluster2 |
|----------------|----------|----------|
| Term1          | 0.8      | 0.2      |
| Term2          | 0.1      | 0.9      |
| Term3          | 0.6      | 0.4      |
| Term4          | 0.2      | 0.8      |
| Term5          | 0.7      | 0.3      |

Matrix $H$ (Cluster-Document Matrix):

| Clusters/Docs | Doc1 | Doc2 | Doc3 | Doc4 | Doc5 | Doc6 | Doc7 | Doc8 | Doc9 | Doc10 |
|---------------|------|------|------|------|------|------|------|------|------|-------|
| Cluster1      | 0.9  | 0.1  | 0.8  | 0.3  | 0.2  | 0.7  | 0.8  | 0.1  | 0.9  | 0.2   |
| Cluster2      | 0.1  | 0.9  | 0.2  | 0.7  | 0.8  | 0.3  | 0.2  | 0.9  | 0.1  | 0.8   |

From the matrix $W$, we can see the association of each term with the clusters. For example, Term1 is strongly associated with Cluster1, while Term2 is strongly associated with Cluster2.

From the matrix $H$, we can see the association of each document with the clusters. For example, Doc1, Doc3, Doc6, Doc7, and Doc9 are most strongly associated with Cluster1, while Doc2, Doc4, Doc5, Doc8, and Doc10 are most strongly associated with Cluster2.

By examining these matrices, we can cluster the documents based on their content. Documents with similar content will be grouped together in the same cluster.

This example demonstrates how NMF can be used for clustering tasks, providing interpretable and meaningful results.

The following Python example demonstrates how to use NMF for document clustering with the sklearn library:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Sample term-document matrix (V)
V = np.array([
    [1, 0, 2, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 2, 1, 0, 1, 0, 2],
    [1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
    [0, 2, 0, 1, 1, 0, 0, 2, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
])

# Number of clusters
n_clusters = 2

# Apply NMF
nmf_model = NMF(n_components=n_clusters, init='random', random_state=42)
W = nmf_model.fit_transform(V)
H = nmf_model.components_

# Normalize the matrices for better interpretability
W = normalize(W, norm='l1', axis=1)
H = normalize(H, norm='l1', axis=0)

# Display the results
print("Term-Cluster Matrix (W):")
print(W)
print("\nCluster-Document Matrix (H):")
print(H)

# Assign documents to clusters based on the highest value in H
document_clusters = np.argmax(H, axis=0)
print("\nDocument Clusters:")
print(document_clusters)
```

Output:

```text
Term-Cluster Matrix (W):
[[1.         0.        ]
 [0.00693408 0.99306592]
 [0.80131066 0.19868934]
 [0.         1.        ]
 [1.         0.        ]]

Cluster-Document Matrix (H):
[[0.99522321 0.10597333 1.         0.23537329 0.         0.50104913
  0.99522321 0.10597333 0.99522321 0.        ]
 [0.00477679 0.89402667 0.         0.76462671 1.         0.49895087
  0.00477679 0.89402667 0.00477679 1.        ]]

Document Clusters:
[0 1 0 1 1 0 0 1 0 1]
```


Explanation:#

  1. Term-Document Matrix (V): This matrix represents the frequency of terms in each document.

  2. Number of Clusters (n_clusters): We set the number of clusters to 2.

  3. Apply NMF: We use the NMF class from sklearn to perform the factorization. The fit_transform method decomposes the matrix $V$ into two matrices $W$ and $H$.

  4. Normalize the Matrices: We normalize the matrices $W$ and $H$ for better interpretability.

  5. Display the Results: We print the term-cluster matrix $W$ and the cluster-document matrix $H$.

  6. Assign Documents to Clusters: We assign each document to a cluster based on the highest value in the cluster-document matrix $H$.

This code provides a basic example of how to use NMF for document clustering. You can adjust the number of clusters and the term-document matrix to fit your specific use case.
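The same argmax idea also works for terms: each row of the normalized term-cluster matrix $W$ printed above gives a term's affinity for the two clusters. As a small follow-up sketch:

```python
import numpy as np

# Normalized term-cluster matrix W from the output above (rows = terms).
W = np.array([
    [1.0,        0.0       ],
    [0.00693408, 0.99306592],
    [0.80131066, 0.19868934],
    [0.0,        1.0       ],
    [1.0,        0.0       ],
])

# Assign each term to the cluster where its weight is largest.
term_clusters = np.argmax(W, axis=1)
print(term_clusters)   # [0 1 0 1 0]
```

So Term1, Term3, and Term5 characterize Cluster1, while Term2 and Term4 characterize Cluster2, matching the document assignments above.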
