Sep 07

My blogpost on K-means clustering has the highest number of views, so people are probably interested in it. Sadly enough I lost the source code of the K-means action a while ago. Last week I needed an external harddisk to make a back-up of some files. There was already some content on the disk. I found quite some pieces of code including the K-means code. Although it is quite simple code operating on (if I remember correctly 8-bit) greyscale images, it might give some insights in how to do this.

The whole code file is presented below. For more information you can view my earlier blogpost on K-means clustering.

package actions;

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.util.ArrayList;

/**
 * This KMeansAction performs a K-means clustering action on a BufferedImage
 * @author Patrick van Kouteren
 *
 */

public class KMeansAction {

		BufferedImage image_temp;
		boolean not_terminated;
		int loops, changedPixels;
		int[] histogram;
		ArrayList classes;
		int [] lowerbounds;
		public final static int MEAN_BY_MOD = 1;
		public final static int MEAN_BY_SPACE = 2;
		public final static int MEAN_AT_RANDOM = 3;

		/**
		 * Controls the actual work:
		 * - Initialization
		 * - Loop until termination condition is met
		 *  + for each pixel: assign pixel to a class such that the distance from the pixel to the mean of that class is minimized
		 *  + for each class: recalculate the means of the class based on pixels belonging to that class
		 * - End loop
		 * @param image
		 * @param bins (k)
		 * @param histogram
		 */
		public KMeansAction(BufferedImage image, int bins, int[]histogram, int initway) {
			this.histogram = histogram;
			lowerbounds = new int[bins];
			initialize(image, bins, initway);
			calculateBounds();
			while (not_terminated) {
				recalculateMeans();
				loops++;
				checkTermination();
				}
			processImage(image, bins);
		}

		/**
		 * Set the new color values for the image
		 * @param image
		 */
		private void processImage(BufferedImage image, int bins) {
			int delta = 255 / (bins-1);
			for (int h = 0; h < image.getHeight(); h++){
				for (int w = 0; w < image.getWidth(); w++){
					Color rgb = new Color(image.getRGB(w, h));
					int grey = rgb.getRed();
					for (int i = 0; i classes.get(i).lowerbound && grey < classes.get(i).upperbound) {
							int g = i*delta;
							image_temp.setRGB(w,h,(new Color(g, g, g)).getRGB());
						}
					}
				}
			}
		}

		/**
		 * Returns the image created by the processImage method
		 * @return the result image
		 */
		public BufferedImage getResultImage() {
			return image_temp;
		}

		/**
		 * Just for fun: returns the number of loops which were needed for getting a stable result
		 * @return number of loops for stable result
		 */
		public int getLoops(){
			return loops;
		}

		/**
		 * Initializes the algorithm. Creates k ClusterClasses and puts them into a LinkedList
		 * @param image
		 * @param bins
		 */
		@SuppressWarnings("unchecked")
		private void initialize(BufferedImage image, int bins, int initway){
			image_temp = image;
			loops = 0;
			changedPixels = 0;
			not_terminated = true;
			classes = new ArrayList();
			for (int i = 0; i < bins; i++) {
				ClusterClass cc = new ClusterClass(createMean(initway, bins, i, image));
				classes.add(cc);
			}

		}

		/**
		 * Controls the calculations of the upper- and lowerbounds of ClusterClasses and sets them
		 *
		 */
		private void calculateBounds() {
			for (int i = 0; i < classes.size(); i++){
				int lb = calculateLowerBound(classes.get(i));
				lowerbounds[i] = lb;
				classes.get(i).setBounds(lb,calculateUpperBound(classes.get(i)) );
				}
		}

		/**
		 * Does the actual calculation of the lowerbound
		 * @param ClusterClass
		 * @return Lowerbound
		 */
		private int calculateLowerBound(ClusterClass cc) {
			int cMean = cc.getMean();
			int currentBound = 0;
			for (int i = 0; i< classes.size(); i++) { 					if (cMean > classes.get(i).getMean()) {
						currentBound = Math.max((cMean + classes.get(i).getMean())/2, currentBound);
					}
					else {
					}
				}
			return currentBound;
			}

		/**
		 * Does the actual calculation of the upperbound
		 * @param ClusterClass
		 * @return Upperbound
		 */
		private int calculateUpperBound(ClusterClass cc) {
				int cMean = cc.getMean();
				int currentBound = 255;
				for (int i = 0; i< classes.size(); i++) {
						if (cMean < classes.get(i).getMean()) {
							currentBound = Math.min((cMean + classes.get(i).getMean())/2, currentBound);
						}
						else {}
					}
				return currentBound;
				}

		/**
		 * Takes care of the recalculation of the means of the ClusterClasses
		 *
		 */
		private void recalculateMeans() {
			for (int i = 0; i= 50) {
				not_terminated = false;
			}
			if (changedPixels <= 300) {
				not_terminated = false;
			}
		}

		private void calculateChangedPixels() {
			int changed = 0;
			for (int i = 0; i< lowerbounds[i]) {
					for (int j = c; j lowerbounds[i]) {
					for (int j = lowerbounds[i]; j< image.getHeight(); h++){
					for (int w = 0; w < image.getWidth(); w++){
						pixelindex+=1;
						if (pixelindex % bins == index) {
							Color rgb = new Color(image.getRGB(w, h));
							sum+= rgb.getRed();
							value+=1;
						}
					}}
				return sum/value;

			case MEAN_BY_SPACE:
				return (int)(255 / (bins-1) * index);
			case MEAN_AT_RANDOM:
				Double dmean = Math.random() * 255;
				return (int) Math.floor(dmean);
			default:
				return 0;
			}
		}
}</pre>
In addition to this, the custom class ClusterClass is defined as:
<pre lang="java">package actions;

/**
 * The ClusterClass is just a class holding the important cluster properties.
 * @author Patrick van Kouteren
 *
 */

public class ClusterClass {
	int mean, upperbound, lowerbound;

	public ClusterClass(int m) {
		mean = m;
	}

	public void setBounds(int lb, int ub) {
		lowerbound = lb;
		upperbound = ub;
	}

	public void setMean(int i) {
		mean = i;
	}

	public int getMean() {
		return mean;
	}

	public int getLowerBound() {
		return lowerbound;
	}

	public int getUpperBound() {
		return upperbound;
	}

	public void calculateMean(int [] histogram) {
		int tempMean = 0;
		int counter = 0;
		for (int i = lowerbound; i&lt;= upperbound; i++) {
			counter += histogram[i];
			tempMean += histogram[i] * i;
		}
		mean = tempMean / counter;
	}

}

The source code might not be completely visible. It can be viewed in a blank screen here. As mentioned in the replies to this post, I forgot to add the ClusterClass. It can be viewed in a blank screen here.

69 Responses to “K-means clustering in Java code found!”

  1. Антоха says:

    Данный пост очень информативен, спасибо Вам!

    Edit: please in English as other people might benefit from possible questions and / or solutions.
    Translated: This post is very informative, thank you!

  2. pinaka says:

    in your programm you have used ClusterClass.what is it?
    when i compile the code it’s giving error

    cannot find symbol:ClusterClass

    what should i do?

  3. Patrick says:

    Hi pinaka, thanks for your interest. I’ll look into that as soon as possible. Unfortunately I’ve left my notebook with the code at school. I’ll reply as soon as I’ve solved the issue.

    Cheers,
    Patrick

  4. Patrick says:

    I see what the issue is: it’s a custom class. I’ll add it tomorrow :)

  5. Ivan says:

    Hi Patrick,

    really appreciate your k-means clustering in Java because i am planning to implement it into j2me.

    hope you could add the custom class (ClusterClass) soon so i could try it out on the desktop first.

    Thanks.

  6. Patrick says:

    I’ve updated the post with a link to the file as well as the contents in the post itself. Note that these classes are actually part of an application which I was writing, so they might have been written in the context of this application.

    I would like to continue this application and make it public here some day, but I don’t have the time to do that just now.

    If there are any questions, sugesstions, or whatsoever I’d be glad to hear them!

  7. Prashanth says:

    Is it possible for you to give me an equivalent C# code

  8. Patrick says:

    Prashanth,

    If I would know how to do it I would. Unfortunately I haven’t had time to learn any version of C yet. However, I guess that the code itself gives a reasonable impression of how to do it in C#.

    Cheers,

    Patrick

  9. vinod says:

    hello patrick,

    Do u have CURE & CHAMELON clustering algorithm implemented in java

  10. Patrick says:

    Hi Vinod,

    I’m afraid not. I must admit that I haven’t looked into the code recently. If I’d only have some more time I would. Do you happen to have a clear data source about Cure & Chamelon clustering from which an implementation can be derived?

  11. dekompi says:

    Very nice progie.
    But where the main method sir?

    Regards

  12. Patrick says:

    dekompi: As this action is part of an application, the main method is not present in this class. It’s an action class which receives its info from the main frame. The main frame is basically a frame which shows images and adds file options.

  13. Raghotham S says:

    Hi Sir,
    I hav a project in college – Image segmentation using kmeans

    i hav the code which u hav posted
    but am not knowin how to pass an image as input
    I need to design an applet which takes image as input and outputs the result.
    pls help its v v urgent.
    Thanks in advance

  14. Raghotham S says:

    can u pl giv me a code which can segment image without that histogram input?
    Or else can u explain how to generate histogram for an image?

  15. Patrick says:

    Raghotham S:

    The segmentation is done on the histogram. The histogram of an image is basically a ‘graph’ of all pixel values of the image. In greyscale images the red value is equal to the green and the blue value. When having an image with 256 colors, your histogram contains of 256 bins. You simply count the pixels having a red (or green or blue) value of 0, 1, 2, …, 255. I will try and look up the code for the application tonight.

  16. Raghotham S says:

    Ok thanks a lot!!

  17. Patrick says:

    Raghotham S:

    Full source code can now be found here or at the download page

    Good luck!

  18. [...] there was some interest in source code in two earlier posts (post 1, post 2) I've posted the source code [...]

  19. Raghotham S says:

    Thanks a ton… i owe u :)

  20. Raghotham S says:

    one last doubt patrick,
    the ispe.java doesn’t work for most of the images whereas the ispe.jar file works well with almost all images. Why is it so?

  21. Raghotham S says:

    figured it out :D
    thanks again for the code :)

  22. Patrick says:

    Did it have to do with the classpath and the ImageIO jar file in the lib directory?
    That’s what I expected..

  23. Raghotham S says:

    it needed a slight modification for the buffered image handling part. :)

  24. Chit Su Hlaing says:

    Thank you for your open source coeds. Thank you so much! I wish you that you can have the more great logic and think for your great life.

  25. shipra says:

    hi! cud u pls tell the logic/concept dat u have used in k mean clustering…(i cud’nt get it 100%..).i tried your code ..it gives very good results..generally its said that k mean gives good result ..depending on the initial seed pixels..but howcome your code gives good result (same) for all three options..K,distributed over color space..,randomly
    …cud u pls elaborate distributed over color space and k..(the theorey part)..
    neways very good prog..
    thanks..

  26. Patrick says:

    Shipra,

    I don’t think the result is the same for all three types of initialization. That would be a coincidence. What K-means actually tries to do is to find clusters of likely ‘equal’ pixel values. It all depends on the original image histogram. Especially when the mean is distributed over the color space.
    I can’t imagine my code always gives good results for randomly chosen means. In the worst case all means chosen are close to each other and at one side of the histogram.

    Come to think of it: the means could also be chosen in a weighted way. This would perhaps give some good results in all cases.

  27. Gayathri says:

    I saw the codings provided by you for the project , “K-Means Clustering by java source codes”. Its very nice. I need one help from you. Now i am doing one project in the area of data mining.In my project K-means clustering algorithm is one of the module. So i need some codings for k-means algorithm , in that the clustering technique is used for clustering large amount of data sets . How can i do it in C language.If you have some idea about it means , please help me,… what i need is “implementation of k-means algorithm for clustering large amount of data points using C”.

    “Awaiting for your reply”

    Regards
    Gayathri

  28. shipra says:

    thank u so much!!..for rplying..

  29. Ramesh says:

    can you help me in find web user clusters using k-means on web log file

  30. Patrick says:

    Hi Ramesh,

    Wat is it exactly that you want to cluster? And what are the criteria which determine which piece of data will get into a particular cluster?

    Cheers,

    Patrick

  31. gara says:

    how to run the code ….m getting error whlinterprting

  32. dharma says:

    Hi Patrick sir ..

    now am doing clustering project ..so i need to implement that kmeans algorithms ,,.i have kmeans algorithm which u have posted..but its giving error..Exception in Thread “main”.java.lang.NoClassDefFoundError..so what i can do now,,,
    mar 10 th is my project submission ..
    so so
    please please ..reply as soon as u can sir.please….
    v v v v v v v vy urgent..

  33. Patrick says:

    Hi Dharma,

    Which code do you have? The code supplied above?
    The Exception you mention doesn’t seem to be related to that code as far as I can see. Is there some additional information?

    Cheers,

    Patrick

  34. dharmaraja.k says:

    Sir am having KMeansAction code which is u had posted…

    but am having the problem withe the exception thread…
    NOClassDefFoundError..
    GIve me the solutions sir v v vvvvvvvv urgent ….
    10 th project delivery..please help me..sir…

    or send me the procedures..
    in what name to be save and compile the program,,
    the class name..
    how to give input ..

  35. Patrick says:

    Dharma,

    I cannot give you a direct solution to all your issues. However, I’d suggest that you call your files KMeansAction.java and ClusterClass.java.

    You can also find the package as a whole on the download page, or just here

    It contains both those two classes as well as some code around them. As far as I know (at least back then) it didn’t contain errors.

    Hope this provides some more insights!

  36. vikas says:

    hey patrick….
    i wanted 2 knw hw 2 input d histogram in d kmeansaction class…. plz reply it asap its urgent………

  37. Patrick says:

    Vikas,

    The histogram is just an array with the pixel values as keys and the times they occur as values. As this operates on grey values, the R, G and B values are the same.
    If we take a 256 bit greyscale image, we have 256 colors: (0,0,0), (1,1,1), … , (255,255,255).
    The histogram will therefore contain keys 0 to 255. Each of the keys represents the grey value.

    example:
    histogram[0] = 25
    histogram[1] = 14

    histogram[255] = 3

    would mean there are 25 pixels of grey value (0,0,0), 14 pixels of grey value (1,1,1), …. , and 3 pixels which have grey value (255,255,255) in the image.

    Does this help you?

    Cheers,

    Patrick

  38. vikas says:

    hey thnx 4 ur reply…… bt i also want 2 knw hw 2 gv it as input in d kmeansaction class dat u hv in ur code….. hw 2 generate d histogram of an image nd den gv it as input????????

  39. Patrick says:

    How to generate a histogram:

    Create a histogram array with a bin (entry) for each pixel value
    foreach pixel in your image:
    Get the pixel value
    Increment the appropriate bin in the histogram array by one
    end foreach

    Now you have a histogram: just an array where the keys are the pixel values and the values are the times they occur.
    Provide this array to the kmeans action, alongside some vars (number of bins etc.) as mentioned above in the code and it will calculate the clusters.

  40. vikas says:

    hey i calculated the histogram as u told and histogram that has been calculated is ryt as i chckd it…… bt nw i m gtng d error dat KMeansAction cannot be instantiated…………. cn u plz explain hw 2 gt rid of dis error…… plz

  41. Patrick says:

    vikas,

    It probably has something to do with where you’ve put the class and whether or not you import it. Please have a look at your file structure and your import path. I’ve put my KMeansAction class in a folder / package called ‘actions’. If you have not done so, you should redefine this.

  42. vikas says:

    i changed the KMeansAction to place it in d same package Actions bt nw it showing an exception dat KMeansAction can’t b instantiated and applet is not initialized… plz reply soon……

  43. ishani says:

    hi pattrick,
    i am getting the following errors on compiling:

    C:\Ispe-dev\src>javac main\Ispe.java
    main\Ispe.java:33: package com.sun.media.imageioimpl.plugins.tiff does not exist

    import com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader;
    ^
    main\Ispe.java:34: package com.sun.media.imageioimpl.plugins.tiff does not exist

    import com.sun.media.imageioimpl.plugins.tiff.TIFFImageReaderSpi;
    ^
    main\Ispe.java:343: cannot find symbol
    symbol : class TIFFImageReaderSpi
    location: class main.Ispe
    TIFFImageReaderSpi tiffreaderspi = new TIFFImage
    ReaderSpi();
    ^
    main\Ispe.java:343: cannot find symbol
    symbol : class TIFFImageReaderSpi
    location: class main.Ispe
    TIFFImageReaderSpi tiffreaderspi = new TIFFImage
    ReaderSpi();
    ^
    main\Ispe.java:345: cannot find symbol
    symbol : class TIFFImageReader
    location: class main.Ispe
    TIFFImageReader tiffreader = new TIFFImageReader
    (tiffreaderspi);
    ^
    main\Ispe.java:345: cannot find symbol
    symbol : class TIFFImageReader
    location: class main.Ispe
    TIFFImageReader tiffreader = new TIFFImageReader
    (tiffreaderspi);
    ^
    6 errors

    what should i do now…???
    please help me out…i wud be very grateful to u….

  44. Patrick says:

    Hi Ishani,

    It’s clear that apparently the import is not valid any more. Which version of Java are you running?
    I haven’t checked lately if the TIFFImageReader is (re)moved in the latest Java versions.
    Please check which import causes the error. For a quick fix perhaps there are earlier versions of that import which you can supply locally. For a more persistent fix you should find the (new) correct path to the TIFFImageReader class.

    Cheers,

    Patrick

  45. ishani says:

    i m using jdk1.6.

  46. ishani says:

    pattrick,
    which version of java hv u used to run this program….???
    nd can u pls tell me how should i modify the code given by you in such a way that it does not deal with .tiff images…
    please reply asap because i have to submit my project by next week….
    i would be highly obliged…..

    regards
    ishani

  47. ishani says:

    hi pattrick,
    i have solved the previous problem of tiff package. and it is running well..thnx a lot for that… but now a new problem has cropped up..
    the window that appears on running the programs includes all the options like file,edit,view,help buttons etc but it is not showing the k-means menu button to carry out the k-means clustering operation…
    i am unable to figure out this problem ..the main purpose is to show k-means but it is not showing up…i dont know what went wrong…
    i hv downloaded a jar file jai_imageio-1.1 from the internet and set the classpath to this jar file ,so that the code can import all the tiff class files require..after that, the code is running well and displaying everything except the k-means option..
    what should i do now…????
    pls help me….

    regards
    ishani

  48. Patrick says:

    Hi Ishani,

    Glad you’ve figured that one out. It’s always tough to identify those kind of issues.
    When running the application, you should get the options ‘File’, ‘Edit’, ‘View’, ‘Action’, ‘Help’. Is this correct?

    Cheers,

    Patrick

  49. ishani says:

    hi pattrick,
    i figured out the problem…now its working ..thank u soo much for ur code….it works well ….thnk u sooo much…:):)..I owe u…
    now after this i have to implement image classification using artificial immune system and then compare its result with the k-means clustering method…
    so,do u hv any idea of how to do it with artificial immune system…??

    regards
    ishani

  50. Patrick says:

    Hi Ishani,

    No problem, glad it was helpful!
    I’m not familiar with that method. Do you have a link so I can check it out?

    Cheers,

    Patrick

  51. ishani says:

    here are some links to it:

    1.APPLICATIONS OF ARTIFICIAL IMMUNE SYSETMS IN REMOTE SENSING IMAGE …
    citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.5489

    2.An Unsupervised Artificial Immune Classifier for Multi …
    ieeexplore.ieee.org/iel5/36/33388/01580727.pdf?arnumber=1580727
    3.AIS
    ais.cs.memphis.edu/

    and i want to ask one more thing.In your code,u have assigned greyscale values to the clusters ,but if i want to assign rgb color values to the clusters ,then how should i modify the code????…and u have not used the objective function in ur kmeans code.??

  52. ishani says:

    hi pattrick,
    how should i change the dimensions of a histogram so that it fits on the screen??????

  53. Patrick says:

    Hi Ishani,

    The clusters currently work on greyscale images, by using the property that the R value is equal to the G value and the B value. So as you can see in the code, the R value is used.
    So basically the code works on redscale images ;-)
    You could of course adjust this part and by that get a different value which can be clustered.

    What’s the objective function you mentioned exactly?

    The dimensions of a histogram should be relative I think. I have encountered these issues as well, but didn’t fix them any more. (Shame on me). The dimensions of the histogram should be relative to the screen / drawing panel. Based on the dimensions of the drawing panel you should determine the scale for the histogram so that it fits the screen and preserves the histogram itself.

    I will have a look at the articles you provided when I got time to :)

    Cheers,

    Patrick

  54. ishani says:

    Hi pattrick,
    Thanx …nd see the following link for the objective function:
    home.dei.polimi.it/matteucc/Clustering/tutorial…/kmeans.html – 1.In the kmeans clustering problem,the aim is to minimize the objective function so that we get correct results.

  55. Seyi says:

    Hi Mr Patrick,
    I’m writing my project and I really need your help. I want to use clustering to segment teeth pixels in an image. I currently use ImageJ (open source GUI )and i want to extend its functionalities by writing additional plugins. I have some tiff images showing both teeth and gum and I need to isolate the teeth only…Is there any way you could help?? I will be very glad if you can kindly assist me + i’m way behind on my project

  56. Patrick says:

    Hi Seyi,

    How can I be of any help?
    Regarding the teeth images: which colors (and / or which range of colors?) do the teeth have in the image? And which color(s) (and / or range) does the gum have?

    Cheers,

    Patrick

  57. swathi says:

    hi ishani,
    i need source code for k-means algorithm in my project work for clustering the images(.jpeg or pgm) so please help me by giving the code that you have developed so far for your work please help me i need urgently for my project

  58. darsha says:

    hi sir.. can yo pls provide me the implementation of k-means algorithm for clustering large amount of data .. actu my proj s abt disease prediction. i just want to form 2 clusters .. ppl with disease n ppl dont hv disease based on the attributes.. can yo help me.. java code ll be much helpfull.. thank you sir

  59. Harry potter says:

    I want to do network intrusion detection using k-means. Can you provide me with a code?My input is a syslog log file in CSV format. I want cluster the logs into class 1 if a keyword anomaly is found. else into class 2.

  60. Hi darsha,

    First things first: you’ve got to think on which attributes you are going to cluster. As diseases may have many attributes, think of a good one which can really separate the diseases (clusters).
    The problem is actually not on the code. The theory is the most important part here.

    Cheers,

    Patrick

  61. Hi Harry potter (wow! THE Harry? ;-) )

    Your problem is quite an easy one at first sight, and I don’t know whether K-Means clustering is the best option for this. K-Means clustering clusters data in an iterative fashion. If you have a predefined set of words which are characteristic for network intrusion, you’re basically counting / separating.
    Can you explain what your method would look like? How would you do network intrusion detection step by step?

    Cheers,

    Patrick

  62. Harry potter says:

    Which algorithm will best suit this intrusion detection sir ? Currently I want to cluster into 2 groups based on simple keyword matching as I said above. How can the code be modified for real time network packets rather than log files?Can you give some useful links sir?

  63. Hmm.. that’s a good one. Clustering is mostly done on a group of data. What exactly do you want to establish with intrusion detection? Would you like to review every request and directly alarm in case of an intrusion? What does the intrusion detection flow look like?

  64. Harry potter says:

    Based on the criticality of the message we want to raise an alarm.(i.e)., we need to analyze all the logs and say whether any intrusion has occurred or not?

  65. nicky says:

    hi Patrick,
    If you don’t mind, could you also provide the images example and main class for testing?

    thanks anyway for the code :)

  66. Hi Nicky,

    I don’t know if I still have the images, but they were just 8-bit greyscale images. Any such type image would do.
    The rest of the source code can be found on the downloads page. It’s in the ‘Ispe development source code’ zipfile.

    Cheers,

    Patrick

  67. Hi Harry,

    Just saw that you’d posted a reply a while ago, my bad!
    Regarding the syslog file: it’s a CSV. Which columns does it have?

    K-means clustering would in your case just mean that you need to extract a the messages from the CSV file and classify them in two groups. Based on those two groups the real-time detection algorithm would decide whether a detection occurs based on the message.

    And how would you do this real-time? Would you capture the incoming package and check for particular words?

  68. Akash says:

    I want to convert grayescale image into binary image ..so can u suggest any solution..!

  69. Hi Akash, this is pretty straightforward. At some point in the image histogram you put the ‘splitting point’. Pixelvalues on the one side will become white and on the other side will become black. This is called thresholding. This is also implemented in the ISPE code which is available on my downloads page.

    Cheers,

    Patrick

Leave a Reply


eight − 1 =

preload preload preload