
ALwEN - To make Sense of it

One goal Almende has for a Wireless Sensor Network platform is for the system to be able to find patterns in sensor data. Pattern detection is a problem that can be defined at different levels of complexity, and we define it in quite a complex way: unsupervised, multi-modal, temporal, distributed pattern detection, though the unsupervised part really only holds "up to a degree". What does all this mean?

The problem of pattern detection basically means finding recurring patterns in a set of data. This data can come in batches, where time or sequentiality is not a factor, such as in image classification, where the entire image arrives at once and the order in which images are presented for classification carries no meaning. Alternatively, the sequence in which data comes in could be of importance, such as in the classification of strings of DNA. Finally, data could be temporal, where the actual time at which data comes in is important. This is for instance the case in the human brain, where the exact timing of spikes carries meaning, rather than, say, the average number of spikes per time period.

So temporality is one dimension along which different levels of complexity can be achieved; another is modality. If your data is all of one modality, such as pictures, sound, or type of protein, detecting patterns is relatively straightforward. If your data is of multiple modalities, however, this no longer holds. Consider, for instance, a robot trying to make sense of the sounds coming into its microphone and the images coming into its camera. How can it link the image of a dog to the barking the dog produces? And can it deduce, from just the sound of a dog, that a dog is nearby, even though it can't see the dog?

A third dimension is distributedness. Traditionally, it is assumed that all data is present in one physical location, where all of it can be used to train and classify.
Training becomes a lot trickier if the data is distributed over multiple physical locations. Do you opt for gathering all data at some central destination first and then training one classifier? Or do you let the nodes carrying the data organize into clusters and elect clusterheads to perform the training? Or, even fuzzier, do you let each node train using only the data it can gather itself? And how, then, do you come to a final decision?

The fourth dimension is the degree of supervision: traditional machine learning methods rely heavily on supervision by an expert. You present the algorithm with an image, it attempts a classification, and you give it feedback on how it did. The algorithm then uses this feedback to make a better classification next time. However, things get a lot more complex if this feedback is absent, or present only to a lesser degree. It is then up to the algorithm to, for instance, form a number of clusters of similar data and classify accordingly.

A typical example of what we would like our sensor network to be able to do is the following. Suppose a toilet is outfitted with three nodes: one capable of measuring whether the door is open or closed, one measuring the light, and one measuring water current. The network should then be able to find the typical pattern of someone going to the toilet: door open, light on, door closed, water flowing, door open, light off, door closed. It should do this by noting that this pattern is recurring, without any a priori knowledge that this kind of pattern might occur, and in particular with no idea of time scale or network topology. It should then be able to present the pattern it found to a user, asking whether this is indeed an interesting pattern, and asking the user to label it.

It is easy to see that this scenario is complex in each of the four dimensions described earlier: our data is temporal in nature, it is multi-modal, it is distributed, and it is to some extent unsupervised. Coming up with some way to do it is the big challenge.
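To make the toilet scenario a bit more concrete, here is a deliberately naive Python sketch. It is not Almende's method, and it sidesteps most of the hard parts: it assumes all events have already been gathered in one place with comparable timestamps (so it is not distributed), and it looks only at the order of events, not at their actual timing. All names (`simulate_visit`, `merge_streams`, `recurring_patterns`) and the event symbols are hypothetical, chosen for illustration only. What it does show is the unsupervised, multi-modal idea: events from three different sensors are merged into one stream, and sequences that keep coming back are reported without anyone saying in advance what to look for.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, symbol), e.g. (12.0, "door_open")


def simulate_visit(t0: float) -> Tuple[List[Event], List[Event], List[Event]]:
    """Hypothetical readings of the door, light and water nodes for one visit."""
    door = [(t0, "door_open"), (t0 + 2, "door_closed"),
            (t0 + 40, "door_open"), (t0 + 42, "door_closed")]
    light = [(t0 + 1, "light_on"), (t0 + 41, "light_off")]
    water = [(t0 + 30, "water_flowing")]
    return door, light, water


def merge_streams(streams: Iterable[List[Event]]) -> List[str]:
    """Merge per-node event lists into one time-ordered list of symbols.

    Assumption: timestamps from different nodes are comparable.
    """
    merged = sorted(event for stream in streams for event in stream)
    return [symbol for _, symbol in merged]


def recurring_patterns(symbols: List[str], min_len: int, max_len: int,
                       min_count: int) -> Dict[Tuple[str, ...], int]:
    """Count every contiguous subsequence of length min_len..max_len and
    keep those that occur at least min_count times."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(symbols) - n + 1):
            counts[tuple(symbols[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_count}


# Three simulated visits at arbitrary start times.
streams: List[List[Event]] = []
for start in (0.0, 600.0, 1500.0):
    streams.extend(simulate_visit(start))

symbols = merge_streams(streams)
found = recurring_patterns(symbols, min_len=7, max_len=7, min_count=3)
for pattern, count in found.items():
    print(count, "x", " -> ".join(pattern))
```

On this toy data, the only seven-event sequence that occurs three times is the complete visit, which could then be shown to a user for labelling. A real network would also have to cope with interleaved events from other activity, unknown pattern lengths and time scales, and data that never arrives at a single node.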