In part 1 of this introductory series about working with high-dimensional data we looked at dimensionality reduction to allow the visualisation of complex data. Part 2 of the series explored the K-means clustering method as a technique to classify samples according to their geochemical characteristics. In this article we place the classified samples back into their geospatial context and use the spatial relationships between the clusters to define a behaviour profile, which is aimed at guiding critical mine planning decisions.
The example we’ve been developing in part 1 and part 2 of this series is based on chemical assays for four elements from a resource feasibility study. The first data processing step involved dimensionality reduction using multi-dimensional scaling to produce a 2D representation of the original high-dimensional chemical assay data. We then used the 2D data as input for a multi-step cluster analysis based on the K-means clustering method to classify geometallurgical domains (figure 1).
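As a reminder of what that pipeline looks like in practice, here is a minimal sketch using scikit-learn. It is not the code behind figure 1: the assay values are synthetic stand-ins generated on the spot, the standardisation step and the choice of four clusters are assumptions made for illustration, and the multi-step cluster analysis from part 2 is collapsed into a single K-means pass.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assay table: 120 samples x 4 elements.
# These numbers are made up and carry no geological meaning.
rng = np.random.default_rng(0)
centres = rng.uniform(0.0, 10.0, size=(4, 4))
assays = np.vstack([c + rng.normal(0.0, 0.5, size=(30, 4)) for c in centres])

# Assumption: put the four elements on a comparable scale first.
scaled = StandardScaler().fit_transform(assays)

# Step 1: multi-dimensional scaling from four dimensions down to two.
embedding = MDS(n_components=2, random_state=0).fit_transform(scaled)

# Step 2: K-means on the 2D embedding (four clusters assumed here).
domains = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
```

Each entry in `domains` is an integer cluster label for the corresponding sample, which is the form of input the geospatial step below works with.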
Figure 2 below is an interactive 3D graph showing the samples in their geospatial context. Samples that belong to the same geometallurgical domain have been given the same colour (see legend). It is clear that some samples classified into one domain are spatially closer to, or surrounded by, samples from other domains. For practical mining, one has to consider the realistic range of characteristics of the raw material extracted from any given area of the resource. In this instance we use a simple proximity measure to reassign samples that are essentially isolated from their own domain to the dominant geometallurgical domain in that area (figure 3). This process increases the geochemical variance within the domains, but it also provides a more realistic estimate of the feed material characteristics that the mining process will deliver.
Note: use basic mouse controls to interact with the graphs; select or deselect domains in the legend.
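The proximity-based reassignment can be sketched in a few lines of Python. The article does not spell out the exact measure used for figure 3, so the rule here is an assumption: a sample counts as isolated when fewer than `min_same` of its `k` nearest neighbours share its domain, and it is then given the most common domain among those neighbours. The function name `reassign_isolated`, its parameters, and the toy coordinates are all hypothetical.

```python
from collections import Counter
from math import dist


def reassign_isolated(coords, labels, k=5, min_same=1):
    """Relabel spatially isolated samples to the locally dominant domain.

    Decisions are based on the original labels only, so the result does
    not depend on the order in which samples are visited.
    """
    new_labels = list(labels)
    for i, point in enumerate(coords):
        # Indices of the k nearest other samples (Euclidean distance).
        others = [j for j in range(len(coords)) if j != i]
        nearest = sorted(others, key=lambda j: dist(point, coords[j]))[:k]
        neighbour_labels = [labels[j] for j in nearest]
        # Isolated: too few neighbours belong to this sample's own domain.
        if neighbour_labels and neighbour_labels.count(labels[i]) < min_same:
            new_labels[i] = Counter(neighbour_labels).most_common(1)[0][0]
    return new_labels


# Toy example: one domain-0 sample sits in the middle of a domain-1 area.
coords = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0),
    (10, 10, -50), (11, 10, -50), (10, 11, -50), (11, 11, -50),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
cleaned = reassign_isolated(coords, labels, k=3)
# cleaned == [1, 1, 1, 1, 1, 0, 0, 0, 0]
```

The brute-force neighbour search is fine for a few hundred samples; a larger drill-hole database would call for a spatial index. The values of `k` and `min_same` control how aggressively samples are absorbed into neighbouring domains, and therefore how much the geochemical variance within each domain grows.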
The data in figures 2 and 3 tell a relatively simple story. The resource is divided into four main geometallurgical domains. Domain 0 is the most valuable in this resource and occurs predominantly at greater depths than the other domains. Domain 3 is the least attractive in terms of its value and mineral processing complexity, while domains 1 and 2 might be considered intermediate in terms of their value and processing requirements. Geospatially the domains are fairly well defined, which paves the way for a mining strategy that might initially avoid the areas where domain 3 is dominant and instead target locations where domains 1 and 2 are easily accessible. The operation may need to manage cash flow carefully during the early stages of development; however, future access to the deeper, more valuable domain 0 would bring an upturn in cash flow at a later stage of the operation.
Parts 1, 2, and 3 in this blog series have introduced some of the basic principles of machine learning and its application in resource characterisation. The geochemical domains in figures 1, 2, and 3 above represent the “training data”, and in the next article (part 4) we will explore how we can use the training data to make predictions about unclassified “test samples” to estimate their metallurgical behaviour.