According to a report on electronic commerce, the sale of products online in the third quarter of 2022 increased by 16.8% compared to the third quarter of 2021.
This new way of consuming has led to the emergence of new strategies, some of which involve artificial intelligence. These machine learning algorithms can be used to increase sales, improve product organization and labeling, predict the customer journey, and certify the authenticity of branded goods sold online.
The applications of AI for fashion are numerous and respond to real needs, but these developments also come with their own technological challenges.
Throughout the selected dataset, the shoes are centered on a white background and photographed in the same orientation, which can introduce another bias into the training of the model. To reduce the impact of this bias, data augmentation is a possible solution, as it also allows the classes to be rebalanced; this is the approach we chose and applied to the training data. Data augmentation is a dataset improvement technique that takes a sample of images from under-represented classes and generates new images from them, thereby balancing the dataset. To perform data augmentation, a number of input transformation methods can be used, such as rotations, flips, translations, or changes in brightness.
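Such a step can be sketched with NumPy; the particular transforms and their parameters below (flip, shift, brightness jitter) are illustrative assumptions, not the exact ones used in the case study:

```python
import numpy as np

# Minimal sketch of data augmentation. The transforms and parameters here
# are illustrative assumptions, not those used in the case study.
rng = np.random.default_rng(0)

def augment(image: np.ndarray) -> np.ndarray:
    """Generate one new image from a sample of an under-represented class."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    shift = int(rng.integers(-3, 4))
    out = np.roll(out, shift, axis=1)           # small horizontal translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)  # brightness jitter
    return out.astype(np.uint8)

original = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
new_sample = augment(original)
```

Applied repeatedly to images of the minority classes, this kind of generator produces the additional samples needed to balance the dataset.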
Example of data augmentation

In this case study, three model training approaches are discussed:
- no-DA: training without any data augmentation;
- DA-slippers: training with data augmentation applied only to the "Slippers" class;
- all-DA: training with data augmentation applied to all classes.
For all three models, the confusion matrices are approximately equivalent for all classes except the "Slippers" class, for which the proportion of correctly classified images decreases with data augmentation. This result is surprising, because data augmentation is usually expected to improve classification accuracy; yet the scores for the "Slippers" class decrease. Nevertheless, this result will become clearer in light of the analyses carried out below. The point to bear in mind is that accuracy is not the only metric to take into account when assessing the performance of an algorithm. To continue the analysis, an image of the "Shoes" class is selected in order to visualize the Saimple results for relevance and dominance for each model.
The results indicate that the relevance for the all-DA model is noticeably sharper than for the other models, as shown by the difference in pixel concentrations. The same is true for its dominance, which appears more stable, with the green line further away from the blue lines of the other classes. The all-DA model seems to focus more on the contour of the shoe, the element considered most important for differentiating the classes. Regarding dominance, the graphs indicate that all three models correctly classify the image, but the dominance of the "Shoes" class is more distinct for the models trained with data augmentation: for the all-DA and DA-slippers models, the dominance score of class 2 ("Shoes") is further away from the scores of the other classes than it is for the no-DA model. This first analysis shows, through relevance and dominance, that the models trained with data augmentation are in fact better at classifying this image, even if their overall accuracy may appear lower. Next, Gaussian noise of different intensities is added to the example image to observe how the models behave under perturbation.
Three levels of Gaussian noise are applied to the reference image; these intensity levels correspond to the different variances used to generate the noise, with intensity 0 corresponding to the original image. The three models are compared at each noise level, and the Saimple results are examined to understand the differences between them.
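The perturbation itself can be sketched as follows, assuming images scaled to [0, 1] and using the variance levels quoted later in the analysis (the reference image here is a random stand-in):

```python
import numpy as np

# Sketch of the Gaussian perturbation, assuming images scaled to [0, 1].
# The variance levels are those quoted in the analysis; 0 = original image.
rng = np.random.default_rng(42)

def add_gaussian_noise(image: np.ndarray, variance: float) -> np.ndarray:
    """Add zero-mean Gaussian noise of the given variance, then clip to [0, 1]."""
    noise = rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

reference = rng.random((28, 28))  # stand-in for the reference shoe image
perturbed = {v: add_gaussian_noise(reference, v) for v in (0.001, 0.005, 0.025)}
```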
For this first level of noise, the dominance results are similar to the previous ones. The relevance is concentrated on the shoe for all three models, with one small difference for the DA-slippers model: its relevance is slightly concentrated on the background of the image. It can be assumed that, as the noise intensity increases, the models will focus more and more on the background, which could change the classification.
The dominance indicates that, as before, all three models correctly classify the image. However, a difference is observed: this time the model without data augmentation has a much less precise relevance than the others, and its dominance is also less distinct. At this noise intensity, the relevance produced by Saimple indicates that the models are starting to take the background into account, except for the all-DA model, which therefore appears more robust than the others.
At this level of noise, the first thing to note is the change in classification of the shoe by the no-DA model (represented by the orange circle): the dominance indicates that it misclassifies the image as "Slippers". For the DA-slippers model, the membership scores are also very close, which signals a significant risk of misclassification. For the all-DA model, the scores are still quite distinct. The relevance of the no-DA model is now totally scattered and no longer distinguishes the shoe, which explains why the model no longer classifies the image correctly. For the other two models, the relevance is on the shoe and slightly on the background, but the all-DA model remains the least affected by the noise. This level of noise therefore confirms the hypothesis, made at intensity 0.005, that the all-DA model is more robust than the others.

To summarise, the model trained with data augmentation on all classes (all-DA) still seems to focus relatively well on the whole shoe, and no watermark or bias appears. However, the noisier the image, the harder it is for the model to locate the features of the shoe that allow correct classification, which is consistent. This difficulty is confirmed by the dominance evaluation: the noisier the image, the greater the risk of overlapping classes (the green line moves further and further from 1) and therefore the more classification errors become possible. It should be noted that the dominance assessment offered by Saimple also makes it possible to obtain the exact noise threshold beyond which the model would no longer be considered sufficiently robust, and to know precisely which classes are most likely to be confused.
Thus, with the metrics obtained and according to the levels of satisfaction defined, it is possible to decide to change the parameters of the model or to re-train it by enriching the data set. New evaluations could then be carried out with Saimple in order to monitor the evolution of performance.
The graph above shows that, over the whole test set, the percentage of correctly classified images remains stable up to intensity 0.001. Beyond 0.001, the curves of the different classes decrease slightly, except for the "Sandals" class. Between intensities 0.005 and 0.025, the percentage of correct classifications for the "Boots" class drops sharply, from 85% to less than 50%, in contrast to the other classes. Only the curve for the "Sandals" class remains constant, indicating that the model is robust only for this class. Now let us analyze the confusion matrices:
The values of the curves correspond to the diagonals of these matrices. They show that the proportion of correctly classified images decreases as the noise intensity increases. A significant decrease can be seen for the "Boots" class, where almost half of the images are classified as slippers. The question then arises: why does the model tend to confuse images of the "Boots" class with images of the "Slippers" class? This is not an easy question to answer, but relevance can provide elements of explainability to help understand it. However, since this use case focuses mainly on robustness, only hypotheses are made here and their verification is not addressed. The most likely hypothesis is that the noisier the image, the less recognisable the features that support classification. The features that help to recognise boots must be mainly affected, which greatly reduces the classification score. The feature most resistant to noise should be the overall shoe shape; and since slippers have the most basic shoe shape, they can be considered as the base form of any other shoe. Therefore, as the discriminating features are lost, the algorithm no longer correctly recognises boots and falls back on the "basic" shoe type, i.e. slippers.
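The link between the matrices and the curves can be made concrete with a small sketch; the labels below are toy values invented for the example, not the case-study data:

```python
import numpy as np

# Toy illustration of how the curve values relate to the confusion matrices:
# the per-class accuracies are the diagonal of the row-normalised matrix.
# Labels below are invented for the example, not the case-study data.
CLASSES = ["Boots", "Sandals", "Shoes", "Slippers"]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 0, 1, 2, 3, 3, 0]
y_pred = [0, 3, 3, 1, 2, 3, 3, 0]   # half of the boots drift to "Slippers"
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
```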
As before, a significant decrease is observed from intensity 0.005 onwards, but unlike the no-DA model, the proportions of all classes except "Slippers" decrease to less than 50%. This graph makes it clear that training the model with data augmentation on a single class is not recommended. The confusion matrices confirm that data augmentation on only one class does not generally make the model more robust to noise, but tends to bias the classification towards the augmented class. Indeed, the confusion matrix at intensity 0.025 indicates that the model tends to classify the images of all classes as "Slippers". This tends to confirm the hypothesis that "Slippers" is a basic class whose shape can be found in all shoes. Because data augmentation is performed only on this class, slippers appear in many positions, and, overall, the more noise there is, the more any shoe degrades into a blob that resembles the simplest shape: the images of the "Slippers" class. Even if the blob is vertical, as in the case of boots, there are rotated images of the "Slippers" class that such boot blobs can approximate.
The graph above shows that the percentages of correctly classified images remain stable up to 0.005. After that, the curve for the "Slippers" class increases, which may seem a curious result, while the curves for the other classes decrease slightly.
The confusion matrices above show a slight increase in images classified as "Slippers". But overall, with the model trained with data augmentation on all classes, the results are much better from an accuracy point of view. Again, the increase in the "Slippers" class can be linked to the hypothesis of deteriorating feature recognition at high noise levels. Now, let us study the stable space, i.e. the delta max on the example image, to visualize the robustness of the different models. The delta max is the particular delta value corresponding to the maximum noise amplitude for which the dominance of the classifier network can still be proved.
This graph shows the evolution of the delta max values as a function of intensity for each model: no-DA, DA-slippers and all-DA. For the example image, the delta max value is highest for the all-DA model, which implies that it is more robust than the other two. It would be interesting to continue this study by performing a delta max search over the whole test set.
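Saimple derives the delta max formally, by proving dominance up to a given amplitude. As a rough empirical stand-in only, the same idea can be approximated by bisecting over the noise amplitude; the classifier and image below are hypothetical toys, and sampling only estimates (never proves) stability:

```python
import numpy as np

# Empirical analogue of a delta-max search: bisect over the noise amplitude
# for the largest level at which the prediction never flipped across a few
# random draws. This is an estimate, not the formal proof Saimple provides.
rng = np.random.default_rng(0)

def empirical_delta_max(predict, image, true_class, trials=20, tol=1e-3):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        stable = all(
            predict(np.clip(image + rng.normal(0.0, mid, image.shape), 0.0, 1.0))
            == true_class
            for _ in range(trials)
        )
        if stable:
            lo = mid        # no flip observed: try a larger amplitude
        else:
            hi = mid        # flip observed: shrink the amplitude
    return lo

predict = lambda img: 2 if img.mean() > 0.5 else 3   # toy "Shoes" vs "Slippers" rule
reference = np.full((28, 28), 0.7)                   # hypothetical bright image
delta_max = empirical_delta_max(predict, reference, true_class=2)
```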