Data Augmentation Automation for Images
Artificial intelligence (AI) and Machine Learning (ML) are here to stay. Paramount to their successful implementation is the amount of training data. In practice, accurate models need large training datasets, often millions of samples.
Data in general is abundant. For ML purposes, however, this data needs to be tagged: each sample must carry a label indicating the expected outcome of the ML model. Tagging is expensive, which is why we use Data Augmentation techniques. From an original tagged dataset, we artificially create new tagged samples to ‘augment’ it.
The process
Here is a brief outline of the automated Data Augmentation process for image datasets.
1. Establish the required number of new data samples. Answer the question: given the available samples, how many new ones should be generated?
2. Establish the desired complexity of the Data Augmentation process.
3. Perform image processing. You can use basic image processing techniques such as the ones described below. Alternatively, you may generate new images with advanced Data Augmentation techniques based on Generative Adversarial Networks (GANs).
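Step 1 comes down to simple arithmetic. Here is a minimal sketch. It assumes you want to reach a target dataset size and to spread the new samples evenly over the originals. The helper name `augmentations_per_image` is hypothetical.

```python
import math


def augmentations_per_image(n_original: int, n_target: int) -> int:
    """Hypothetical helper: augmented copies to create per original image
    so that originals + new samples reach at least n_target."""
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    n_new = max(0, n_target - n_original)  # new samples still needed
    return math.ceil(n_new / n_original)   # spread evenly, rounding up


# 1,000 tagged images, target of 10,000 samples
print(augmentations_per_image(1_000, 10_000))  # → 9
```

Because the helper rounds up, the final dataset may slightly overshoot the target.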
Some of the available image processing techniques are:
Flip
Horizontal or vertical flip of the original image.
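A flip only reverses one pixel axis. The sketch below uses NumPy to keep it dependency-light. It assumes images are arrays of shape (height, width, channels), and the `flip` helper is hypothetical. In TensorFlow, the equivalent calls are `tf.image.flip_left_right` and `tf.image.flip_up_down`.

```python
import numpy as np


def flip(image: np.ndarray, horizontal: bool = True) -> np.ndarray:
    """Hypothetical helper: flip an (H, W, C) image.
    horizontal=True mirrors left-right, False mirrors top-bottom."""
    axis = 1 if horizontal else 0  # axis 1 = width, axis 0 = height
    return np.flip(image, axis=axis)


img = np.arange(6, dtype=np.float32).reshape(2, 3, 1)
print(flip(img)[..., 0])                    # each row reversed
print(flip(img, horizontal=False)[..., 0])  # row order reversed
```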
Crop
Sampling a section of the original image and scaling it back to the original size.
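Here is a minimal random-crop sketch under the same assumption: NumPy arrays of shape (H, W, C). It resizes the crop back to the original size with nearest-neighbour sampling, which keeps the code self-contained. `random_crop_resize` and its `crop_frac` parameter are hypothetical. In TensorFlow, `tf.image.random_crop` followed by `tf.image.resize` plays a similar role.

```python
import numpy as np


def random_crop_resize(image, crop_frac=0.8, rng=None):
    """Hypothetical helper: cut a random window covering crop_frac of each
    side of an (H, W, C) image, then resize it back to (H, W) using
    nearest-neighbour sampling. Assumes 0 < crop_frac <= 1."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    ch, cw = max(1, int(h * crop_frac)), max(1, int(w * crop_frac))
    top = int(rng.integers(0, h - ch + 1))    # random window position
    left = int(rng.integers(0, w - cw + 1))
    # Map every output row/column back to a row/column inside the window.
    rows = top + np.arange(h) * ch // h
    cols = left + np.arange(w) * cw // w
    return image[rows[:, None], cols[None, :]]


img = np.random.default_rng(0).random((32, 48, 3))
augmented = random_crop_resize(img, crop_frac=0.75, rng=np.random.default_rng(1))
print(augmented.shape)  # (32, 48, 3)
```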
Rotation
Rotation of the original image and scaling the new image to the original size.
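Rotation by an arbitrary angle can be sketched with inverse mapping: for every output pixel, compute which source pixel lands there. This NumPy version keeps the original canvas size and fills the uncovered corners with a constant instead of rescaling. It uses nearest-neighbour sampling, and `rotate` is a hypothetical helper. TensorFlow's `tf.image.rot90` covers rotations by multiples of 90 degrees.

```python
import numpy as np


def rotate(image, angle_deg, fill=0.0):
    """Hypothetical helper: rotate an (H, W, C) image by angle_deg about its
    centre, keeping the original size. Uses nearest-neighbour sampling and
    fills pixels that have no source with `fill`."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]          # output pixel coordinates
    y0, x0 = yy - cy, xx - cx            # relative to the centre
    # Inverse mapping: where does each output pixel come from?
    src_x = np.rint(np.cos(theta) * x0 - np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(image, fill)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


sq = np.arange(48, dtype=float).reshape(4, 4, 3)
print(np.array_equal(rotate(sq, 90), np.rot90(sq)))  # True
```

For a square image, a 90 degree rotation matches `np.rot90`. For other angles, the corners that fall outside the source are set to `fill`.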
Zoom (in / out)
Magnifying (zoom in) or shrinking (zoom out) the content of the original image.
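Zoom can reuse the same inverse-mapping idea, scaling coordinates around the image centre instead of rotating them. This NumPy sketch assumes that a `factor` above 1 zooms in and a `factor` below 1 zooms out, padding the border with a fill value. `zoom` is a hypothetical helper, and its output keeps the original size.

```python
import numpy as np


def zoom(image, factor, fill=0.0):
    """Hypothetical helper: zoom an (H, W, C) image about its centre while
    keeping the original size. factor > 1 zooms in (magnifies), factor < 1
    zooms out and fills the uncovered border with `fill`."""
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]  # output pixel coordinates
    # Inverse mapping: shrink output offsets from the centre by `factor`.
    src_y = np.rint((yy - cy) / factor + cy).astype(int)
    src_x = np.rint((xx - cx) / factor + cx).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(image, fill)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


ones = np.ones((8, 8, 3))
zoomed_out = zoom(ones, 0.5)
print(zoomed_out[0, 0, 0], zoomed_out[4, 4, 0])  # border is fill, centre is kept
```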
Brightness / Contrast / Gamma / Hue / Saturation
Changing parameters related to color.
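Most of these color adjustments are simple per-pixel formulas. This NumPy sketch assumes float RGB images with values in [0, 1]. The helper names are hypothetical, though they follow the intent of `tf.image.adjust_brightness`, `tf.image.adjust_contrast`, `tf.image.adjust_gamma` and `tf.image.adjust_saturation`. Saturation is approximated by blending with a grayscale copy rather than by working in HSV. Hue is left out because it needs a full RGB/HSV conversion, which `tf.image.adjust_hue` handles.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights (assumption)


def adjust_brightness(image, delta):
    """Shift every pixel by `delta`."""
    return np.clip(image + delta, 0.0, 1.0)


def adjust_contrast(image, factor):
    """Scale each channel's distance from its mean by `factor`."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    return np.clip((image - mean) * factor + mean, 0.0, 1.0)


def adjust_gamma(image, gamma):
    """Power law: gamma < 1 brightens, gamma > 1 darkens."""
    return np.clip(image, 0.0, 1.0) ** gamma


def adjust_saturation(image, factor):
    """Blend RGB with its grayscale version: 0 -> gray, 1 -> unchanged."""
    gray = (image @ LUMA)[..., None]
    return np.clip(gray + factor * (image - gray), 0.0, 1.0)


img = np.random.default_rng(0).random((4, 4, 3))
brighter = adjust_brightness(img, 0.2)
punchier = adjust_contrast(img, 1.5)
```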
Color / Gray
Conversion from color to grayscale.
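Grayscale conversion is a weighted sum of the RGB channels. This NumPy sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). It can optionally repeat the result over three channels so the augmented image keeps the model's input shape. `to_grayscale` is a hypothetical helper. TensorFlow offers `tf.image.rgb_to_grayscale` and `tf.image.grayscale_to_rgb`.

```python
import numpy as np


def to_grayscale(image, keep_channels=True):
    """Hypothetical helper: convert an (H, W, 3) RGB image to grayscale using
    ITU-R BT.601 luma weights. With keep_channels=True the gray values are
    repeated over 3 channels so the output shape matches the input."""
    gray = (image @ np.array([0.299, 0.587, 0.114]))[..., None]  # (H, W, 1)
    return np.repeat(gray, 3, axis=-1) if keep_channels else gray


red = np.zeros((2, 2, 3))
red[..., 0] = 1.0                 # a pure red image
print(to_grayscale(red)[0, 0])    # every channel holds the red weight, 0.299
```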
Scale
Scale up or down the original image.
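Unlike zoom, scaling changes the output size. Here is a nearest-neighbour NumPy sketch, where `scale` is a hypothetical helper and `factor` multiplies both height and width. In TensorFlow, `tf.image.resize` does the same job with interpolation methods such as bilinear.

```python
import numpy as np


def scale(image, factor):
    """Hypothetical helper: resize an (H, W, C) image by `factor` along both
    axes with nearest-neighbour sampling; the output size changes."""
    h, w = image.shape[:2]
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


img = np.random.default_rng(0).random((10, 20, 3))
print(scale(img, 0.5).shape)  # (5, 10, 3)
print(scale(img, 2.0).shape)  # (20, 40, 3)
```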
Gaussian Noise
Adding Gaussian noise to the original image.
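Gaussian noise is sampled independently for each pixel and channel and added to the image. This NumPy sketch assumes float images in [0, 1], a hypothetical `add_gaussian_noise` helper and an illustrative default `stddev` of 0.05. The result is clipped back to the valid range. In TensorFlow, `tf.random.normal` can generate the noise tensor.

```python
import numpy as np


def add_gaussian_noise(image, stddev=0.05, rng=None):
    """Hypothetical helper: add zero-mean Gaussian noise with standard
    deviation `stddev` to each pixel/channel, then clip to [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.normal(loc=0.0, scale=stddev, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)


img = np.random.default_rng(0).random((16, 16, 3))
noisy = add_gaussian_noise(img, stddev=0.1, rng=np.random.default_rng(42))
```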
Below you can find a code example to try out. It uses the tf.image module of TensorFlow. You can also check out another example using Keras tf.keras.preprocessing in one of my previous posts, titled Transfer Learning Ride. Enjoy!