Thursday, November 8, 2018

How to prevent TensorFlow from monopolizing your system

Here we are, running a code block in Jupyter: the GPU RAM gets taken entirely for no good reason, my 8 GB of system RAM fills up completely, swap starts growing, and the whole system slows down until the notebook is killed or, in the worst case, the machine freezes completely.

It is always a good idea to constrain resources when running these kinds of processes.

Concerning RAM, I am using the tensorflow-gpu Docker image, so this is pretty easy to achieve:

docker run --runtime=nvidia -it -p 8888:8888 --memory="6g" --memory-swap="6g" tensorflow/tensorflow:1.5.0-gpu

The --memory parameter sets the RAM limit, while --memory-swap sets the combined RAM+swap limit. By setting them to the same value we make sure no swap is used, so the system will not be slowed down by swapping.

As for GPU RAM, this is also easy to achieve, but from code this time.
When using Keras, it can be done with:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
# use only half of available GPU RAM
config.gpu_options.per_process_gpu_memory_fraction = 0.5
set_session(tf.Session(config=config))

This, for instance, caps the GPU memory this process can allocate at 50% of the total.
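
If a fixed fraction is too rigid, TensorFlow can instead be told to start small and grow its GPU memory allocation on demand, via the allow_growth option (note this does not cap the total, it only avoids grabbing everything upfront):

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
# allocate GPU memory incrementally, on demand, instead of grabbing it all upfront
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))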

I suggest continuously monitoring CPU and GPU usage, with htop and nvtop respectively:



However, we should also ask why these situations happen in the first place, i.e., why we need so much memory for our data preparation. The answer is straightforward: instead of calling fit on a NumPy array holding the whole preprocessed dataset in RAM, we should use Python generators that load and preprocess only batch_size entries at a time, and train with fit_generator. This will nonetheless increase disk I/O, so it becomes our duty to use efficient on-disk data structures such as HDF5.
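
As a minimal sketch of what this could look like with h5py (assuming a hypothetical HDF5 file holding two aligned datasets named "x" and "y"), batches can be sliced directly from disk:

import h5py
import numpy as np

def hdf5_batch_generator(h5_path, batch_size):
    # assumption: the HDF5 file holds two aligned datasets named "x" and "y"
    f = h5py.File(h5_path, "r")
    n = f["x"].shape[0]
    while True:
        # only this contiguous slice is read from disk, never the whole dataset
        start = np.random.randint(0, n - batch_size)
        yield f["x"][start:start + batch_size], f["y"][start:start + batch_size]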

In this Python notebook, I report an example of a denoising autoencoder for time series data. There, I use both the classic fit and the fit_generator interfaces, and show how to return batches of batch_size windows for the latter, to minimize the amount of preprocessed data held in memory.

import numpy as np

def batch_generator(seq_x, seq_y, batch_size, w_size=window_size, debug=False):
    # Create empty arrays to contain the batch of features and labels
    batch_x = np.zeros((batch_size, w_size))
    batch_y = np.zeros((batch_size, w_size))

    loop = True
    while loop:
        for i in range(batch_size):
            # choose a random w_size-long window from the sequences
            index = np.random.randint(low=0, high=len(seq_x)-w_size, size=1)
            batch_x[i] = seq_x[ index[0]:index[0]+w_size ]
            batch_y[i] = seq_y[ index[0]:index[0]+w_size ]
        if debug: loop = False  # in debug mode, stop after a single batch
        yield np.expand_dims(batch_x, axis=2), np.expand_dims(batch_y, axis=2)

The idea is to use a Python generator (i.e., the yield keyword) so that each invocation of batch_generator returns one batch of (x, y) pairs drawn from the data and target sequences.

print windowize_generator(noisy_ts_train, window_size).shape
print windowize_generator(ts_train, window_size).shape

batch_size = 15
print np.array(list( 
            batch_generator(noisy_ts_train, ts_train, batch_size, window_size, debug=True)
        )).shape
(5401, 1800, 1)
(5401, 1800, 1)
(1, 2, 15, 1800, 1)

The last shape reads as: one yielded tuple (we passed debug=True, so the generator stops after a single batch), containing 2 arrays (x and y), each of shape (batch_size=15, window_size=1800, 1).

convolution_autoencoder.fit_generator(
    batch_generator(noisy_ts_train, ts_train, batch_size, window_size),
    verbose=1,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs)

Clearly, this sampling does not consider the target classes in y and could consequently lead to model biases. For this specific example, since it is an autoencoder, the effect depends largely on the underlying classes in x: for the disaggregation problem, we need to make sure a fair amount of OFF statuses is selected alongside the ON ones.
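
A possible mitigation, sketched below under the (hypothetical) assumption that an OFF window is one whose mean target power stays below some off_threshold, is to keep re-drawing the random index until the batch contains the desired share of OFF windows:

import numpy as np

def balanced_batch_generator(seq_x, seq_y, batch_size, w_size,
                             off_threshold=0.1, off_share=0.5):
    # hypothetical OFF criterion: mean target power below off_threshold;
    # assumes both ON and OFF windows actually occur in the data
    batch_x = np.zeros((batch_size, w_size))
    batch_y = np.zeros((batch_size, w_size))
    while True:
        for i in range(batch_size):
            # the first off_share fraction of the batch must be OFF windows
            want_off = i < int(off_share * batch_size)
            while True:
                start = np.random.randint(0, len(seq_x) - w_size)
                window_y = seq_y[start:start + w_size]
                if (window_y.mean() < off_threshold) == want_off:
                    break
            batch_x[i] = seq_x[start:start + w_size]
            batch_y[i] = window_y
        yield np.expand_dims(batch_x, axis=2), np.expand_dims(batch_y, axis=2)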

We provide another example here, this time for image processing. Have a look at how we return resized images on the fly. To minimize the memory that would be needed to load the entire dataset, we use Python generators to return only batch_size-sized subsets of the dataset during training. To achieve this, we use the file names identified when splitting into train and test to move the original images and their related targets (i.e., the segmented class or object) into the following directory structure (a sketch of this step is shown right after the list):
  • a "Preprocessed" folder is created as destination path for the preprocessed images
  • a "train" directory is created along with a "test" directory
  • for both "train" and "test" we have the respective "x" (i.e., original) and "y" (i.e., segmented) images
from keras.preprocessing.image import ImageDataGenerator

def get_imageset_batch_generator(x_path, y_path, resize_to, batch_size=1, seed=1234):
    """
        Return a pair of generators for the raw images (x) and the targets (y).
        Each call creates new generators, returning batch_size images at a time.
    """
    # https://keras.io/preprocessing/image/#flow_from_directory
    print "Inspecting", x_path, "and", y_path
    datagenerator = ImageDataGenerator()
    # class_mode options:
    # "categorical" will be 2D one-hot encoded labels,
    # "binary" will be 1D binary labels,
    # "sparse" will be 1D integer labels,
    # "input" will be images identical to input images (mainly used to work with autoencoders),
    # None, no labels are returned (just batching).
    x_gen = datagenerator.flow_from_directory(x_path,
                                              target_size=(resize_to[1], resize_to[0]),
                                              color_mode='rgb',
                                              classes=None,
                                              class_mode=None,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              seed=seed)
    y_gen = datagenerator.flow_from_directory(y_path,
                                              target_size=(resize_to[1], resize_to[0]),
                                              color_mode='rgb',
                                              classes=None,
                                              class_mode=None,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              seed=seed)
    return x_gen, y_gen

batch_size = 5
x_train_gen, y_train_gen = get_imageset_batch_generator(x_path=target_x_train, y_path=target_y_train,
                                                        resize_to=target_size,
                                                        batch_size=batch_size, 
                                                        seed=seed)
Inspecting data/VOCdevkit/VOC2012/Preprocessed/train/x and data/VOCdevkit/VOC2012/Preprocessed/train/y
Found 1951 images belonging to 1 classes.
Found 1951 images belonging to 1 classes.
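
Note that flow_from_directory is called separately on x and y, so the two generators must be paired into a single generator yielding (x, y) tuples, as fit_generator expects. The zip_xy_gens helper used further below is not shown in the notebook; a minimal sketch might be:

def zip_xy_gens(x_gen, y_gen):
    # pair the two directory iterators into (x, y) batches;
    # both were created with the same seed and shuffle settings,
    # so their shuffled orders stay aligned
    while True:
        yield next(x_gen), next(y_gen)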

Keras provides a built-in ImageDataGenerator, which requires a bunch of folders (one per target class) to run its flow_from_directory method. This would solve the model bias issue described above, since it receives the list of classes and can be configured for the specific data type of the target variable. However, since we do not have specific classes in the image segmentation problem, we simply group all images in an "all_classes" folder inside each of the mentioned data folders (i.e., train/x, train/y, test/x, test/y). An alternative method available in Keras is flow_from_dataframe, for which an example is available here. The main difference is that flow_from_dataframe receives a pandas dataframe with one column listing the ids of the dataset items (i.e., file names) and one column with the class each item belongs to; these columns are indicated via the x_col and y_col parameters.
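
A hedged sketch of that alternative follows; the dataframe contents and the directory path are hypothetical:

import pandas as pd
from keras.preprocessing.image import ImageDataGenerator

# hypothetical listing: one row per image, with its file name and its class
df = pd.DataFrame({"filename": ["img_001.jpg", "img_002.jpg"],
                   "class": ["cat", "dog"]})

gen = ImageDataGenerator().flow_from_dataframe(df,
                                               directory="data/images",
                                               x_col="filename",
                                               y_col="class",
                                               target_size=(256, 256),
                                               batch_size=32,
                                               class_mode="categorical")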

It appears, however, that plain generators give no guarantee that each batch is served exactly once when data loading is parallelized. keras.utils.Sequence (https://keras.io/utils/#sequence) offers a safer way to implement a data generator that can be parallelized using Python's multiprocessing: it guarantees that the network is trained only once on each batch per epoch, which is not the case with the simpler Python generator approach seen above (see here). We provide an example for the image segmentation problem, similar to the one at https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly. The Sequence is given the list of file names that constitute the set, and a batch_size batch is returned by loading the previously preprocessed images directly from disk.

import os

import numpy as np

import keras
from skimage.io import imread
from skimage.transform import resize

from keras.preprocessing.image import array_to_img, img_to_array, load_img
from keras.preprocessing.image import ImageDataGenerator

class ImageGenerator(keras.utils.Sequence):
    def __init__(self, 
                 paths,
                 filenames, 
                 input_size=(32,32,32),
                 output_size=(32,32,32),
                 batch_size=32, 
                 extension_x=".jpg",
                 extension_y=".png",
                 resize_to=None,
                 shuffle=True):
        self.paths = paths
        self.filenames = filenames
        self.input_size=input_size
        self.output_size=output_size
        self.batch_size = batch_size
        self.extension_x = extension_x
        self.extension_y = extension_y
        self.resize_to = resize_to
        self.shuffle = shuffle
        self.on_epoch_end()
    
    def on_epoch_end(self):
        """
            Triggered before and after each epoch, updates index order
        """
        self.indexes = np.arange(len(self.filenames))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)
            
    def __len__(self):
        """
            Number of batches per epoch, so that (almost) the whole dataset
            is used in each epoch: number of data entries over the batch size.
            We floor rather than ceil, because __data_generation always
            allocates full batch_size arrays.
        """
        return int(np.floor(len(self.filenames) / float(self.batch_size)))
    
    def __getitem__(self, index):
        """Return one data batch"""
        # Take the next batch_size indexes from the (shuffled) index list
        indexes = self.indexes[index*self.batch_size : (index+1)*self.batch_size]
        # Find filenames having those indexes
        filenames_batch = [self.filenames[k] for k in indexes]
        # Generate one batch for those filenames
        X, y = self.__data_generation(filenames_batch)
        return X, y
    
    def __load_and_resize__(self, path):
        # mode 1
        img_tmp = imread(path)
        if self.resize_to is not None:
            img_tmp = resize(img_tmp, self.resize_to)
        return img_tmp
        
        """
        # mode 2
        # this seems to be lot slower than mode 1, no clue why, it is even keras based
        img_tmp = load_img( path )
        if self.resize_to is not None:
            img_tmp = img_tmp.resize((self.resize_to[0], self.resize_to[1]))
        return img_to_array( img )
        """
        
        """
        # mode 3
        img_tmp = tf.image.decode_jpeg(path)
        print "img_type", type(img_tmp)
        resized_img = tf.image.resize_images(img_tmp, (self.resize_to[0], self.resize_to[1]))
        print "resized_img_type", type(resized_img)
        resized_img = tf.cast(resized_img, np.uint8).eval()
        #return img_to_array( resized_img )
        return resized_img
        """        
    
    def __data_generation(self, filenames_temp):
        """Generates batch_size samples """
        # generate resulting data with shape
        # (batch_size, input_size[:]) and (batch_size, output_size[:])
        x = np.empty((self.batch_size,)+ self.input_size)
        y = np.empty((self.batch_size,)+ self.output_size)

        # Load data from the filename list for both source and target data
        for i, filename in enumerate(filenames_temp):
            # load and resize source image
            x[i] = self.__load_and_resize__(os.path.join(self.paths["x"], filename+ self.extension_x))

            # load and resize target image
            y[i] = self.__load_and_resize__(os.path.join(self.paths["y"], filename+ self.extension_y))

        return x, y

Below you can see how the usage of the two approaches differs:

if train_using_generator:
    if use_image_generator:
        batch_size = 2
        epochs = 10
        training_generator = ImageGenerator(paths_train, train, batch_size=batch_size, 
                                            input_size=(img_h, img_w, 3),
                                            output_size=(img_h, img_w, 3),
                                            resize_to=(img_h, img_w),
                                            shuffle=True)
        validation_generator = ImageGenerator(paths_test, test, batch_size=batch_size, 
                                              input_size=(img_h, img_w, 3),
                                              output_size=(img_h, img_w, 3),
                                              resize_to=(img_h, img_w),
                                              shuffle=True)
        # Train model on dataset
        model.fit_generator(generator=training_generator,
                            validation_data=validation_generator,
                            epochs=epochs,
                            use_multiprocessing=True,
                            workers=6)
    else:
        batch_size = 2
        # train generator
        x_train_gen, y_train_gen = get_imageset_batch_generator(x_path=target_x_train, y_path=target_y_train,
                                                                resize_to=target_size,
                                                                batch_size=batch_size, seed=seed)
        train_generator = zip_xy_gens(x_train_gen, y_train_gen)
        # validation generator
        x_test_gen, y_test_gen = get_imageset_batch_generator(x_path=target_x_test, y_path=target_y_test,
                                                              resize_to=target_size,
                                                              batch_size=batch_size, seed=seed)
        test_generator = zip_xy_gens(x_test_gen, y_test_gen)
        model.fit_generator(train_generator,
                            steps_per_epoch=1,
                            validation_data=test_generator,
                            validation_steps=1,
                            epochs=epochs,
                            #verbose=2
                           )


Hope this helps somebody.
Andrea

Useful Links:
  • https://medium.com/@fromtheast/implement-fit-generator-in-keras-61aa2786ce98
  • https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly 


