{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "c831b6f8-79ef-4d1b-b963-47422c056731", "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet seaborn pandas scikit-learn numpy matplotlib jupyterlab_myst ipython" ] }, { "cell_type": "markdown", "id": "1e62993e", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "bfee6a78", "metadata": {}, "source": [ "# Image segmentation" ] }, { "cell_type": "markdown", "id": "6eb70ed9", "metadata": {}, "source": [ "## What is Image segmentation?" ] }, { "cell_type": "markdown", "id": "4556a6ff", "metadata": {}, "source": [ "In an image classification task, the network assigns a single label (or class) to each input image. However, suppose you want to know the shape of that object, which pixel belongs to which object, etc. In this case, you need to assign a class to each pixel of the image; this task is known as segmentation. A segmentation model returns much more detailed information about the image. Image segmentation has many applications in medical imaging, self-driving cars and satellite imaging, just to name a few." ] }, { "cell_type": "code", "execution_count": 2, "id": "0bc7ea14", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "\n", "A demo of image segmentation 1. [source]\n", "

\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "

\n", "\n", "A demo of image segmentation 1. [source]\n", "

\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 3, "id": "479cbe5b", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "\n", "A demo of image segmentation 2. [source]\n", "

\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "

\n", "\n", "A demo of image segmentation 2. [source]\n", "

\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "id": "92c12424", "metadata": {}, "source": [ "## How to train a model for image segmentation?" ] }, { "cell_type": "markdown", "id": "87abaf96", "metadata": {}, "source": [ "This tutorial uses the Oxford-IIIT Pet Dataset. The dataset consists of images of 37 pet breeds, with 200 images per breed (~100 each in the training and test splits). Each image comes with its corresponding label and a pixel-wise mask. The mask assigns a class label to each pixel. Each pixel is given one of three categories:\n", "\n", "- Class 1: Pixel belonging to the pet.\n", "- Class 2: Pixel bordering the pet.\n", "- Class 3: None of the above/a surrounding pixel.\n", "\n", "The dataset can be downloaded from [The Oxford-IIIT Pet Dataset](https://academictorrents.com/details/b18bbd9ba03d50b0f7f479acc9f4228a408cecc1)." ] }, { "cell_type": "code", "execution_count": 5, "id": "543eb431", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow as tf\n", "from tensorflow import keras\n", "import tensorflow_datasets as tfds\n", "\n", "from IPython.display import clear_output\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "96ce6010", "metadata": {}, "source": [ "### Prepare the Oxford-IIIT Pets dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "3013b20e", "metadata": { "tags": [] }, "outputs": [], "source": [ "dataset, info = tfds.load(\"oxford_iiit_pet:3.*.*\", with_info=True)" ] }, { "cell_type": "markdown", "id": "620b3181", "metadata": {}, "source": [ "Before training, each image and its mask are resized to 128x128, and the image color values are normalized to the range [0, 1]. As mentioned above, the pixels in the segmentation mask are labeled 1, 2, or 3. For convenience, subtract 1 from the segmentation mask so that the labels become {0, 1, 2}." 
] }, { "cell_type": "code", "execution_count": 7, "id": "e9fc90e7", "metadata": { "tags": [] }, "outputs": [], "source": [ "def normalize(input_image, input_mask):\n", " input_image = tf.cast(input_image, tf.float32) / 255.0\n", " input_mask -= 1\n", " return input_image, input_mask\n", "\n", "\n", "def load_image(datapoint):\n", " input_image = tf.image.resize(datapoint[\"image\"], (128, 128))\n", " input_mask = tf.image.resize(\n", " datapoint[\"segmentation_mask\"],\n", " (128, 128),\n", " method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,\n", " )\n", "\n", " input_image, input_mask = normalize(input_image, input_mask)\n", "\n", " return input_image, input_mask" ] }, { "cell_type": "markdown", "id": "0389a2e1", "metadata": {}, "source": [ "The dataset already contains the required training and test splits, so continue to use the same splits:" ] }, { "cell_type": "code", "execution_count": 8, "id": "c333e38f", "metadata": { "tags": [] }, "outputs": [], "source": [ "TRAIN_LENGTH = info.splits[\"train\"].num_examples\n", "BATCH_SIZE = 64\n", "BUFFER_SIZE = 1000\n", "STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE\n", "train_images = dataset[\"train\"].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)\n", "test_images = dataset[\"test\"].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)" ] }, { "cell_type": "markdown", "id": "863b8813", "metadata": {}, "source": [ "The following class performs a simple augmentation by randomly-flipping an image. Go to the Image augmentation tutorial to learn more." 
] }, { "cell_type": "code", "execution_count": 9, "id": "18ec715e", "metadata": { "tags": [] }, "outputs": [], "source": [ "class Augment(tf.keras.layers.Layer):\n", " def __init__(self, seed=42):\n", " super().__init__()\n", " # both use the same seed, so they'll make the same random changes.\n", " self.augment_inputs = tf.keras.layers.RandomFlip(mode=\"horizontal\", seed=seed)\n", " self.augment_labels = tf.keras.layers.RandomFlip(mode=\"horizontal\", seed=seed)\n", "\n", " def call(self, inputs, labels):\n", " inputs = self.augment_inputs(inputs)\n", " labels = self.augment_labels(labels)\n", " return inputs, labels" ] }, { "cell_type": "markdown", "id": "4ca9679f", "metadata": {}, "source": [ "Build the input pipeline, applying the augmentation after batching the inputs:" ] }, { "cell_type": "code", "execution_count": 11, "id": "257dd4fd", "metadata": { "tags": [] }, "outputs": [], "source": [ "train_batches = (\n", " train_images.cache()\n", " .shuffle(BUFFER_SIZE)\n", " .batch(BATCH_SIZE)\n", " .repeat()\n", " .map(Augment())\n", " .prefetch(buffer_size=tf.data.AUTOTUNE)\n", ")\n", "\n", "test_batches = test_images.batch(BATCH_SIZE)\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "276f1976", "metadata": {}, "source": [ "Visualize an image example and its corresponding mask from the dataset:" ] }, { "cell_type": "code", "execution_count": 12, "id": "337a3ade-b648-4a53-8139-b9128cefdf4c", "metadata": { "tags": [] }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def display(display_list):\n", " plt.figure(figsize=(15, 15))\n", "\n", " title = [\"Input Image\", \"True Mask\", \"Predicted Mask\"]\n", "\n", " for i in range(len(display_list)):\n", " plt.subplot(1, len(display_list), i + 1)\n", " plt.title(title[i])\n", " plt.imshow(tf.keras.utils.array_to_img(display_list[i]))\n", " plt.axis(\"off\")\n", " plt.show()\n", "\n", "\n", "for images, masks in train_batches.take(2):\n", " sample_image, sample_mask = images[0], masks[0]\n", " display([sample_image, sample_mask])" ] }, { "cell_type": "markdown", "id": "75537524", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/01_display_dataset.png\n", "---\n", "name: '01_dataset_example_mask' \n", "width: 90%\n", "---\n", "An example from the dataset\n", ":::" ] }, { "cell_type": "markdown", "id": "85e667c5", "metadata": {}, "source": [ "### Define the model" ] }, { "cell_type": "markdown", "id": "5f35f9de", "metadata": {}, "source": [ "The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and decoder (upsampler). To learn robust features and reduce the number of trainable parameters, use a pretrained model—MobileNetV2—as the encoder. For the decoder, you will use the upsample block, which is already implemented in the pix2pix example in the TensorFlow Examples repo." ] }, { "cell_type": "markdown", "id": "742dcf22", "metadata": {}, "source": [ ":::{note}\n", "The link for paper is https://arxiv.org/pdf/1505.04597.pdf.\n", ":::" ] }, { "cell_type": "markdown", "id": "cd227d99", "metadata": {}, "source": [ "As mentioned, the encoder is a pretrained MobileNetV2 model. You will use the model from tf.keras.applications. The encoder consists of specific outputs from intermediate layers in the model. Note that the encoder will not be trained during the training process." 
] }, { "cell_type": "code", "execution_count": 14, "id": "08769272", "metadata": { "tags": [] }, "outputs": [], "source": [ "base_model = tf.keras.applications.MobileNetV2(\n", " input_shape=[128, 128, 3], include_top=False\n", ")\n", "\n", "# Use the activations of these layers\n", "layer_names = [\n", " \"block_1_expand_relu\", # 64x64\n", " \"block_3_expand_relu\", # 32x32\n", " \"block_6_expand_relu\", # 16x16\n", " \"block_13_expand_relu\", # 8x8\n", " \"block_16_project\", # 4x4\n", "]\n", "base_model_outputs = [base_model.get_layer(name).output for name in layer_names]\n", "\n", "# Create the feature extraction model\n", "down_stack = tf.keras.Model(inputs=base_model.input, outputs=base_model_outputs)\n", "\n", "down_stack.trainable = False\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "cf7ce2a2", "metadata": {}, "source": [ "The decoder/upsampler is simply a series of upsample blocks implemented in TensorFlow examples:" ] }, { "cell_type": "code", "execution_count": 15, "id": "69c5e34d", "metadata": { "tags": [] }, "outputs": [], "source": [ "from tensorflow.keras import layers\n", "\n", "\n", "def upsample(filters, size, apply_dropout=False):\n", " initializer = tf.random_normal_initializer(0.0, 0.02)\n", " result = tf.keras.Sequential()\n", " result.add(\n", " layers.Conv2DTranspose(\n", " filters,\n", " size,\n", " strides=2,\n", " padding=\"same\",\n", " kernel_initializer=initializer,\n", " use_bias=False,\n", " )\n", " )\n", " result.add(layers.BatchNormalization())\n", " if apply_dropout:\n", " result.add(layers.Dropout(0.5))\n", " result.add(layers.ReLU())\n", " return result\n", "\n", "\n", "up_stack = [\n", " upsample(512, 3), # 4x4 -> 8x8\n", " upsample(256, 3), # 8x8 -> 16x16\n", " upsample(128, 3), # 16x16 -> 32x32\n", " upsample(64, 3), # 32x32 -> 64x64\n", "]" ] }, { "cell_type": "code", "execution_count": 16, "id": "abdf340c", "metadata": { "tags": [] }, "outputs": [], "source": [ "def 
unet_model(output_channels: int):\n", "    inputs = tf.keras.layers.Input(shape=[128, 128, 3])\n", "\n", "    # Downsampling through the model\n", "    skips = down_stack(inputs)\n", "    x = skips[-1]\n", "    skips = reversed(skips[:-1])\n", "\n", "    # Upsampling and establishing the skip connections\n", "    for up, skip in zip(up_stack, skips):\n", "        x = up(x)\n", "        concat = tf.keras.layers.Concatenate()\n", "        x = concat([x, skip])\n", "\n", "    # This is the last layer of the model\n", "    last = tf.keras.layers.Conv2DTranspose(\n", "        filters=output_channels, kernel_size=3, strides=2, padding=\"same\"\n", "    )  # 64x64 -> 128x128\n", "\n", "    x = last(x)\n", "\n", "    return tf.keras.Model(inputs=inputs, outputs=x)" ] }, { "cell_type": "markdown", "id": "f3f75a52", "metadata": {}, "source": [ "Note that the number of filters on the last layer is set to the number of `output_channels`. This will be one output channel per class." ] }, { "cell_type": "markdown", "id": "3b16ba3a", "metadata": {}, "source": [ "### Train the model" ] }, { "cell_type": "markdown", "id": "3b089cc9", "metadata": {}, "source": [ "Now, all that is left to do is to compile and train the model. Since this is a multiclass classification problem, use the `tf.keras.losses.SparseCategoricalCrossentropy` loss function with the `from_logits` argument set to `True`, since the labels are scalar integers instead of vectors of scores for each pixel of every class. When running inference, the label assigned to the pixel is the channel with the highest value. This is what the `create_mask` function is doing." 
] }, { "cell_type": "code", "execution_count": 18, "id": "20c83f6b", "metadata": { "tags": [] }, "outputs": [], "source": [ "OUTPUT_CLASSES = 3\n", "\n", "model = unet_model(output_channels=OUTPUT_CLASSES)\n", "model.compile(\n", " optimizer=\"adam\",\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", " metrics=[\"accuracy\"],\n", ")\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "f0ab0185", "metadata": {}, "source": [ "Try out the model to check what it predicts before training:" ] }, { "cell_type": "code", "execution_count": 19, "id": "e13f2332", "metadata": { "tags": [] }, "outputs": [], "source": [ "def create_mask(pred_mask):\n", " pred_mask = tf.math.argmax(pred_mask, axis=-1)\n", " pred_mask = pred_mask[..., tf.newaxis]\n", " return pred_mask[0]\n", "\n", "\n", "def show_predictions(dataset=None, num=1):\n", " if dataset:\n", " for image, mask in dataset.take(num):\n", " pred_mask = model.predict(image)\n", " display([image[0], mask[0], create_mask(pred_mask)])\n", " else:\n", " display(\n", " [\n", " sample_image,\n", " sample_mask,\n", " create_mask(model.predict(sample_image[tf.newaxis, ...])),\n", " ]\n", " )" ] }, { "cell_type": "markdown", "id": "528aa0c5", "metadata": {}, "source": [ "Perhaps now we can try to show the predictions." 
] }, { "cell_type": "code", "execution_count": 25, "id": "296c64df", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1/1 [==============================] - 0s 51ms/step\n" ] }, { "data": { "text/plain": [ "[,\n", " ,\n", " ]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for images, masks in train_batches.take(1):\n", " sample_image, sample_mask = images[0], masks[0]\n", "show_predictions()" ] }, { "cell_type": "code", "execution_count": 26, "id": "2e3d94b1", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1/1 [==============================] - 0s 77ms/step\n" ] }, { "data": { "text/plain": [ "[,\n", " ,\n", " ]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sample Prediction after epoch 20\n", "\n", "57/57 [==============================] - 55s 976ms/step - loss: 0.0978 - accuracy: 0.9591 - val_loss: 0.3668 - val_accuracy: 0.8986\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, HTML\n", "\n", "display(HTML(\"\"))\n", "\n", "\n", "class DisplayCallback(tf.keras.callbacks.Callback):\n", " def on_epoch_end(self, epoch, logs=None):\n", " clear_output(wait=True)\n", " show_predictions()\n", " print(\"\\nSample Prediction after epoch {}\\n\".format(epoch + 1))\n", "\n", "\n", "EPOCHS = 20\n", "VAL_SUBSPLITS = 5\n", "VALIDATION_STEPS = info.splits[\"test\"].num_examples // BATCH_SIZE // VAL_SUBSPLITS\n", "\n", "model_history = model.fit(\n", " train_batches,\n", " epochs=EPOCHS,\n", " steps_per_epoch=STEPS_PER_EPOCH,\n", " validation_steps=VALIDATION_STEPS,\n", " validation_data=test_batches,\n", " callbacks=[DisplayCallback()],\n", " verbose=1,\n", ")\n", "\n", "loss = model_history.history[\"loss\"]\n", "val_loss = model_history.history[\"val_loss\"]\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "plt.figure()\n", "plt.plot(model_history.epoch, loss, \"r\", label=\"Training loss\")\n", "plt.plot(model_history.epoch, val_loss, \"bo\", label=\"Validation loss\")\n", "plt.title(\"Training and Validation Loss\")\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Loss Value\")\n", "plt.ylim([0, 1])\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "6ce73353", "metadata": {}, "source": [ "### Make predictions" ] }, { "cell_type": "markdown", "id": "e3d0d488", "metadata": {}, "source": [ "Now, let's make some predictions. In the interest of saving time, the number of epochs was kept small, but you may set this higher to achieve more accurate results." 
] }, { "cell_type": "markdown", "id": "7b384977", "metadata": {}, "source": [ "`show_predictions(test_batches, 3)`" ] }, { "cell_type": "markdown", "id": "e83f95d7", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/02_show_prediction.png\n", "---\n", "name: '02_display_prediction' \n", "width: 90%\n", "---\n", "An example of the output prediction\n", ":::" ] }, { "cell_type": "markdown", "id": "bd1c4121", "metadata": {}, "source": [ "## History & classic models" ] }, { "cell_type": "markdown", "id": "91455f0f", "metadata": {}, "source": [ "The previous part covered the U-Net model. There are many other excellent segmentation models, and this part introduces some of them." ] }, { "cell_type": "markdown", "id": "90adb67d", "metadata": {}, "source": [ "### Helper code" ] }, { "cell_type": "code", "execution_count": 27, "id": "12ec8599", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "layers = tf.keras.layers\n", "backend = tf.keras.backend\n", "\n", "\n", "class ResNet(object):\n", " def __init__(self, version=\"ResNet50\", dilation=None, **kwargs):\n", " \"\"\"\n", " The implementation of ResNet based on Tensorflow.\n", " :param version: 'ResNet50', 'ResNet101' or 'ResNet152'\n", " :param dilation: Whether to use dilation strategy\n", " :param kwargs: other parameters.\n", " \"\"\"\n", " super(ResNet, self).__init__(**kwargs)\n", " params = {\n", " \"ResNet50\": [2, 3, 5, 2],\n", " \"ResNet101\": [2, 3, 22, 2],\n", " \"ResNet152\": [2, 7, 35, 2],\n", " }\n", " self.version = version\n", " assert version in params\n", " self.params = params[version]\n", "\n", " if dilation is None:\n", " self.dilation = [1, 1]\n", " else:\n", " self.dilation = dilation\n", " assert len(self.dilation) == 2\n", "\n", " def _identity_block(\n", " self, input_tensor, kernel_size, filters, stage, block, dilation=1\n", " ):\n", " \"\"\"The identity block is the block 
that has no conv layer at shortcut.\n", " # Arguments\n", " input_tensor: input tensor\n", " kernel_size: default 3, the kernel size of\n", " middle conv layer at main path\n", " filters: list of integers, the filters of 3 conv layer at main path\n", " stage: integer, current stage label, used for generating layer names\n", " block: 'a','b'..., current block label, used for generating layer names\n", " # Returns\n", " Output tensor for the block.\n", " \"\"\"\n", " filters1, filters2, filters3 = filters\n", " if backend.image_data_format() == \"channels_last\":\n", " bn_axis = 3\n", " else:\n", " bn_axis = 1\n", "\n", " if block > \"z\":\n", " block = chr(ord(block) - ord(\"z\") + ord(\"A\") - 1)\n", "\n", " conv_name_base = \"res\" + str(stage) + block + \"_branch\"\n", " bn_name_base = \"bn\" + str(stage) + block + \"_branch\"\n", "\n", " x = layers.Conv2D(\n", " filters1, (1, 1), kernel_initializer=\"he_normal\", name=conv_name_base + \"2a\"\n", " )(input_tensor)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2a\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters2,\n", " kernel_size,\n", " padding=\"same\",\n", " kernel_initializer=\"he_normal\",\n", " name=conv_name_base + \"2b\",\n", " dilation_rate=dilation,\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2b\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters3, (1, 1), kernel_initializer=\"he_normal\", name=conv_name_base + \"2c\"\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2c\")(x)\n", "\n", " x = layers.add([x, input_tensor])\n", " x = layers.Activation(\"relu\")(x)\n", " return x\n", "\n", " def _conv_block(\n", " self,\n", " input_tensor,\n", " kernel_size,\n", " filters,\n", " stage,\n", " block,\n", " strides=(2, 2),\n", " dilation=1,\n", " ):\n", " \"\"\"A block that has a conv layer at shortcut.\n", " # Arguments\n", " 
input_tensor: input tensor\n", " kernel_size: default 3, the kernel size of\n", " middle conv layer at main path\n", " filters: list of integers, the filters of 3 conv layer at main path\n", " stage: integer, current stage label, used for generating layer names\n", " block: 'a','b'..., current block label, used for generating layer names\n", " strides: Strides for the first conv layer in the block.\n", " # Returns\n", " Output tensor for the block.\n", " Note that from stage 3,\n", " the first conv layer at main path is with strides=(2, 2)\n", " And the shortcut should have strides=(2, 2) as well\n", " \"\"\"\n", " filters1, filters2, filters3 = filters\n", " if backend.image_data_format() == \"channels_last\":\n", " bn_axis = 3\n", " else:\n", " bn_axis = 1\n", " conv_name_base = \"res\" + str(stage) + block + \"_branch\"\n", " bn_name_base = \"bn\" + str(stage) + block + \"_branch\"\n", "\n", " strides = (1, 1) if dilation > 1 else strides\n", "\n", " x = layers.Conv2D(\n", " filters1,\n", " (1, 1),\n", " strides=strides,\n", " kernel_initializer=\"he_normal\",\n", " name=conv_name_base + \"2a\",\n", " )(input_tensor)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2a\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters2,\n", " kernel_size,\n", " padding=\"same\",\n", " kernel_initializer=\"he_normal\",\n", " name=conv_name_base + \"2b\",\n", " dilation_rate=dilation,\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2b\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters3, (1, 1), kernel_initializer=\"he_normal\", name=conv_name_base + \"2c\"\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2c\")(x)\n", "\n", " shortcut = layers.Conv2D(\n", " filters3,\n", " (1, 1),\n", " strides=strides,\n", " kernel_initializer=\"he_normal\",\n", " name=conv_name_base + \"1\",\n", " )(input_tensor)\n", " 
shortcut = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"1\")(\n", " shortcut\n", " )\n", "\n", " x = layers.add([x, shortcut])\n", " x = layers.Activation(\"relu\")(x)\n", " return x\n", "\n", " def __call__(self, inputs, output_stages=\"c5\", **kwargs):\n", " \"\"\"\n", " call for ResNet50, ResNet101 or ResNet152.\n", " :param inputs: a 4-D tensor.\n", " :param output_stages: str or a list of str containing the output stages.\n", " :param kwargs: other parameters.\n", " :return: the output of different stages.\n", " \"\"\"\n", " if backend.image_data_format() == \"channels_last\":\n", " bn_axis = 3\n", " else:\n", " bn_axis = 1\n", "\n", " dilation = self.dilation\n", "\n", " x = layers.ZeroPadding2D(padding=(3, 3), name=\"conv1_pad\")(inputs)\n", " x = layers.Conv2D(\n", " 64,\n", " (7, 7),\n", " strides=(2, 2),\n", " padding=\"valid\",\n", " kernel_initializer=\"he_normal\",\n", " name=\"conv1\",\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=\"bn_conv1\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", " x = layers.ZeroPadding2D(padding=(1, 1), name=\"pool1_pad\")(x)\n", " x = layers.MaxPooling2D((3, 3), strides=(2, 2))(x)\n", " c1 = x\n", "\n", " x = self._conv_block(x, 3, [64, 64, 256], stage=2, block=\"a\", strides=(1, 1))\n", " for i in range(self.params[0]):\n", " x = self._identity_block(\n", " x, 3, [64, 64, 256], stage=2, block=chr(ord(\"b\") + i)\n", " )\n", " c2 = x\n", "\n", " x = self._conv_block(x, 3, [128, 128, 512], stage=3, block=\"a\")\n", " for i in range(self.params[1]):\n", " x = self._identity_block(\n", " x, 3, [128, 128, 512], stage=3, block=chr(ord(\"b\") + i)\n", " )\n", " c3 = x\n", "\n", " x = self._conv_block(\n", " x, 3, [256, 256, 1024], stage=4, block=\"a\", dilation=dilation[0]\n", " )\n", " for i in range(self.params[2]):\n", " x = self._identity_block(\n", " x,\n", " 3,\n", " [256, 256, 1024],\n", " stage=4,\n", " block=chr(ord(\"b\") + i),\n", " dilation=dilation[0],\n", " )\n", " c4 = x\n", 
"\n", " x = self._conv_block(\n", " x, 3, [512, 512, 2048], stage=5, block=\"a\", dilation=dilation[1]\n", " )\n", " for i in range(self.params[3]):\n", " x = self._identity_block(\n", " x,\n", " 3,\n", " [512, 512, 2048],\n", " stage=5,\n", " block=chr(ord(\"b\") + i),\n", " dilation=dilation[1],\n", " )\n", " c5 = x\n", "\n", " self.outputs = {\"c1\": c1, \"c2\": c2, \"c3\": c3, \"c4\": c4, \"c5\": c5}\n", "\n", " if type(output_stages) is not list:\n", " return self.outputs[output_stages]\n", " else:\n", " return [self.outputs[ci] for ci in output_stages]\n", "\n", "\n", "class Network(object):\n", " def __init__(\n", " self, num_classes, version=\"PAN\", base_model=\"ResNet50\", dilation=None, **kwargs\n", " ):\n", " super(Network, self).__init__(**kwargs)\n", " if base_model in [\"ResNet50\", \"ResNet101\", \"ResNet152\"]:\n", " self.encoder = ResNet(base_model, dilation=dilation)\n", " else:\n", " raise ValueError(\n", " \"The base model {model} is not in the supported model list!!!\".format(\n", " model=base_model\n", " )\n", " )\n", "\n", " self.num_classes = num_classes\n", " self.version = version\n", " self.base_model = base_model\n", "\n", " def __call__(self, inputs, **kwargs):\n", " return inputs\n", "\n", " def get_version(self):\n", " return self.version\n", "\n", " def get_base_model(self):\n", " return self.base_model" ] }, { "cell_type": "markdown", "id": "d17b3fa7", "metadata": {}, "source": [ "### FCN" ] }, { "cell_type": "markdown", "id": "df340ed3", "metadata": {}, "source": [ ":::{note}\n", "The link for paper is https://arxiv.org/pdf/1411.4038v2.pdf.\n", ":::" ] }, { "cell_type": "markdown", "id": "a411a89a", "metadata": {}, "source": [ "FCN is the first work to train FCNs end-to-end for pixel-wise prediction and from supervised pre-training. Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where. 
Deep feature hierarchies jointly encode location and semantics in a local-to-global pyramid. FCN defines a novel “skip” architecture to combine deep, coarse, semantic information and shallow, fine, appearance information." ] }, { "cell_type": "markdown", "id": "9f3a097c", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/03_structure_FCN.png\n", "---\n", "name: '03_display_structure_of_FCN' \n", "width: 90%\n", "---\n", "[ The structure of FCN ](https://arxiv.org/pdf/1411.4038v2.pdf)\n", ":::" ] }, { "cell_type": "markdown", "id": "1a569084", "metadata": {}, "source": [ "#### Code" ] }, { "cell_type": "code", "execution_count": 28, "id": "19b1dec9", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "layers = tf.keras.layers\n", "models = tf.keras.models\n", "backend = tf.keras.backend\n", "\n", "\n", "class FCN(Network):\n", " def __init__(self, num_classes, version=\"FCN-8s\", base_model=\"ResNet50\", **kwargs):\n", " \"\"\"\n", " The initialization of FCN-8s/16s/32s.\n", " :param num_classes: the number of predicted classes.\n", " :param version: 'FCN-8s', 'FCN-16s' or 'FCN-32s'.\n", " :param base_model: the backbone model\n", " :param kwargs: other parameters\n", " \"\"\"\n", " fcn = {\n", " \"FCN-8s\": self._fcn_8s,\n", " \"FCN-16s\": self._fcn_16s,\n", " \"FCN-32s\": self._fcn_32s,\n", " }\n", " base_model = \"ResNet50\" if base_model is None else base_model\n", "\n", " assert version in fcn\n", " self.fcn = fcn[version]\n", " super(FCN, self).__init__(num_classes, version, base_model, **kwargs)\n", "\n", " def __call__(self, inputs=None, input_size=None, **kwargs):\n", " assert inputs is not None or input_size is not None\n", "\n", " if inputs is None:\n", " assert isinstance(input_size, tuple)\n", " inputs = layers.Input(shape=input_size + (3,))\n", " return self.fcn(inputs)\n", "\n", " def _conv_relu(self, x, filters, kernel_size=1):\n", " x = 
layers.Conv2D(\n", " filters, kernel_size, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", " x = layers.ReLU()(x)\n", " return x\n", "\n", " def _fcn_32s(self, inputs):\n", " num_classes = self.num_classes\n", "\n", " x = self.encoder(inputs)\n", " x = self._conv_relu(x, 4096, 7)\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = self._conv_relu(x, 4096, 1)\n", " x = layers.Dropout(rate=0.5)(x)\n", "\n", " x = layers.Conv2D(num_classes, 1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.Conv2DTranspose(\n", " num_classes, 64, strides=32, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", "\n", " outputs = x\n", " return models.Model(inputs, outputs, name=self.version)\n", "\n", " def _fcn_16s(self, inputs):\n", " num_classes = self.num_classes\n", "\n", " if self.base_model in [\n", " \"DenseNet121\",\n", " \"DenseNet169\",\n", " \"DenseNet201\",\n", " \"DenseNet264\",\n", " \"Xception\",\n", " \"Xception-DeepLab\",\n", " ]:\n", " c4, c5 = self.encoder(inputs, output_stages=[\"c3\", \"c5\"])\n", " else:\n", " c4, c5 = self.encoder(inputs, output_stages=[\"c4\", \"c5\"])\n", "\n", " x = self._conv_relu(c5, 4096, 7)\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = self._conv_relu(x, 4096, 1)\n", " x = layers.Dropout(rate=0.5)(x)\n", "\n", " x = layers.Conv2D(num_classes, 1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.Conv2DTranspose(\n", " num_classes, 4, strides=2, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", " c4 = layers.Conv2D(num_classes, 1, kernel_initializer=\"he_normal\")(c4)\n", " x = layers.Add()([x, c4])\n", "\n", " x = layers.Conv2DTranspose(\n", " num_classes, 32, strides=16, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", "\n", " outputs = x\n", " return models.Model(inputs, outputs, name=self.version)\n", "\n", " def _fcn_8s(self, inputs):\n", " num_classes = self.num_classes\n", "\n", " if self.base_model in [\n", " \"VGG16\",\n", " \"VGG19\",\n", " \"ResNet50\",\n", " 
\"ResNet101\",\n", " \"ResNet152\",\n", " \"MobileNetV1\",\n", " \"MobileNetV2\",\n", " ]:\n", " c3, c4, c5 = self.encoder(inputs, output_stages=[\"c3\", \"c4\", \"c5\"])\n", " else:\n", " c3, c4, c5 = self.encoder(inputs, output_stages=[\"c2\", \"c3\", \"c5\"])\n", "\n", " x = self._conv_relu(c5, 4096, 7)\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = self._conv_relu(x, 4096, 1)\n", " x = layers.Dropout(rate=0.5)(x)\n", "\n", " x = layers.Conv2D(num_classes, 1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.Conv2DTranspose(\n", " num_classes, 4, strides=2, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", " c4 = layers.Conv2D(num_classes, 1)(c4)\n", " x = layers.Add()([x, c4])\n", "\n", " x = layers.Conv2DTranspose(\n", " num_classes, 4, strides=2, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", " c3 = layers.Conv2D(num_classes, 1)(c3)\n", " x = layers.Add()([x, c3])\n", "\n", " x = layers.Conv2DTranspose(\n", " num_classes, 16, strides=8, padding=\"same\", kernel_initializer=\"he_normal\"\n", " )(x)\n", "\n", " outputs = x\n", " return models.Model(inputs, outputs, name=self.version)" ] }, { "cell_type": "markdown", "id": "4c7a6e7e", "metadata": {}, "source": [ "### SegNet" ] }, { "cell_type": "markdown", "id": "0e699947", "metadata": {}, "source": [ ":::{note}\n", "The link for paper is https://arxiv.org/pdf/1511.00561v3.pdf.\n", ":::" ] }, { "cell_type": "markdown", "id": "347b45f2", "metadata": {}, "source": [ "SegNet is a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. 
Here is the structure of SegNet:" ] }, { "cell_type": "markdown", "id": "8af284e4", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/04_structure_SegNet.png\n", "---\n", "name: '04_display_structure_of_SegNet' \n", "width: 90%\n", "---\n", "[The structure of SegNet](https://arxiv.org/pdf/1511.00561v3.pdf)\n", ":::" ] }, { "cell_type": "markdown", "id": "c0daa8b3", "metadata": {}, "source": [ "SegNet is similar to U-Net, which we covered earlier. The main difference is that U-Net does not reuse pooling indices; instead, it transfers the entire feature map (at the cost of more memory) to the corresponding decoder and concatenates it with the upsampled (via deconvolution) decoder feature map." ] }, { "cell_type": "markdown", "id": "66277c8c", "metadata": {}, "source": [ "#### Code" ] }, { "cell_type": "code", "execution_count": 29, "id": "294880bc", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "layers = tf.keras.layers\n", "models = tf.keras.models\n", "backend = tf.keras.backend\n", "\n", "\n", "class SegNet(Network):\n", " def __init__(self, num_classes, version=\"SegNet\", base_model=\"ResNet50\", **kwargs):\n", " \"\"\"\n", " The initialization of SegNet or Bayesian-SegNet.\n", " :param num_classes: the number of predicted classes.\n", " :param version: 'SegNet' or 'Bayesian-SegNet'.\n", " :param base_model: the backbone model\n", " :param kwargs: other parameters\n", " \"\"\"\n", " base_model = \"ResNet50\" if base_model is None else base_model\n", " assert version in [\"SegNet\", \"Bayesian-SegNet\"]\n", " assert base_model in [\"ResNet50\", \"ResNet101\", \"ResNet152\"]\n", " super(SegNet, self).__init__(num_classes, version, base_model, **kwargs)\n", "\n", " def __call__(self, inputs=None, input_size=None, **kwargs):\n", " assert inputs is not None or input_size is not None\n", "\n", " if inputs is None:\n", " assert isinstance(input_size, 
tuple)\n", " inputs = layers.Input(shape=input_size + (3,))\n", " return self._segnet(inputs)\n", "\n", " def _conv_bn_relu(self, x, filters, kernel_size=1, strides=1):\n", " x = layers.Conv2D(\n", " filters,\n", " kernel_size,\n", " strides=strides,\n", " padding=\"same\",\n", " kernel_initializer=\"he_normal\",\n", " )(x)\n", " x = layers.BatchNormalization()(x)\n", " x = layers.ReLU()(x)\n", " return x\n", "\n", " def _segnet(self, inputs):\n", " num_classes = self.num_classes\n", " dropout = True if self.version == \"Bayesian-SegNet\" else False\n", "\n", " x = self.encoder(inputs)\n", "\n", " if dropout:\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = layers.UpSampling2D(size=(2, 2))(x)\n", " x = self._conv_bn_relu(x, 512, 3, strides=1)\n", " x = self._conv_bn_relu(x, 512, 3, strides=1)\n", " x = self._conv_bn_relu(x, 512, 3, strides=1)\n", "\n", " if dropout:\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = layers.UpSampling2D(size=(2, 2))(x)\n", " x = self._conv_bn_relu(x, 512, 3, strides=1)\n", " x = self._conv_bn_relu(x, 512, 3, strides=1)\n", " x = self._conv_bn_relu(x, 256, 3, strides=1)\n", "\n", " if dropout:\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = layers.UpSampling2D(size=(2, 2))(x)\n", " x = self._conv_bn_relu(x, 256, 3, strides=1)\n", " x = self._conv_bn_relu(x, 256, 3, strides=1)\n", " x = self._conv_bn_relu(x, 128, 3, strides=1)\n", "\n", " if dropout:\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = layers.UpSampling2D(size=(2, 2))(x)\n", " x = self._conv_bn_relu(x, 128, 3, strides=1)\n", " x = self._conv_bn_relu(x, 64, 3, strides=1)\n", "\n", " if dropout:\n", " x = layers.Dropout(rate=0.5)(x)\n", " x = layers.UpSampling2D(size=(2, 2))(x)\n", " x = self._conv_bn_relu(x, 64, 3, strides=1)\n", " x = layers.Conv2D(num_classes, 1, strides=1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.BatchNormalization()(x)\n", "\n", " outputs = x\n", " return models.Model(inputs, outputs, name=self.version)" ] }, { "cell_type": "markdown", "id": 
"1e842518", "metadata": {}, "source": [ "### DeepLab V3" ] }, { "cell_type": "markdown", "id": "dfb12271", "metadata": {}, "source": [ "```{note}\n", "The link for paper is https://arxiv.org/pdf/1706.05587v3.pdf.\n", "```" ] }, { "cell_type": "markdown", "id": "a79bd0c2", "metadata": {}, "source": [ "DeepLab V3 revisits applying dilated convolution, which allows us to effectively enlarge the field of view of filters to incorporate multi-scale context, in the framework of both cascaded modules and spatial pyramid pooling. It consists of dilated convolution with various rates and batch normalization layers. DeepLab experiments with laying out the modules in cascade or in parallel." ] }, { "cell_type": "markdown", "id": "5c189e13", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/05_cascade_DeepLab_structure.png\n", "---\n", "name: '05_cascade_dilated_conv_structure' \n", "width: 90%\n", "---\n", "[ Cascade dilated convolution for DeepLab ] (https://arxiv.org/pdf/1706.05587v3.pdf)\n", ":::" ] }, { "cell_type": "markdown", "id": "aaefa24d", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/imgseg/06_parallel_DeepLab_structure.png\n", "---\n", "name: '06_parallel_DeepLab_structure' \n", "width: 90%\n", "---\n", "[ Parallel dilated convolution for DeepLab ] (https://arxiv.org/pdf/1706.05587v3.pdf)\n", ":::" ] }, { "cell_type": "markdown", "id": "73367012", "metadata": {}, "source": [ "#### Code" ] }, { "cell_type": "code", "execution_count": 30, "id": "f48bcc14", "metadata": { "tags": [] }, "outputs": [], "source": [ "from tensorflow.keras.layers import Input, Dropout, BatchNormalization, Activation, Add" ] }, { "cell_type": "code", "execution_count": 31, "id": "f1d963f2", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "layers = tf.keras.layers\n", "models = tf.keras.models\n", "backend = 
tf.keras.backend\n", "\n", "\n", "class DeepLabV3(Network):\n", " def __init__(\n", " self, num_classes, version=\"DeepLabV3\", base_model=\"ResNet50\", **kwargs\n", " ):\n", " \"\"\"\n", " The initialization of DeepLabV3.\n", " :param num_classes: the number of predicted classes.\n", " :param version: 'DeepLabV3'\n", " :param base_model: the backbone model\n", " :param kwargs: other parameters\n", " \"\"\"\n", " dilation = [1, 2]\n", " base_model = \"ResNet50\" if base_model is None else base_model\n", "\n", " assert version == \"DeepLabV3\"\n", " assert base_model in [\"ResNet50\", \"ResNet101\", \"ResNet152\"]\n", " super(DeepLabV3, self).__init__(\n", " num_classes, version, base_model, dilation, **kwargs\n", " )\n", " self.dilation = dilation\n", "\n", " def __call__(self, inputs=None, input_size=None, **kwargs):\n", " assert inputs is not None or input_size is not None\n", "\n", " if inputs is None:\n", " assert isinstance(input_size, tuple)\n", " inputs = layers.Input(shape=input_size + (3,))\n", " return self._deeplabv3(inputs)\n", "\n", " def _deeplabv3(self, inputs):\n", " multi_grid = [1, 2, 4]\n", " num_classes = self.num_classes\n", " dilation = self.dilation\n", "\n", " _, h, w, _ = backend.int_shape(inputs)\n", " self.aspp_size = (h // 16, w // 16)\n", "\n", " x = self.encoder(inputs, output_stages=\"c4\")\n", "\n", " x = self._conv_block(\n", " x, 3, [512, 512, 2048], stage=5, block=\"a\", dilation=dilation[1]\n", " )\n", " for i in range(2):\n", " x = self._identity_block(\n", " x,\n", " 3,\n", " [512, 512, 2048],\n", " stage=5,\n", " block=chr(ord(\"b\") + i),\n", " dilation=dilation[1] * multi_grid[i],\n", " )\n", " x = self._aspp(x, 256)\n", " x = layers.Conv2D(num_classes, 1, strides=1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.UpSampling2D(size=(16, 16), interpolation=\"bilinear\")(x)\n", "\n", " outputs = x\n", " return models.Model(inputs, outputs, name=self.version)\n", "\n", " def _aspp(self, x, out_filters):\n", " xs = 
list()\n", " x1 = layers.Conv2D(out_filters, 1, strides=1, kernel_initializer=\"he_normal\")(x)\n", " xs.append(x1)\n", "\n", " for i in range(3):\n", " xi = layers.Conv2D(\n", " out_filters, 3, strides=1, padding=\"same\", dilation_rate=6 * (i + 1)\n", " )(x)\n", " xs.append(xi)\n", " # keepdims=True (available in recent TensorFlow versions) keeps the 1x1 spatial\n", " # dimensions so that the Conv2D and UpSampling2D layers below receive a 4D tensor.\n", " img_pool = layers.GlobalAveragePooling2D(keepdims=True)(x)\n", " img_pool = layers.Conv2D(out_filters, 1, kernel_initializer=\"he_normal\")(\n", " img_pool\n", " )\n", " img_pool = layers.UpSampling2D(size=self.aspp_size, interpolation=\"bilinear\")(\n", " img_pool\n", " )\n", " xs.append(img_pool)\n", "\n", " x = layers.Concatenate()(xs)\n", " x = layers.Conv2D(out_filters, 1, strides=1, kernel_initializer=\"he_normal\")(x)\n", " x = layers.BatchNormalization()(x)\n", "\n", " return x\n", "\n", " def _identity_block(\n", " self, input_tensor, kernel_size, filters, stage, block, dilation=1\n", " ):\n", " \"\"\"The identity block is the block that has no conv layer at shortcut.\n", " # Arguments\n", " input_tensor: input tensor\n", " kernel_size: default 3, the kernel size of\n", " middle conv layer at main path\n", " filters: list of integers, the filters of 3 conv layer at main path\n", " stage: integer, current stage label, used for generating layer names\n", " block: 'a','b'..., current block label, used for generating layer names\n", " # Returns\n", " Output tensor for the block.\n", " \"\"\"\n", " filters1, filters2, filters3 = filters\n", " if backend.image_data_format() == \"channels_last\":\n", " bn_axis = 3\n", " else:\n", " bn_axis = 1\n", " conv_name_base = \"res\" + str(stage) + block + \"_branch\"\n", " bn_name_base = \"bn\" + str(stage) + block + \"_branch\"\n", "\n", " x = layers.Conv2D(\n", " filters1, (1, 1), kernel_initializer=\"he_normal\", name=conv_name_base + \"2a\"\n", " )(input_tensor)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2a\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters2,\n", " 
kernel_size,\n", " padding=\"same\",\n", " kernel_initializer=\"he_normal\",\n", " name=conv_name_base + \"2b\",\n", " dilation_rate=dilation,\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2b\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters3, (1, 1), kernel_initializer=\"he_normal\", name=conv_name_base + \"2c\"\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2c\")(x)\n", "\n", " x = layers.add([x, input_tensor])\n", " x = layers.Activation(\"relu\")(x)\n", " return x\n", "\n", " def _conv_block(\n", " self,\n", " input_tensor,\n", " kernel_size,\n", " filters,\n", " stage,\n", " block,\n", " strides=(2, 2),\n", " dilation=1,\n", " ):\n", " \"\"\"A block that has a conv layer at shortcut.\n", " # Arguments\n", " input_tensor: input tensor\n", " kernel_size: default 3, the kernel size of\n", " middle conv layer at main path\n", " filters: list of integers, the filters of 3 conv layer at main path\n", " stage: integer, current stage label, used for generating layer names\n", " block: 'a','b'..., current block label, used for generating layer names\n", " strides: Strides for the first conv layer in the block.\n", " # Returns\n", " Output tensor for the block.\n", " Note that from stage 3,\n", " the first conv layer at main path is with strides=(2, 2)\n", " And the shortcut should have strides=(2, 2) as well\n", " \"\"\"\n", " filters1, filters2, filters3 = filters\n", " if backend.image_data_format() == \"channels_last\":\n", " bn_axis = 3\n", " else:\n", " bn_axis = 1\n", " conv_name_base = \"res\" + str(stage) + block + \"_branch\"\n", " bn_name_base = \"bn\" + str(stage) + block + \"_branch\"\n", "\n", " strides = (1, 1) if dilation > 1 else strides\n", "\n", " x = layers.Conv2D(\n", " filters1,\n", " (1, 1),\n", " strides=strides,\n", " name=conv_name_base + \"2a\",\n", " kernel_initializer=\"he_normal\",\n", " )(input_tensor)\n", " x = 
layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2a\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters2,\n", " kernel_size,\n", " padding=\"same\",\n", " name=conv_name_base + \"2b\",\n", " kernel_initializer=\"he_normal\",\n", " dilation_rate=dilation,\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2b\")(x)\n", " x = layers.Activation(\"relu\")(x)\n", "\n", " x = layers.Conv2D(\n", " filters3, (1, 1), name=conv_name_base + \"2c\", kernel_initializer=\"he_normal\"\n", " )(x)\n", " x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"2c\")(x)\n", "\n", " shortcut = layers.Conv2D(\n", " filters3,\n", " (1, 1),\n", " strides=strides,\n", " name=conv_name_base + \"1\",\n", " kernel_initializer=\"he_normal\",\n", " )(input_tensor)\n", " shortcut = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + \"1\")(\n", " shortcut\n", " )\n", "\n", " x = layers.add([x, shortcut])\n", " x = layers.Activation(\"relu\")(x)\n", " return x" ] }, { "cell_type": "markdown", "id": "52450f86", "metadata": {}, "source": [ "## Your turn! 🚀" ] }, { "cell_type": "markdown", "id": "a0994ed1", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "f3be7f0a", "metadata": {}, "source": [ "## Acknowledgments" ] }, { "cell_type": "markdown", "id": "9c3074e6", "metadata": {}, "source": [ "Thanks to [Yang Lu](https://github.com/luyanger1799) for creating the open-source project [Amazing-Semantic-Segmentation](https://github.com/luyanger1799/Amazing-Semantic-Segmentation), and to [tensorflow](https://github.com/tensorflow) for creating the open-source course [examples](https://github.com/tensorflow/examples). They inspired the majority of the content in this chapter." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }