Deep Learning Image Preprocessing: Crop

2025-10-08 08:29:23

Contents

1. Introduction

2. Annotated code

3. How it works

Introduction

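In deep learning, input images often have to be cropped to the height and width the network expects. Resizing the image directly can distort the bbox in it. Instead, the original image is first padded; then, with the center of the bbox as the center of the output, a region of size [height, width] $\times \frac{crop\_size}{min\_shape}$ is cropped and resized to [height, width] for output. The advantage of cropping this way is that the bbox is not deformed.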
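As a minimal sketch of that formula, the helper below computes the size of the region that is cut out before resizing. The function name `crop_window_size` is an illustration and does not appear in the original code; the numbers are the sample values used later in this post.

```python
def crop_window_size(height, width, crop_size, min_shape):
    """Return (H, W) of the region cut from the padded image.

    Hypothetical helper: [H, W] = [height, width] * crop_size / min_shape.
    """
    scale = crop_size / min_shape
    return height * scale, width * scale


# Sample values from this post: network input 256 x 192,
# crop_size = 418.5 measured along the width (min_shape = 192).
H, W = crop_window_size(256, 192, 418.5, 192)
print(H, W)               # 558.0 418.5
print(W / H, 192 / 256)   # 0.75 0.75
```

Because both sides are multiplied by the same factor, the cut-out region has the same aspect ratio as the network input, which is why the later resize does not stretch the bbox.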

Annotated code

The code below displays the original image (img), the padded image (bimg) and the cropped image (crop_img):

import cv2
import imageio
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

is_train = True
height, width = 256, 192  # network input size

# x is horizontal, y is vertical
bbox = [111, 3, 390, 295]  # x1, y1, x2, y2

# 1. original image
img = imageio.imread('test.jpg')
add = max(img.shape[0], img.shape[1])  # longest side of the input image; used below as the padding on all four sides
mean_value = [122.7717, 115.9465, 102.9801]

# 2. padded image
bimg = cv2.copyMakeBorder(img,
                          add, add, add, add,
                          borderType=cv2.BORDER_CONSTANT,  # fill the border with the constant pixel mean
                          value=mean_value)
# bimg is the image after padding `add` pixels on every side

bbox = np.array(bbox).reshape(4, ).astype(np.float32)

# center of the bbox (the bbox contains the object)
objcenter = np.array([(bbox[0] + bbox[2]) / 2.,   # center x
                      (bbox[1] + bbox[3]) / 2.])  # center y

# shift bbox, keypoints, etc. into bimg coordinates
bbox += add       # bbox: original-image coordinates -> bimg coordinates
objcenter += add  # bbox center: original-image coordinates -> bimg coordinates

# 3. extend and crop img
bbox_extend_factor = (0.1, 0.15)

# bbox [w, h] * (1 + 2 * extend_factor), extend_factor = (0.1, 0.15)
crop_width = (bbox[2] - bbox[0]) * (1 + bbox_extend_factor[0] * 2)   # extend by 0.1 * bbox width on each side
crop_height = (bbox[3] - bbox[1]) * (1 + bbox_extend_factor[1] * 2)  # extend by 0.15 * bbox height on each side

if is_train:  # extend a bit further during training
    crop_width = crop_width * (1 + 0.25)
    crop_height = crop_height * (1 + 0.25)

print('image_wh:', img.shape[1], img.shape[0])  # width=533, height=330
print('input_wh:', width, height)  # 192, 256
print()

print('ori_bbox_wh:', bbox[2] - bbox[0], bbox[3] - bbox[1])  # 279.0, 292.0
print('crop_box_wh:', crop_width, crop_height)  # 418.5, 474.5
print('crop/input:', crop_width / width, crop_height / height)  # 2.18, 1.85
print()

# crop_size takes the side with the larger ratio; min_shape is the matching output side
if crop_height / height > crop_width / width:
    crop_size = crop_height
    min_shape = height
else:
    crop_size = crop_width
    min_shape = width

print('crop size:', crop_size)  # 418.5
print('min shape:', min_shape)  # 192
print()

print('after extend')
print('objcenter:', objcenter)  # 783.5 682
print('crop bbox:', bbox)  # [644. 536. 923. 828.] (the bbox shifted by add = 533)
print('bimg_wh:', bimg.shape[1], bimg.shape[0])  # 1599 1396
print()

# min_shape = height or width of input
# limit crop_size by the space to the left, right, top and bottom of the object center
crop_size = min(crop_size, objcenter[0] / width * min_shape * 2. - 1.)  # if width=min_shape, objcenter[0]*2-1
crop_size = min(crop_size, (bimg.shape[1] - objcenter[0]) / width * min_shape * 2. - 1)
crop_size = min(crop_size, objcenter[1] / height * min_shape * 2. - 1.)
crop_size = min(crop_size, (bimg.shape[0] - objcenter[1]) / height * min_shape * 2. - 1)

# with crop_size as the reference, get the top-left and bottom-right corners around objcenter in bimg
# the crop keeps the model input's aspect ratio, so x_ratio and y_ratio are equal (up to integer truncation)
min_x = int(objcenter[0] - crop_size / 2. / min_shape * width)
max_x = int(objcenter[0] + crop_size / 2. / min_shape * width)
min_y = int(objcenter[1] - crop_size / 2. / min_shape * height)
max_y = int(objcenter[1] + crop_size / 2. / min_shape * height)

x_ratio = float(width) / (max_x - min_x)
y_ratio = float(height) / (max_y - min_y)
print('ratios:', x_ratio, y_ratio)

crop_img = cv2.resize(bimg[min_y:max_y, min_x:max_x, :], (width, height))

# display the images
plt.subplot(2, 2, 1)
plt.imshow(img)
x1, y1, x2, y2 = bbox - add  # bbox was shifted into bimg coordinates above; undo that for the original image
plt.gca().add_patch(
    patches.Rectangle(xy=(x1, y1),  # top-left corner
                      width=x2 - x1, height=y2 - y1,
                      linewidth=1, edgecolor='r', facecolor='none'))

plt.subplot(2, 2, 2)
plt.imshow(bimg)
plt.gca().add_patch(
    patches.Rectangle(xy=(bbox[0], bbox[1]),  # top-left corner
                      width=bbox[2] - bbox[0], height=bbox[3] - bbox[1],
                      linewidth=1, edgecolor='r', facecolor='none'))

plt.subplot(2, 2, 3)
plt.imshow(crop_img)
plt.show()
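The script computes x_ratio and y_ratio but only prints them. One plausible use, sketched here as an assumption rather than as part of the original code, is to carry the bbox or keypoints from original-image coordinates into crop_img coordinates. The helper name `to_crop_coords` is hypothetical, and the numeric arguments assume the 533 x 330 sample image, for which add = 533.

```python
import numpy as np


def to_crop_coords(points_xy, add, min_x, min_y, x_ratio, y_ratio):
    """Map (x, y) points from original-image pixels to crop_img pixels.

    Hypothetical helper, not from the original post.
    """
    pts = np.asarray(points_xy, dtype=np.float32).reshape(-1, 2).copy()
    pts += add                                  # original image -> padded image (bimg)
    pts[:, 0] = (pts[:, 0] - min_x) * x_ratio   # shift into the crop window, then scale to `width`
    pts[:, 1] = (pts[:, 1] - min_y) * y_ratio   # shift into the crop window, then scale to `height`
    return pts


# Crop window assumed for the sample bbox: x in [574, 992), y in [403, 961).
corners = to_crop_coords([[111, 3], [390, 295]], add=533,
                         min_x=574, min_y=403,
                         x_ratio=192 / 418, y_ratio=256 / 558)
print(corners)  # the two bbox corners inside the 192 x 256 crop
```

Both corners land inside the 192 x 256 output, which is a quick way to confirm that the crop window really contains the bbox.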

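How it works

1. First, to make sure the whole bbox ends up inside the crop by the time of the final resize, the bbox is extended by a suitable margin.

2. Since the size after the final resize, [height, width], is fixed, what remains to be determined is the scale factor $\frac{crop\_size}{min\_shape}$.

Start with this piece of code.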

# crop_size takes the side with the larger ratio; min_shape is the matching output side
if crop_height / height > crop_width / width:
    crop_size = crop_height
    min_shape = height
else:
    crop_size = crop_width
    min_shape = width

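This amounts to a first, rough choice of the scale factor $\frac{crop\_size}{min\_shape}$. Why pick the side with the larger ratio? My own way of understanding it is to consider a concrete case. Suppose crop_height is large and crop_width is small, while height is small and width is large. If we picked the side with the smaller ratio, the scale factor $\frac{crop\_size}{min\_shape}$ would be crop_width / width, which under this assumption is less than 1.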

As the code below shows, the region cropped from bimg has size [height, width] $\times \frac{crop\_size}{min\_shape}$. If the scale factor is less than 1 while height is small and crop_height is large, then height $\times \frac{crop\_size}{min\_shape}$ is much smaller than crop_height. The crop would then lose much of the bbox's image content in the height direction.

min_x = int(objcenter[0] - crop_size / 2. / min_shape * width)
max_x = int(objcenter[0] + crop_size / 2. / min_shape * width)
min_y = int(objcenter[1] - crop_size / 2. / min_shape * height)
max_y = int(objcenter[1] + crop_size / 2. / min_shape * height)

So the rough choice of the factor $\frac{crop\_size}{min\_shape}$ has to use the side with the larger ratio.
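The argument can be checked numerically. The sketch below compares the two possible choices of the scale factor for the sample values in this post; the function `crop_window` and its `use_larger_ratio` flag are illustrative names, not part of the original code.

```python
def crop_window(crop_w, crop_h, width, height, use_larger_ratio=True):
    """Return (W, H, covers): window size and whether it contains the extended bbox."""
    ratio_w, ratio_h = crop_w / width, crop_h / height
    scale = max(ratio_w, ratio_h) if use_larger_ratio else min(ratio_w, ratio_h)
    W, H = width * scale, height * scale
    eps = 1e-9  # tolerance for floating-point round-off
    covers = W >= crop_w - eps and H >= crop_h - eps
    return W, H, covers


# Extended bbox 418.5 x 474.5, network input 192 x 256 (width x height).
print(crop_window(418.5, 474.5, 192, 256, use_larger_ratio=True))   # (418.5, 558.0, True)
print(crop_window(418.5, 474.5, 192, 256, use_larger_ratio=False))  # (355.875, 474.5, False)
```

With the smaller ratio the window is only 355.875 pixels wide, narrower than the 418.5-pixel extended bbox, so part of the object would be cut off.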

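3. The rest of the code adjusts $\frac{crop\_size}{min\_shape}$.

The adjustment has three things to take care of. First, $\frac{crop\_size}{min\_shape}$ must keep the crop region entirely inside bimg; this is obvious, since a region that extends beyond bimg cannot be cropped. Second, $\frac{crop\_size}{min\_shape}$ must make the crop region cover the whole bbox. Third, $\frac{crop\_size}{min\_shape}$ should be as small as possible, because once the bbox is fully covered, the larger the scale factor, the blurrier the bbox region becomes after resizing.

crop_size is compared against the left, right, top and bottom sides in the same way, so I only take the code for the left side; the other three follow by analogy. Let's look one by one at how the code meets these three requirements.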

crop_size = min(crop_size, objcenter[0] / width * min_shape * 2. - 1.)
min_x = int(objcenter[0] - crop_size / 2. / min_shape * width)

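First requirement:

If crop_size < objcenter[0] / width * min_shape * 2. - 1, then crop_size / 2. / min_shape * width < objcenter[0], so min_x >= 0 and the first requirement is met.

If crop_size > objcenter[0] / width * min_shape * 2. - 1, then crop_size is set to objcenter[0] / width * min_shape * 2. - 1, so (crop_size + 1) / 2. / min_shape * width = objcenter[0]. We therefore still have crop_size / 2. / min_shape * width < objcenter[0], so min_x >= 0 and the first requirement is met.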
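Both cases can be tried out with a small sketch of the left-side clamp. The function `left_edge` is a hypothetical wrapper around the two lines of code quoted above, and the inputs are made-up values chosen to exercise each case.

```python
def left_edge(objcenter_x, crop_size, width, min_shape):
    """Apply the left-side clamp, then return (min_x, clamped crop_size)."""
    crop_size = min(crop_size, objcenter_x / width * min_shape * 2. - 1.)
    min_x = int(objcenter_x - crop_size / 2. / min_shape * width)
    return min_x, crop_size


# Case 1: crop_size is already below the limit and is left unchanged.
print(left_edge(783.5, 418.5, 192, 192))

# Case 2: crop_size is far too large and is cut down to the limit,
# which pushes the left edge to the border of bimg without crossing it.
print(left_edge(100.0, 1000.0, 192, 192))
```

In the second case the half-width of the window works out to objcenter_x - width / (2 * min_shape), so the left edge sits a fraction of a pixel inside the image and int() truncates it to 0.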

Second requirement:

Let the crop region be [H, W] = [height, width] $\times \frac{crop\_size}{min\_shape}$.

When min(crop_size, objcenter[0] / width * min_shape * 2. - 1.) = crop_size:

Without loss of generality, assume min_shape = height, that is, crop_height / height > crop_width / width. Then H = crop_size = crop_height, which satisfies H > bbox_height, so the second requirement is met along the height. At the same time W = width $\times \frac{crop\_size}{min\_shape}$ > width $\times \frac{crop\_width}{width}$ = crop_width, so the second requirement is also met along the width.

When min(crop_size, objcenter[0] / width * min_shape * 2. - 1.) = objcenter[0] / width * min_shape * 2. - 1:

we have W = width $\times \frac{crop\_size}{min\_shape}$ = width $\times$ ($objcenter[0] \times 2 / width - \frac{1}{min\_shape}$).

Now compare $objcenter[0] \times 2 / width - \frac{1}{min\_shape}$ with $\frac{crop\_width}{width}$. Multiplying both by width, this is the same as comparing $objcenter[0] \times 2 - \frac{width}{min\_shape}$ with crop_width. Here objcenter[0] * 2 is twice the length of the part of bimg to the left of the bbox center. Since add = max(img.shape[0], img.shape[1]), we have objcenter[0] >= add >= bbox_width, and the code below shows that crop_width is only a small multiple of bbox_width (1.2 times, or 1.5 times with the training extension), so objcenter[0] * 2 > crop_width.

bbox_extend_factor = (0.1, 0.15)
crop_width = (bbox[2] - bbox[0]) * (1 + bbox_extend_factor[0] * 2)

Moreover, $\frac{width}{min\_shape}$ is a very small quantity, so we can show, not quite rigorously, that $objcenter[0] \times 2 - \frac{width}{min\_shape}$ is greater than crop_width. Hence W = width $\times$ ($objcenter[0] \times 2 / width - \frac{1}{min\_shape}$) > width $\times \frac{crop\_width}{width}$ = crop_width, so the second requirement is met along the width.
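The clamped case can also be written as a single chain of inequalities. This is a sketch under assumptions that the text only implies: $c_x$ stands for objcenter[0], $w_b$ for the bbox width, the bbox lies inside the original image so that $c_x \ge add \ge w_b$, and the training extension is on so that crop_width $= 1.2 \times 1.25 \times w_b = 1.5\,w_b$.

```latex
\begin{aligned}
W &= width \times \left(\frac{2 c_x}{width} - \frac{1}{min\_shape}\right)
   = 2 c_x - \frac{width}{min\_shape} \\
  &\ge 2\,add - \frac{width}{min\_shape}
   \ge 2 w_b - \frac{width}{min\_shape} \\
  &> 1.5\, w_b = crop\_width
   \quad\text{whenever } w_b > \frac{2\, width}{min\_shape}.
\end{aligned}
```

With the 192 x 256 input used here, $\frac{width}{min\_shape}$ is either 1 or 0.75, so the condition holds for any bbox more than 2 pixels wide.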

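Third requirement:

The arguments above show that crop_size and each of the left, right, top and bottom limits satisfy the first and second requirements. The code below is then simply the search for the third: the smallest possible $\frac{crop\_size}{min\_shape}$.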

crop_size = min(crop_size, objcenter[0] / width * min_shape * 2. - 1.)
crop_size = min(crop_size, (bimg.shape[1] - objcenter[0]) / width * min_shape * 2. - 1)
crop_size = min(crop_size, objcenter[1] / height * min_shape * 2. - 1.)
crop_size = min(crop_size, (bimg.shape[0] - objcenter[1]) / height * min_shape * 2. - 1)
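The four lines can be folded into one function, which makes it easier to see that the result is the minimum of the initial crop_size and the four border limits. The name `clamp_crop_size` and its argument layout are assumptions made for this sketch; the numbers assume the 533 x 330 sample image, padded to 1599 x 1396.

```python
def clamp_crop_size(crop_size, objcenter, bimg_wh, width, height, min_shape):
    """Return crop_size limited by the space around objcenter inside bimg.

    Hypothetical helper: same arithmetic as the four min() lines above.
    """
    cx, cy = objcenter
    bimg_w, bimg_h = bimg_wh
    limits = (
        cx / width * min_shape * 2. - 1.,               # left
        (bimg_w - cx) / width * min_shape * 2. - 1.,    # right
        cy / height * min_shape * 2. - 1.,              # top
        (bimg_h - cy) / height * min_shape * 2. - 1.,   # bottom
    )
    return min(crop_size, *limits)


center, bimg_wh = (783.5, 682.0), (1599, 1396)
print(clamp_crop_size(418.5, center, bimg_wh, 192, 256, 192))   # 418.5, no limit is hit
print(clamp_crop_size(5000.0, center, bimg_wh, 192, 256, 192))  # 1022.0, the top limit wins
```

For the sample image the padding is generous enough that none of the limits applies; the second call uses an artificially large crop_size to show a limit taking effect.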

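That completes the analysis of the code and the reasoning behind it, so the crop part of data preprocessing is done (confetti).

All of the above is my own understanding and derivation. Please credit the source when reposting, and feel free to point out any mistakes or oversights.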
