Questions about constants in code (body_uv_rcnn_heads.py) and transformation "AnnIndex_lowres" to "AnnIndex". #213
Hello!
Thank you for providing the code; it gives a chance to fully understand how the model works.
I have several questions about constants used in the body_uv_rcnn_heads.py file. They have no description or even a name; they are just numbers in the code (e.g. the number 15 on line 26).
Questions:
- Line 26: "model.ConvTranspose(blob_in, 'AnnIndex_lowres'+pref, dim, 15,...". My guess is that 15 stands for the number of annotation classes (14) + 1 (background). It would be nice to make it a config parameter (like BODY_UV_RCNN.NUM_PATCHES), or at least to explain the meaning of this constant in a comment in body_uv_rcnn_heads.py.
- Line 65: "### Now reshape UV blobs, such that they are 1x1x(196 *NumSamples)xNUM_PATCHES"
and line 70: "... , shape=(-1,cfg.BODY_UV_RCNN.NUM_PATCHES+1,196))". The article "Dense Human Pose Estimation In The Wild" mentions that there are at most 14 points per body part, and the COCO DensePose dataset has 14 semantic body parts, so my guess is that 196 stands for max points × number of semantic parts (14 × 14 = 196), but I am not sure about this. It would be nice to give this constant (196) a description.
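To make the two guesses above concrete, here is how they could look as named constants. This is only a sketch of the naming I have in mind: all four names are hypothetical and do not exist in the DensePose config, and whether 196 really means 14 × 14 is exactly what I am asking.

```python
# Hypothetical names; none of these exist in the DensePose config.
NUM_BODY_PARTS = 14        # semantic body parts, as I understand the dataset
MAX_POINTS_PER_PART = 14   # my guess: at most 14 annotated points per part

# Guessed meaning of the literal 15 on line 26: 14 parts + 1 background class.
NUM_MASK_CLASSES = NUM_BODY_PARTS + 1

# Guessed meaning of the literal 196 on lines 65 and 70.
MAX_POINTS_PER_ROI = NUM_BODY_PARTS * MAX_POINTS_PER_PART

print(NUM_MASK_CLASSES, MAX_POINTS_PER_ROI)  # prints: 15 196
```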
I also have a question about the transformation from "AnnIndex_lowres" to "AnnIndex". This transformation is done via bilinear interpolation and, semantically, should not change the number of channels of the tensor (for the transformations "Index_UV_lowres" to "Index_UV", "U_lowres" to "U_estimated", and "V_lowres" to "V_estimated", the number of channels stays the same).
But at the same time:
at line 26:
model.ConvTranspose(blob_in, 'AnnIndex_lowres'+pref, dim, 15,cfg.BODY_UV_RCNN.DECONV_KERNEL, pad=int(cfg.BODY_UV_RCNN.DECONV_KERNEL / 2 - 1), stride=2, weight_init=(cfg.BODY_UV_RCNN.CONV_INIT, {'std': 0.001}), bias_init=('ConstantFill', {'value': 0.}))
at line 46:
blob_Ann_Index = model.BilinearInterpolation('AnnIndex_lowres'+pref, 'AnnIndex'+pref, cfg.BODY_UV_RCNN.NUM_PATCHES+1 , cfg.BODY_UV_RCNN.NUM_PATCHES+1, cfg.BODY_UV_RCNN.UP_SCALE)
So, I have questions:
3) The docs of detector.BilinearInterpolation (detector.py, lines 330-334) say that the number of input channels equals the number of output channels, yet the input blob "AnnIndex_lowres" has 15 channels and the output blob "AnnIndex" has 25 channels. How is this possible? I am not familiar with caffe2, but BilinearInterpolation in this project is implemented as a ConvTranspose layer with fixed weights.
4) Why must the number of output channels of "AnnIndex" be equal to cfg.BODY_UV_RCNN.NUM_PATCHES+1 (the COCO DensePose dataset has 14 semantic classes for masks)?
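To show what I would expect from bilinear upsampling done as a fixed-weight transposed convolution, here is a minimal NumPy sketch. It is not the Caffe2 code from detector.py: the function names are hypothetical, the weight layout (C_in, C_out, kH, kW) is assumed, and DECONV_KERNEL = 4 and UP_SCALE = 2 are assumed values that I picked because they reproduce the 14 → 28 → 56 sizes in my log.

```python
import numpy as np

def deconv_out_size(size_in, kernel, stride, pad):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size_in - 1) * stride - 2 * pad + kernel

def bilinear_filter(kernel_size):
    """2-D bilinear interpolation kernel of shape (kernel_size, kernel_size)."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    return (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)

def bilinear_deconv_weights(channels, up_scale):
    """Fixed weights that upsample every channel independently.

    Assumed layout: (C_in, C_out, kH, kW). Only the "diagonal" entries
    (channel i -> channel i) are non-zero, so channels are never mixed.
    """
    k = 2 * up_scale
    weights = np.zeros((channels, channels, k, k), dtype=np.float32)
    idx = np.arange(channels)
    weights[idx, idx] = bilinear_filter(k)
    return weights

# Assumed values: DECONV_KERNEL = 4, pad = int(4 / 2 - 1) = 1, stride = 2.
lowres = deconv_out_size(14, kernel=4, stride=2, pad=1)
# Assumed values: UP_SCALE = 2, kernel = 2 * UP_SCALE, pad = UP_SCALE / 2.
highres = deconv_out_size(lowres, kernel=4, stride=2, pad=1)
print(lowres, highres)  # prints: 28 56

w = bilinear_deconv_weights(channels=15, up_scale=2)
print(w.shape)  # prints: (15, 15, 4, 4)
```

In this sketch the number of output channels is the second dimension of the weight tensor, so a 15-channel input would give a 15-channel output. That is why the 25 channels reported for "AnnIndex" surprise me.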
I also provide the part of the log in which this change in the number of channels is visible. The log was created by running "python2 tools/train_net.py --cfg configs/DensePose_ResNet50_FPN_single_GPU.yaml OUTPUT_DIR /tmp/detectron-output".
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => AnnIndex_lowres : (3, 15, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => Index_UV_lowres : (3, 25, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => U_lowres : (3, 25, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => V_lowres : (3, 25, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: AnnIndex_lowres : (3, 15, 28, 28) => AnnIndex : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: Index_UV_lowres : (3, 25, 28, 28) => Index_UV : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: U_lowres : (3, 25, 28, 28) => U_estimated : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: V_lowres : (3, 25, 28, 28) => V_estimated : (3, 25, 56, 56) ------- (op: ConvTranspose)
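The mismatch can also be picked out of the log lines above mechanically. Below is a small sketch that parses the shapes and reports the upsampling steps whose channel count changes. The regular expression and the helper name `channel_changes` are my own; they assume the log format shown above and are not part of Detectron.

```python
import re

LOG = """\
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => AnnIndex_lowres : (3, 15, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: body_conv_fcn8 : (3, 512, 14, 14) => Index_UV_lowres : (3, 25, 28, 28) ------- (op: ConvTranspose)
INFO net.py: 241: AnnIndex_lowres : (3, 15, 28, 28) => AnnIndex : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: Index_UV_lowres : (3, 25, 28, 28) => Index_UV : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: U_lowres : (3, 25, 28, 28) => U_estimated : (3, 25, 56, 56) ------- (op: ConvTranspose)
INFO net.py: 241: V_lowres : (3, 25, 28, 28) => V_estimated : (3, 25, 56, 56) ------- (op: ConvTranspose)
"""

# Matches "<blob> : (N, C, H, W) => <blob> : (N, C, H, W)".
PATTERN = re.compile(
    r"(\w+)\s*:\s*\(([\d, ]+)\)\s*=>\s*(\w+)\s*:\s*\(([\d, ]+)\)"
)

def channel_changes(log_text):
    """Return (src, dst, c_in, c_out) for every parsed log line."""
    rows = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue
        src, src_shape, dst, dst_shape = match.groups()
        c_in = int(src_shape.split(",")[1])   # channel axis is index 1 (NCHW)
        c_out = int(dst_shape.split(",")[1])
        rows.append((src, dst, c_in, c_out))
    return rows

# Only the "*_lowres" -> upsampled steps are expected to keep the channel count.
mismatches = [
    row for row in channel_changes(LOG)
    if row[0].endswith("_lowres") and row[2] != row[3]
]
print(mismatches)  # prints: [('AnnIndex_lowres', 'AnnIndex', 15, 25)]
```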
Thank you for your time and hope to hear from you soon!