Monocular human pose estimation: A survey of deep learning-based methods

Bibliographic Details
Published in: Computer Vision and Image Understanding, Vol. 192, Article 102897
Main Authors: Chen, Yucheng; Tian, Yingli; He, Mingyi
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.03.2020

Abstract Vision-based monocular human pose estimation, one of the most fundamental and challenging problems in computer vision, aims to obtain the posture of the human body from input images or video sequences. Recent developments in deep learning techniques have brought significant progress and remarkable breakthroughs to the field of human pose estimation. This survey extensively reviews deep learning-based 2D and 3D human pose estimation methods published since 2014. It summarizes the challenges, main frameworks, benchmark datasets, evaluation metrics, and performance comparisons, and discusses promising future research directions.
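For context on the "evaluation metrics" the abstract mentions: the standard metric for 3D human pose estimation is MPJPE (mean per-joint position error), the average Euclidean distance between predicted and ground-truth joint positions. The sketch below is a minimal pure-Python illustration, not code from the surveyed paper; the joint tuples and millimetre units are assumed for the example.

```python
import math

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth 3D joint coordinates."""
    assert len(pred) == len(gt), "pose skeletons must have the same joint count"
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        total += math.sqrt((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2)
    return total / len(pred)

# Toy example: two joints, each displaced by 30 mm along one axis.
pred = [(0.0, 0.0, 30.0), (100.0, 30.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
print(mpjpe(pred, gt))  # 30.0
```

Benchmarks such as Human3.6M commonly report MPJPE in millimetres, sometimes after a rigid alignment of the predicted pose to the ground truth.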
Article Number: 102897
Authors:
– Chen, Yucheng (chenyucheng@mail.nwpu.edu.cn), Northwestern Polytechnical University, Xi’an, 710072, China
– Tian, Yingli (ytian@ccny.cuny.edu), The City College, City University of New York, NY 10031, USA
– He, Mingyi (myhe@nwpu.edu.cn), Northwestern Polytechnical University, Xi’an, 710072, China
Copyright: 2020 Elsevier Inc.
DOI: 10.1016/j.cviu.2019.102897
Discipline: Applied Sciences; Engineering; Computer Science
EISSN: 1090-235X
ISSN: 1077-3142
Peer Reviewed: yes
Open Access: yes
Keywords: Deep learning; Survey; Human pose estimation
Open Access Link: https://doi.org/10.1016/j.cviu.2019.102897
10.1016/j.cviu.2019.102897_b9
10.1016/j.cviu.2019.102897_b20
Ren (10.1016/j.cviu.2019.102897_b138) 2015
10.1016/j.cviu.2019.102897_b6
10.1016/j.cviu.2019.102897_b4
10.1016/j.cviu.2019.102897_b3
References_xml
– reference: Charles, J., Pfister, T., Magee, D., Hogg, D., Zisserman, A., 2016. Personalizing human video pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3063–3072.
– volume: 39
  start-page: 501
  year: 2017
  end-page: 514
  ident: b33
  article-title: Marconi—convnet-based marker-less motion capture in outdoor and indoor scenes
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Rogez, G., Weinzaepfel, P., Schmid, C., 2017. Lcr-net: Localization-classification-regression for human pose. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3433–3441.
– reference: Chou, C.J., Chien, J.T., Chen, H.T., 2018. Self adversarial training for human pose estimation. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 17-30.
– volume: 41
  start-page: 190
  year: 2017
  end-page: 204
  ident: b70
  article-title: Panoptic studio: A massively multiview system for social interaction capture
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B., 2014. 2d human pose estimation: New benchmark and state of the art analysis. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693.
– volume: 35
  start-page: 2821
  year: 2012
  end-page: 2840
  ident: b147
  article-title: Efficient human pose estimation from single depth images
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– year: 2016
  ident: b159
  article-title: Structured prediction of 3d human pose with deep neural networks
– start-page: 2277
  year: 2017
  end-page: 2287
  ident: b114
  article-title: Associative embedding: End-to-end learning for joint detection and grouping
  publication-title: Advances in Neural Information Processing Systems
– start-page: 437
  year: 2018
  end-page: 453
  ident: b76
  article-title: Multiposenet: Fast multi-person pose estimation using pose residual network
  publication-title: Proc. European Conference on Computer Vision
– reference: Papandreou, G., Zhu, T., Chen, L.C., Gidaris, S., Tompson, J., Murphy, K., 2018. Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In: Proc. European Conference on Computer Vision, pp. 269-286.
– start-page: 468
  year: 2017
  end-page: 475
  ident: b7
  article-title: Recurrent human pose estimation
  publication-title: Proc. IEEE Conference on Automatic Face and Gesture Recognition
– reference: Cao, Z., Simon, T., Wei, S.E., Sheikh, Y., 2017. Realtime multi-person 2d pose estimation using part affinity fields. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291-7299.
– volume: 83
  start-page: 328
  year: 2018
  end-page: 339
  ident: b83
  article-title: Monocular depth estimation with hierarchical fusion of dilated cnns and soft-weighted-sum inference
  publication-title: Pattern Recognit.
– volume: 73
  start-page: 82
  year: 1999
  end-page: 98
  ident: b42
  article-title: The visual analysis of human movement: A survey
  publication-title: Comput. Vis. Image Underst.
– year: 2019
  ident: b169
  article-title: Vicon
– reference: Sun, X., Shang, J., Liang, S., Wei, Y., 2017. Compositional human pose regression. In: Proc. IEEE International Conference on Computer Vision, pp. 2602-2611.
– reference: Tome, D., Russell, C., Agapito, L., 2017. Lifting from the deep: Convolutional 3d pose estimation from a single image. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2500-2509.
– reference: Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A., Katabi, D., 2018. Through-wall human pose estimation using radio signals. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7356–7365.
– reference: Li, Z., Dekel, T., Cole, F., Tucker, R., Snavely, N., Liu, C., Freeman, W., 2019. Learning the depths of moving people by watching frozen people. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4521-4530.
– reference: Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S., 2017. Feature pyramid networks for object detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125.
– start-page: 1799
  year: 2014
  end-page: 1807
  ident: b164
  article-title: Joint training of a convolutional network and a graphical model for human pose estimation
  publication-title: Advances in Neural Information Processing Systems
– volume: 16
  start-page: 1966
  year: 2016
  ident: b47
  article-title: Human pose estimation from monocular images: A comprehensive survey
  publication-title: Sensors
– year: 2017
  ident: b53
  article-title: Mobilenets: Efficient convolutional neural networks for mobile vision applications
– reference: Kreiss, S., Bertoni, L., Alahi, A., 2019. Pifpaf: Composite fields for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 11977–11986.
– reference: Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M.J., Laptev, I., Schmid, C., 2017. Learning from synthetic humans. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4627–4635.
– reference: Li, C., Lee, G.H., 2019. Generating multiple hypotheses for 3d human pose estimation with mixture density network. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 9887–9895.
– volume: 36
  start-page: 1325
  year: 2014
  end-page: 1339
  ident: b59
  article-title: Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Sidenbladh, H., De la Torre, F., Black, M.J., 2000. A framework for modeling the appearance of 3d articulated figures. In: Proc. IEEE Conference on Automatic Face and Gesture Recognition, pp. 368–375.
– start-page: 302
  year: 2014
  end-page: 315
  ident: b64
  article-title: Modeep: A deep learning framework using motion features for human pose estimation
  publication-title: Proc. Asian Conference on Computer Vision
– start-page: 561
  year: 2016
  end-page: 578
  ident: b8
  article-title: Keep it smpl: Automatic estimation of 3d human pose and shape from a single image
  publication-title: Proc. European Conference on Computer Vision
– reference: Zanfir, A., Marinoiu, E., Sminchisescu, C., 2018. Monocular 3d pose and shape estimation of multiple people in natural scenes-the importance of multiple scene constraints. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2148–2157.
– reference: Popa, A.I., Zanfir, M., Sminchisescu, C., 2017. Deep multitask architecture for integrated 2d and 3d human sensing. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4714–4723.
– start-page: 91
  year: 2015
  end-page: 99
  ident: b138
  article-title: Faster r-cnn: Towards real-time object detection with region proposal networks
  publication-title: Advances in Neural Information Processing Systems
– start-page: 228
  year: 2010
  end-page: 242
  ident: b29
  article-title: We are family: Joint pose estimation of multiple persons
  publication-title: Proc. European Conference on Computer Vision
– reference: Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K., 2018b. Learning to estimate 3D human pose and shape from a single color image. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 459-468.
– reference: Yang, W., Li, S., Ouyang, W., Li, H., Wang, X., 2017. Learning feature pyramids for human pose estimation. In: Proc. IEEE International Conference on Computer Vision, pp. 1281–1290.
– year: 2018
  ident: b171
  article-title: Drpose3d: Depth ranking in 3d human pose estimation
– reference: Eichner, M., Ferrari, V., 2009. Better appearance models for pictorial structures. In: Proc. British Machine Vision Conference, p. 5.
– year: 2019
  ident: b105
  article-title: Xnect: Real-time multi-person 3d human pose estimation with a single rgb camera
– reference: Debnath, B., O’Brien, M., Yamaguchi, M., Behera, A., 2018. Adapting mobilenets for mobile based upper body pose estimation. In: Proc. IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 1–6.
– reference: Ke, L., Chang, M.C., Qi, H., Lyu, S., 2018. Multi-scale structure-aware network for human pose estimation. In: Proc. European Conference on Computer Vision, pp. 713-728.
– volume: 171
  start-page: 118
  year: 2018
  end-page: 139
  ident: b172
  article-title: Rgb-d-based human motion recognition with deep learning: A survey
  publication-title: Comput. Vis. Image Underst.
– reference: Yang, W., Ouyang, W., Li, H., Wang, X., 2016. End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3073–3082.
– reference: Gkioxari, G., Hariharan, B., Girshick, R., Malik, J., 2014b. Using k-poselets for detecting people and localizing their keypoints. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3582–3589.
– reference: Shahroudy, A., Liu, J., Ng, T.T., Wang, G., 2016. Ntu rgb+ d: A large scale dataset for 3d human activity analysis. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1010–1019.
– reference: Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B., 2017. Arttrack: Articulated multi-person tracking in the wild. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 6457–6465.
– volume: 36
  start-page: 44
  year: 2017
  ident: b107
  article-title: Vnect: Real-time 3d human pose estimation with a single rgb camera
  publication-title: ACM Trans. Graph.
– start-page: 2980
  year: 2017
  end-page: 2988
  ident: b51
  article-title: Mask r-cnn
  publication-title: Proc. IEEE International Conference on Computer Vision
– reference: Zhang, F., Zhu, X., Ye, M., 2019. Fast human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
– reference: Zhang, W., Zhu, M., Derpanis, K.G., 2013. From actemes to action: A strongly-supervised representation for detailed action understanding. In: Proc. IEEE International Conference on Computer Vision, pp. 2248–2255.
– reference: Luvizon, D.C., Picard, D., Tabia, H., 2018. 2d/3d pose estimation and action recognition using multitask deep learning. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5137–5146.
– reference: Johnson, S., Everingham, M., 2011. Learning effective human pose estimation from inaccurate annotation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1465–1472.
– reference: Nie, B.X., Wei, P., Zhu, S.C., 2017. Monocular 3d human pose estimation by predicting depth on joints. In: Proc. IEEE International Conference on Computer Vision, pp. 3447–3455.
– reference: Tan, J., Budvytis, I., Cipolla, R., 2017. Indirect deep structured learning for 3d human body shape and pose prediction. In: Proc. British Machine Vision Conference.
– reference: Güler, R.A., Neverova, N., Kokkinos, I., 2018. Densepose: Dense human pose estimation in the wild. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7297–7306.
– reference: Arnab, A., Doersch, C., Zisserman, A., 2019. Exploiting temporal context for 3d human pose estimation in the wild. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3395–3404.
– volume: 87
  start-page: 4
  year: 2010
  ident: b149
  article-title: Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion
  publication-title: Int. J. Comput. Vis.
– reference: Xiao, B., Wu, H., Wei, Y., 2018. Simple baselines for human pose estimation and tracking. In: Proc. European Conference on Computer Vision, pp. 466–481.
– reference: Fang, H., Xie, S., Tai, Y.W., Lu, C., 2017. Rmpe: Regional multi-person pose estimation. In: Proc. IEEE International Conference on Computer Vision, pp. 2334–2343.
– reference: Tang, W., Yu, P., Wu, Y., 2018a. Deeply learned compositional models for human pose estimation. In: Proc. European Conference on Computer Vision, pp. 190–206.
– year: 2017
  ident: b175
  article-title: Ai challenger: A large-scale dataset for going deeper in image understanding
– start-page: 34
  year: 2016
  end-page: 50
  ident: b58
  article-title: Deepercut: A deeper, stronger, and faster multi-person pose estimation model
  publication-title: Proc. European Conference on Computer Vision
– volume: 152
  start-page: 1
  year: 2016
  end-page: 20
  ident: b145
  article-title: 3d human pose estimation: A review of the literature and analysis of covariates
  publication-title: Comput. Vis. Image Underst.
– reference: Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P.V., Schiele, B., 2016. Deepcut: Joint subset partition and labeling for multi person pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4929–4937.
– reference: Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J., 2018. End-to-end recovery of human shape and pose. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131.
– reference: Chu, X., Ouyang, W., Li, H., Wang, X., 2016. Structured feature learning for pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4715–4723.
– year: 2018
  ident: b116
  article-title: Numerical coordinate regression with convolutional neural networks
– reference: Wang, Y., Tran, D., Liao, Z., 2011. Learning hierarchical poselets for human parsing. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1705–1712.
– start-page: 907
  year: 2014
  end-page: 913
  ident: b36
  article-title: A monocular pose estimation system based on infrared leds
  publication-title: Proc. IEEE International Conference on Robotics and Automation
– reference: Li, B., Shen, C., Dai, Y., Hengel, A., He, M., 2015a. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical crfs. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1119–1127.
– volume: 40
  start-page: 13
  year: 2010
  end-page: 24
  ident: b67
  article-title: Advances in view-invariant human motion analysis: A review
  publication-title: IEEE Trans. Syst. Man Cybern. Part C
– reference: Moreno-Noguer, F., 2017. 3d human pose estimation from a single image via distance matrix regression. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1561–1570.
– reference: Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C., 2015. Efficient object localization using convolutional networks. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656.
– volume: 34
  start-page: 120
  year: 2015
  ident: b132
  article-title: Dyna: A model of dynamic human shape in motion
  publication-title: ACM Trans. Graph.
– start-page: 2017
  year: 2015
  end-page: 2025
  ident: b62
  article-title: Spatial transformer networks
  publication-title: Advances in Neural Information Processing Systems
– year: 2019
  ident: b100
  article-title: Amass: Archive of motion capture as surface shapes
– reference: Ju, S.X., Black, M.J., Yacoob, Y., 1996. Cardboard people: A parameterized model of articulated image motion. In: Proc. IEEE Conference on Automatic Face and Gesture Recognition, pp. 38–44.
– reference: Trumble, M., Gilbert, A., Malleson, C., Hilton, A., Collomosse, J., 2017. Total capture: 3d human pose estimation fusing video and inertial sensors. In: Proc. British Machine Vision Conference, pp. 1–13.
– start-page: 186
  year: 2016
  end-page: 201
  ident: b185
  article-title: Deep kinematic pose regression
  publication-title: Proc. European Conference on Computer Vision
– start-page: 717
  year: 2016
  end-page: 732
  ident: b12
  article-title: Human pose estimation via convolutional part heatmap regression
  publication-title: Proc. European Conference on Computer Vision
– reference: Johnson, S., Everingham, M., 2010. Clustered pose and nonlinear appearance models for human pose estimation. In: Proc. British Machine Vision Conference, p. 5.
– reference: Li, B., Chen, H., Chen, Y., Dai, Y., He, M., 2017a. Skeleton boxes: Solving skeleton based action detection with a single deep convolutional neural network. In: Proc. IEEE International Conference on Multimedia and Expo Workshops, pp. 613–616.
– reference: Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
– year: 2013
  ident: b63
  article-title: Learning human pose estimation features with convolutional networks
– volume: 40
  start-page: 33
  year: 1975
  end-page: 51
  ident: b48
  article-title: Generalized procrustes analysis
  publication-title: Psychometrika
– start-page: 185
  year: 2008
  end-page: 211
  ident: b150
  article-title: 3d human motion analysis in monocular video: techniques and challenges
  publication-title: Human Motion
– reference: Qammaz, A., Argyros, A., 2019. Mocapnet: Ensemble of snn encoders for 3d human pose estimation in rgb images. In: Proc. British Machine Vision Conference.
– start-page: 332
  year: 2014
  end-page: 347
  ident: b80
  article-title: 3d human pose estimation from monocular images with deep convolutional neural network
  publication-title: Proc. Asian Conference on Computer Vision
– start-page: 740
  year: 2014
  end-page: 755
  ident: b93
  article-title: Microsoft coco: Common objects in context
  publication-title: Proc. European Conference on Computer Vision
– reference: Tang, W., Wu, Y., 2019. Does learning specific features for related parts help human pose estimation?. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1107–1116.
– reference: Iqbal, U., Milan, A., Gall, J., 2017. Posetrack: Joint multi-person pose estimation and tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2011-2020.
– reference: Zuffi, S., Freifeld, O., Black, M.J., 2012. From pictorial structures to deformable structures. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3546–3553.
– reference: Pfister, T., Charles, J., Zisserman, A., 2015. Flowing convnets for human pose estimation in videos. In: Proc. IEEE International Conference on Computer Vision, pp. 1913–1921.
– volume: 61
  start-page: 38
  year: 1995
  end-page: 59
  ident: b26
  article-title: Active shape models-their training and application
  publication-title: Comput. Vis. Image Underst.
– reference: Rhodin, H., Salzmann, M., Fua, P., 2018a. Unsupervised geometry-aware representation for 3d human pose estimation. In: Proc. European Conference on Computer Vision, pp. 750-767.
– reference: Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
– reference: Zuffi, S., Black, M.J., 2015. The stitched puppet: A graphical model of 3d human shape and pose. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3537–3546.
– reference: Joo, H., Simon, T., Sheikh, Y., 2018. Total capture: A 3d deformation model for tracking faces, hands, and bodies. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 8320–8329.
– year: 2011
  ident: b111
  article-title: Visual Analysis of Humans
– reference: Rafi, U., Leibe, B., Gall, J., Kostrikov, I., 2016. An efficient convolutional network for human pose estimation. In: Proc. British Machine Vision Conference, p. 2.
– volume: 61
  start-page: 55
  year: 2005
  end-page: 79
  ident: b39
  article-title: Pictorial structures for object recognition
  publication-title: Int. J. Comput. Vis.
– reference: Li, B., Dai, Y., Cheng, X., Chen, H., Lin, Y., He, M., 2017b. Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn. In: Proc. IEEE International Conference on Multimedia and Expo Workshops, pp. 601–604.
– reference: Chen, C.H., Ramanan, D., 2017. 3d human pose estimation = 2d pose estimation + matching. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7035–7043.
– volume: 38
  start-page: 1533
  year: 2016
  end-page: 1547
  ident: b102
  article-title: Human pose estimation from video and imus
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Nie, X., Feng, J., Xing, J., Yan, S., 2018. Pose partition networks for multi-person pose estimation. In: Proc. European Conference on Computer Vision, pp. 684–699.
– reference: Pavlakos, G., Zhou, X., Daniilidis, K., 2018a. Ordinal depth supervision for 3d human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7307-7316.
– reference: von Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., Pons-Moll, G., 2018. Recovering accurate 3d human pose in the wild using imus and a moving camera. In: Proc. European Conference on Computer Vision, pp. 601–617.
– reference: Moon, G., Chang, J.Y., Lee, K.M., 2019. Posefix: Model-agnostic general human pose refinement network. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7773–7781.
– reference: Fan, X., Zheng, K., Lin, Y., Wang, S., 2015. Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1347-1355.
– start-page: 1365
  year: 2009
  end-page: 1372
  ident: b11
  article-title: Poselets: Body part detectors trained using 3d human pose annotations
  publication-title: Proc. IEEE International Conference on Computer Vision
– reference: Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., Gehler, P.V., 2017. Unite the people: Closing the loop between 3d and 2d human representations. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4704–4713.
– volume: 34
  start-page: 2282
  year: 2012
  end-page: 2288
  ident: b31
  article-title: Human pose co-estimation and applications
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– year: 2012
  ident: b30
  article-title: Calvin upper-body detector v1.04
– reference: Chen, Y., Shen, C., Wei, X.S., Liu, L., Yang, J., 2017. Adversarial posenet: A structure-aware convolutional network for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1212-1221.
– volume: 88
  start-page: 303
  year: 2010
  end-page: 338
  ident: b34
  article-title: The pascal visual object classes (voc) challenge
  publication-title: Int. J. Comput. Vis.
– start-page: 246
  year: 2016
  end-page: 260
  ident: b91
  article-title: Human pose estimation using deep consensus voting
  publication-title: Proc. European Conference on Computer Vision
– reference: Tekin, B., Márquez-Neila, P., Salzmann, M., Fua, P., 2017. Learning to fuse 2d and 3d image cues for monocular body pose estimation. In: Proc. IEEE International Conference on Computer Vision, pp. 3941–3950.
– start-page: 484
  year: 2018
  end-page: 494
  ident: b120
  article-title: Neural body fitting: Unifying deep learning and model based human pose and shape estimation
  publication-title: Proc. IEEE International Conference on 3D Vision
– reference: Rhodin, H., Spörri, I., Constantin, V., Meyer, F., Müller, E., Salzmann, M., Fua, P., 2018b. Learning monocular 3d human pose estimation from multi-view images. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 8437–8446.
– volume: 6
  start-page: 538
  year: 2012
  end-page: 552
  ident: b52
  article-title: Human pose estimation and activity recognition from multi-view videos: Comparative explorations of recent developments
  publication-title: IEEE J. Sel. Top. Signal Process.
– volume: 101
  start-page: 184
  year: 2013
  end-page: 204
  ident: b170
  article-title: Efficiently scaling up crowdsourced video annotation
  publication-title: Int. J. Comput. Vis.
– reference: Bogo, F., Romero, J., Pons-Moll, G., Black, M.J., 2017. Dynamic FAUST: Registering human bodies in motion. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 6233–6242.
– start-page: 627
  year: 2016
  end-page: 642
  ident: b60
  article-title: Multi-person pose estimation with local joint-to-person associations
  publication-title: Proc. European Conference on Computer Vision
– volume: 85
  start-page: 15
  year: 2019
  end-page: 22
  ident: b99
  article-title: Human pose regression by combining indirect part detection and contextual information
  publication-title: Comput. Graph.
– start-page: 1736
  year: 2014
  end-page: 1744
  ident: b22
  article-title: Articulated pose estimation by a graphical model with image dependent pairwise relations
  publication-title: Advances in Neural Information Processing Systems
– reference: Fabbri, M., Lanzi, F., Calderara, S., Palazzi, A., Vezzani, R., Cucchiara, R., 2018. Learning to detect and track visible and occluded body joints in a virtual world. In: Proc. European Conference on Computer Vision, pp. 430–446.
– year: 2019
  ident: b56
  article-title: INRIA4D
– volume: 81
  start-page: 231
  year: 2001
  end-page: 268
  ident: b109
  article-title: A survey of computer vision-based human motion capture
  publication-title: Comput. Vis. Image Underst.
– reference: Peng, X., Tang, Z., Yang, F., Feris, R.S., Metaxas, D., 2018. Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2226–2234.
– reference: Sapp, B., Weiss, D., Taskar, B., 2011. Parsing human motion with stretchable models. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1281–1288.
– reference: Ouyang, W., Chu, X., Wang, X., 2014. Multi-source deep learning for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2329–2336.
– reference: Toshev, A., Szegedy, C., 2014. Deeppose: Human pose estimation via deep neural networks. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660.
– reference: Dantone, M., Gall, J., Leistner, C., Van Gool, L., 2013. Human pose estimation using body parts dependent joint regressors. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3041–3048.
– reference: Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T., 2011. HMDB: A large video database for human motion recognition. In: Proc. IEEE International Conference on Computer Vision, p. 6.
– start-page: 337
  year: 2009
  end-page: 346
  ident: b50
  article-title: A statistical model of human pose and body shape
  publication-title: Computer Graphics Forum
– start-page: 33
  year: 2014
  end-page: 47
  ident: b137
  article-title: Pose machines: Articulated pose estimation via inference machines
  publication-title: Proc. European Conference on Computer Vision
– reference: Ferrari, V., Marin-Jimenez, M., Zisserman, A., 2008. Progressive search space reduction for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
– volume: 20
  start-page: 1246
  year: 2018
  end-page: 1259
  ident: b119
  article-title: Knowledge-guided deep fractal neural networks for human pose estimation
  publication-title: IEEE Trans. Multimed.
– start-page: 479
  year: 2016
  end-page: 488
  ident: b19
  article-title: Synthesizing training images for boosting human 3d pose estimation
  publication-title: Proc. IEEE International Conference on 3D Vision
– reference: Sapp, B., Taskar, B., 2013. Modec: Multimodal decomposable models for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3674–3681.
– volume: 43
  start-page: 1575
  year: 2011
  end-page: 1581
  ident: b2
  article-title: 2011 compendium of physical activities: a second update of codes and MET values
  publication-title: Med. Sci. Sports Exerc.
– reference: Li, S., Liu, Z.Q., Chan, A.B., 2014. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 482–489.
– volume: 77
  start-page: 22901
  year: 2018
  end-page: 22921
  ident: b86
  article-title: 3d skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated cnn
  publication-title: Multimedia Tools Appl.
– start-page: 1097
  year: 2012
  end-page: 1105
  ident: b78
  article-title: Imagenet classification with deep convolutional neural networks
  publication-title: Advances in Neural Information Processing Systems
– volume: 35
  start-page: 2878
  year: 2013
  end-page: 2890
  ident: b180
  article-title: Articulated human detection with flexible mixtures of parts
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Tang, Z., Peng, X., Geng, S., Wu, L., Zhang, S., Metaxas, D., 2018b. Quantized densely connected u-nets for efficient landmark localization. In: Proc. European Conference on Computer Vision, pp. 339–354.
– reference: Huang, S., Gong, M., Tao, D., 2017. A coarse-fine network for keypoint localization. In: Proc. IEEE International Conference on Computer Vision, pp. 3028–3037.
– reference: Bogo, F., Romero, J., Loper, M., Black, M.J., 2014. FAUST: Dataset and evaluation for 3D mesh registration. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3794–3801.
– reference: Gkioxari, G., Arbelaez, P., Bourdev, L., Malik, J., 2013. Articulated pose estimation using discriminative armlet classifiers. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3342–3349.
– reference: Varol, G., Ceylan, D., Russell, B., Yang, J., Yumer, E., Laptev, I., Schmid, C., 2018. Bodynet: Volumetric inference of 3d human body shapes. In: Proc. European Conference on Computer Vision, pp. 20-36.
– start-page: 728
  year: 2016
  end-page: 743
  ident: b46
  article-title: Chained predictions using convolutional neural networks
  publication-title: Proc. European Conference on Computer Vision
– volume: 34
  start-page: 248
  year: 2015
  ident: b96
  article-title: Smpl: A skinned multi-person linear model
  publication-title: ACM Trans. Graph.
– volume: 32
  start-page: 10
  year: 2015
  end-page: 19
  ident: b94
  article-title: A survey of human pose estimation: the body parts parsing based methods
  publication-title: J. Vis. Commun. Image Represent.
– reference: Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., Theobalt, C., 2018. Single-shot multi-person 3d body pose estimation from monocular rgb input. In: International Conference on 3D Vision, pp. 120-130.
– reference: Carreira, J., Agrawal, P., Fragkiadaki, K., Malik, J., 2016. Human pose estimation with iterative error feedback. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4733–4742.
– reference: Rohrbach, M., Amin, S., Andriluka, M., Schiele, B., 2012. A database for fine grained activity detection of cooking activities. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1194–1201.
– volume: 34
  start-page: 1995
  year: 2013
  end-page: 2006
  ident: b21
  article-title: A survey of human motion analysis using depth imagery
  publication-title: Pattern Recognit. Lett.
– start-page: 483
  year: 2016
  end-page: 499
  ident: b115
  article-title: Stacked hourglass networks for human pose estimation
  publication-title: Proc. European Conference on Computer Vision
– volume: 73
  start-page: 428
  year: 1999
  end-page: 440
  ident: b1
  article-title: Human motion analysis: A review
  publication-title: Comput. Vis. Image Underst.
– volume: 14
  start-page: 4189
  year: 2014
  end-page: 4210
  ident: b128
  article-title: A survey on model based approaches for 2d and 3d visual human pose recovery
  publication-title: Sensors
– reference: Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J., 2018. Cascaded pyramid network for multi-person pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7103–7112.
– reference: Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K., 2017. Coarse-to-fine volumetric prediction for single-image 3d human pose. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1263–1272.
– reference: Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X., 2017. Multi-context attention for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1831-1840.
– reference: Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., Wang, X., 2018. 3d human pose estimation in the wild by adversarial learning. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5255–5264.
– reference: Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J., 2013. Towards understanding action recognition. In: Proc. IEEE International Conference on Computer Vision, pp. 3192–3199.
– reference: Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., Murphy, K., 2017. Towards accurate multi-person pose estimation in the wild. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4903–4911.
– reference: Martinez, J., Hossain, R., Romero, J., Little, J.J., 2017. A simple yet effective baseline for 3d human pose estimation. In: Proc. IEEE International Conference on Computer Vision, pp. 2640–2649.
– reference: Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y., 2016. Convolutional pose machines. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732.
– volume: 104
  start-page: 90
  year: 2006
  end-page: 126
  ident: b110
  article-title: A survey of advances in vision-based human motion capture and analysis
  publication-title: Comput. Vis. Image Underst.
– year: 2014
  ident: b44
  article-title: R-cnns for pose estimation and action detection
– reference: Li, S., Zhang, W., Chan, A.B., 2015b. Maximum-margin structured learning with deep networks for 3d human pose estimation. In: Proc. IEEE International Conference on Computer Vision, pp. 2848–2856.
– reference: Sun, K., Xiao, B., Liu, D., Wang, J., 2019. Deep high-resolution representation learning for human pose estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition.
– reference: Luo, Y., Ren, J., Wang, Z., Sun, W., Pan, J., Liu, J., Pang, J., Lin, L., 2018. Lstm pose machines. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5207–5215.
– start-page: 538
  year: 2014
  end-page: 552
  ident: b130
  article-title: Deep convolutional neural networks for efficient pose estimation in gesture videos
  publication-title: Proc. Asian Conference on Computer Vision
– volume: 110
  start-page: 70
  year: 2014
  end-page: 90
  ident: b15
  article-title: Automatic and efficient human pose estimation for sign language videos
  publication-title: Int. J. Comput. Vis.
– start-page: 408
  year: 2005
  end-page: 416
  ident: b5
  article-title: Scape: shape completion and animation of people
  publication-title: ACM Transactions on Graphics
– year: 2019
  ident: b75
  article-title: Kinect
– reference: Andriluka, M., Iqbal, U., Milan, A., Insafutdinov, E., Pishchulin, L., Gall, J., Schiele, B., 2018. Posetrack: A benchmark for human pose estimation and tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5167–5176.
– start-page: 506
  year: 2017
  end-page: 516
  ident: b104
  article-title: Monocular 3d human pose estimation in the wild using improved cnn supervision
  publication-title: Proc. IEEE International Conference on 3D Vision
– start-page: 241
  year: 2001
  end-page: 244
  ident: b108
  article-title: Motion Capture File Formats Explained, Vol. 211
– volume: 108
  start-page: 4
  year: 2007
  end-page: 18
  ident: b134
  article-title: Vision-based human motion analysis: An overview
  publication-title: Comput. Vis. Image Underst.
– volume: 34
  start-page: 334
  year: 2004
  end-page: 352
  ident: b54
  article-title: A survey on visual surveillance of object motion and behaviors
  publication-title: IEEE Trans. Syst. Man Cybern. Part C
– year: 2019
  ident: b161
  article-title: TheCaptury
– reference: Sun, X., Xiao, B., Wei, F., Liang, S., Wei, Y., 2018. Integral human pose regression. In: Proc. European Conference on Computer Vision, pp. 529–545.
– reference: Li, L., Fei-fei, L., 2007. What, where and who? classifying events by scene and object recognition. In: Proc. IEEE International Conference on Computer Vision, p. 6.
– reference: Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y., 2017. Towards 3d human pose estimation in the wild: a weakly-supervised approach. In: Proc. IEEE International Conference on Computer Vision, pp. 398–407.
– ident: 10.1016/j.cviu.2019.102897_b13
  doi: 10.1109/CVPR.2017.143
– ident: 10.1016/j.cviu.2019.102897_b57
  doi: 10.1109/CVPR.2017.142
– start-page: 2277
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b114
  article-title: Associative embedding: End-to-end learning for joint detection and grouping
– volume: 41
  start-page: 190
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b70
  article-title: Panoptic studio: A massively multiview system for social interaction capture
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2017.2782743
– volume: 38
  start-page: 1533
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b102
  article-title: Human pose estimation from video and imus
  publication-title: IEEE transactions on pattern analysis and machine intelligence
  doi: 10.1109/TPAMI.2016.2522398
– volume: 77
  start-page: 22901
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b86
  article-title: 3d skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated cnn
  publication-title: Multimedia Tools Appl.
  doi: 10.1007/s11042-018-5642-0
– ident: 10.1016/j.cviu.2019.102897_b135
– ident: 10.1016/j.cviu.2019.102897_b126
  doi: 10.1109/CVPR.2018.00055
– start-page: 1736
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b22
  article-title: Articulated pose estimation by a graphical model with image dependent pairwise relations
– volume: 34
  start-page: 1995
  year: 2013
  ident: 10.1016/j.cviu.2019.102897_b21
  article-title: A survey of human motion analysis using depth imagery
  publication-title: Pattern Recognit. Lett.
  doi: 10.1016/j.patrec.2013.02.006
– ident: 10.1016/j.cviu.2019.102897_b129
  doi: 10.1109/ICCV.2015.222
– ident: 10.1016/j.cviu.2019.102897_b122
  doi: 10.1007/978-3-030-01264-9_17
– start-page: 246
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b91
  article-title: Human pose estimation using deep consensus voting
– ident: 10.1016/j.cviu.2019.102897_b92
  doi: 10.1109/CVPR.2017.106
– ident: 10.1016/j.cviu.2019.102897_b133
  doi: 10.1109/CVPR.2017.501
– volume: 104
  start-page: 90
  year: 2006
  ident: 10.1016/j.cviu.2019.102897_b110
  article-title: A survey of advances in vision-based human motion capture and analysis
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1016/j.cviu.2006.08.002
– ident: 10.1016/j.cviu.2019.102897_b45
  doi: 10.1109/CVPR.2014.458
– ident: 10.1016/j.cviu.2019.102897_b101
  doi: 10.1007/978-3-030-01249-6_37
– start-page: 483
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b115
  article-title: Stacked hourglass networks for human pose estimation
– volume: 20
  start-page: 1246
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b119
  article-title: Knowledge-guided deep fractal neural networks for human pose estimation
  publication-title: IEEE Trans. Multimed.
  doi: 10.1109/TMM.2017.2762010
– ident: 10.1016/j.cviu.2019.102897_b187
  doi: 10.1109/CVPR.2012.6248098
– volume: 85
  start-page: 15
  year: 2019
  ident: 10.1016/j.cviu.2019.102897_b99
  article-title: Human pose regression by combining indirect part detection and contextual information
  publication-title: Comput. Graph.
  doi: 10.1016/j.cag.2019.09.002
– ident: 10.1016/j.cviu.2019.102897_b178
  doi: 10.1109/CVPR.2016.335
– ident: 10.1016/j.cviu.2019.102897_b73
  doi: 10.1109/CVPR.2018.00744
– ident: 10.1016/j.cviu.2019.102897_b177
  doi: 10.1109/ICCV.2017.144
– volume: 81
  start-page: 231
  year: 2001
  ident: 10.1016/j.cviu.2019.102897_b109
  article-title: A survey of computer vision-based human motion capture
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1006/cviu.2000.0897
– ident: 10.1016/j.cviu.2019.102897_b18
  doi: 10.1109/ICCV.2017.137
– start-page: 185
  year: 2008
  ident: 10.1016/j.cviu.2019.102897_b150
  article-title: 3d human motion analysis in monocular video: techniques and challenges
– volume: 34
  start-page: 334
  year: 2004
  ident: 10.1016/j.cviu.2019.102897_b54
  article-title: A survey on visual surveillance of object motion and behaviors
  publication-title: IEEE Trans. Syst. Man Cybern. Part C
  doi: 10.1109/TSMCC.2004.829274
– ident: 10.1016/j.cviu.2019.102897_b124
  doi: 10.1109/CVPR.2018.00763
– start-page: 91
  year: 2015
  ident: 10.1016/j.cviu.2019.102897_b138
  article-title: Faster r-cnn: Towards real-time object detection with region proposal networks
– ident: 10.1016/j.cviu.2019.102897_b32
  doi: 10.5244/C.23.3
– ident: 10.1016/j.cviu.2019.102897_b20
  doi: 10.1109/CVPR.2018.00742
– volume: 40
  start-page: 13
  year: 2010
  ident: 10.1016/j.cviu.2019.102897_b67
  article-title: Advances in view-invariant human motion analysis: A review
  publication-title: IEEE Trans. Syst. Man Cybern. Part C
  doi: 10.1109/TSMCC.2009.2027608
– ident: 10.1016/j.cviu.2019.102897_b121
  doi: 10.1109/CVPR.2014.299
– ident: 10.1016/j.cviu.2019.102897_b79
  doi: 10.1109/CVPR.2017.500
– ident: 10.1016/j.cviu.2019.102897_b9
  doi: 10.1109/CVPR.2014.491
– ident: 10.1016/j.cviu.2019.102897_b16
  doi: 10.1109/CVPR.2016.334
– ident: 10.1016/j.cviu.2019.102897_b68
  doi: 10.5244/C.24.12
– year: 2014
  ident: 10.1016/j.cviu.2019.102897_b44
– ident: 10.1016/j.cviu.2019.102897_b163
  doi: 10.1109/CVPR.2015.7298664
– ident: 10.1016/j.cviu.2019.102897_b82
– volume: 6
  start-page: 538
  year: 2012
  ident: 10.1016/j.cviu.2019.102897_b52
  article-title: Human pose estimation and activity recognition from multi-view videos: Comparative explorations of recent developments
  publication-title: IEEE J. Sel. Top. Signal Process.
  doi: 10.1109/JSTSP.2012.2196975
– ident: 10.1016/j.cviu.2019.102897_b153
  doi: 10.1109/ICCV.2017.284
– year: 2011
  ident: 10.1016/j.cviu.2019.102897_b111
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b75
– start-page: 34
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b58
  article-title: Deepercut: A deeper, stronger, and faster multi-person pose estimation model
– volume: 73
  start-page: 82
  year: 1999
  ident: 10.1016/j.cviu.2019.102897_b42
  article-title: The visual analysis of human movement: A survey
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1006/cviu.1998.0716
– ident: 10.1016/j.cviu.2019.102897_b131
  doi: 10.1109/CVPR.2016.533
– ident: 10.1016/j.cviu.2019.102897_b144
  doi: 10.1109/CVPR.2011.5995607
– ident: 10.1016/j.cviu.2019.102897_b90
  doi: 10.1109/ICCV.2015.326
– ident: 10.1016/j.cviu.2019.102897_b186
  doi: 10.1109/CVPR.2015.7298976
– start-page: 907
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b36
  article-title: A monocular pose estimation system based on infrared leds
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b169
– ident: 10.1016/j.cviu.2019.102897_b85
  doi: 10.1109/ICCV.2007.4408872
– year: 2018
  ident: 10.1016/j.cviu.2019.102897_b171
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b105
– ident: 10.1016/j.cviu.2019.102897_b176
  doi: 10.1007/978-3-030-01231-1_29
– start-page: 228
  year: 2010
  ident: 10.1016/j.cviu.2019.102897_b29
  article-title: We are family: Joint pose estimation of multiple persons
– year: 2012
  ident: 10.1016/j.cviu.2019.102897_b30
– volume: 61
  start-page: 38
  year: 1995
  ident: 10.1016/j.cviu.2019.102897_b26
  article-title: Active shape models-their training and application
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1006/cviu.1995.1004
– ident: 10.1016/j.cviu.2019.102897_b38
  doi: 10.1109/ICCV.2017.256
– ident: 10.1016/j.cviu.2019.102897_b136
  doi: 10.5244/C.30.109
– ident: 10.1016/j.cviu.2019.102897_b179
  doi: 10.1109/CVPR.2018.00551
– ident: 10.1016/j.cviu.2019.102897_b182
  doi: 10.1109/ICCV.2013.280
– ident: 10.1016/j.cviu.2019.102897_b84
  doi: 10.1109/CVPR.2019.00465
– ident: 10.1016/j.cviu.2019.102897_b17
  doi: 10.1109/CVPR.2017.610
– start-page: 479
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b19
  article-title: Synthesizing training images for boosting human 3d pose estimation
– ident: 10.1016/j.cviu.2019.102897_b125
  doi: 10.1109/CVPR.2017.139
– ident: 10.1016/j.cviu.2019.102897_b113
  doi: 10.1109/CVPR.2017.170
– start-page: 2017
  year: 2015
  ident: 10.1016/j.cviu.2019.102897_b62
  article-title: Spatial transformer networks
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b56
– ident: 10.1016/j.cviu.2019.102897_b139
  doi: 10.1007/978-3-030-01249-6_46
– ident: 10.1016/j.cviu.2019.102897_b65
  doi: 10.1109/ICCV.2013.396
– start-page: 1799
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b164
  article-title: Joint training of a convolutional network and a graphical model for human pose estimation
– ident: 10.1016/j.cviu.2019.102897_b181
  doi: 10.1109/CVPR.2018.00229
– ident: 10.1016/j.cviu.2019.102897_b183
  doi: 10.1109/CVPR.2018.00768
– start-page: 332
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b80
  article-title: 3d human pose estimation from monocular images with deep convolutional neural network
– ident: 10.1016/j.cviu.2019.102897_b184
  doi: 10.1109/ICCV.2017.51
– volume: 36
  start-page: 44
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b107
  article-title: Vnect: Real-time 3d human pose estimation with a single rgb camera
  publication-title: ACM Trans. Graph.
  doi: 10.1145/3072959.3073596
– ident: 10.1016/j.cviu.2019.102897_b10
  doi: 10.1109/CVPR.2017.591
– volume: 83
  start-page: 328
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b83
  article-title: Monocular depth estimation with hierarchical fusion of dilated cnns and soft-weighted-sum inference
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2018.05.029
– ident: 10.1016/j.cviu.2019.102897_b173
  doi: 10.1109/CVPR.2011.5995519
– volume: 36
  start-page: 1325
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b59
  article-title: Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2013.248
– ident: 10.1016/j.cviu.2019.102897_b89
– volume: 101
  start-page: 184
  year: 2013
  ident: 10.1016/j.cviu.2019.102897_b170
  article-title: Efficiently scaling up crowdsourced video annotation
  publication-title: Int. J. Comput. Vis.
  doi: 10.1007/s11263-012-0564-1
– ident: 10.1016/j.cviu.2019.102897_b162
  doi: 10.1109/CVPR.2017.603
– ident: 10.1016/j.cviu.2019.102897_b28
  doi: 10.1109/AVSS.2018.8639378
– start-page: 506
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b104
  article-title: Monocular 3d human pose estimation in the wild using improved cnn supervision
– ident: 10.1016/j.cviu.2019.102897_b148
– ident: 10.1016/j.cviu.2019.102897_b143
  doi: 10.1109/CVPR.2013.471
– ident: 10.1016/j.cviu.2019.102897_b98
  doi: 10.1109/CVPR.2018.00539
– ident: 10.1016/j.cviu.2019.102897_b112
  doi: 10.1109/CVPR.2019.00796
– ident: 10.1016/j.cviu.2019.102897_b146
  doi: 10.1109/CVPR.2016.115
– ident: 10.1016/j.cviu.2019.102897_b37
– volume: 14
  start-page: 4189
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b128
  article-title: A survey on model based approaches for 2d and 3d visual human pose recovery
  publication-title: Sensors
  doi: 10.3390/s140304189
– start-page: 538
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b130
  article-title: Deep convolutional neural networks for efficient pose estimation in gesture videos
– volume: 39
  start-page: 501
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b33
  article-title: Marconi—convnet-based marker-less motion capture in outdoor and indoor scenes
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2016.2557779
– start-page: 337
  year: 2009
  ident: 10.1016/j.cviu.2019.102897_b50
  article-title: A statistical model of human pose and body shape
– year: 2017
  ident: 10.1016/j.cviu.2019.102897_b53
– ident: 10.1016/j.cviu.2019.102897_b72
– ident: 10.1016/j.cviu.2019.102897_b168
  doi: 10.1109/CVPR.2017.492
– start-page: 728
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b46
  article-title: Chained predictions using convolutional neural networks
– start-page: 437
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b76
  article-title: Multiposenet: Fast multi-person pose estimation using pose residual network
– ident: 10.1016/j.cviu.2019.102897_b88
  doi: 10.1109/CVPRW.2014.78
– year: 2018
  ident: 10.1016/j.cviu.2019.102897_b116
– ident: 10.1016/j.cviu.2019.102897_b6
  doi: 10.1109/CVPR.2019.00351
– ident: 10.1016/j.cviu.2019.102897_b41
  doi: 10.1109/CVPR.2008.4587468
– ident: 10.1016/j.cviu.2019.102897_b155
  doi: 10.5244/C.31.15
– start-page: 627
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b60
  article-title: Multi-person pose estimation with local joint-to-person associations
– volume: 34
  start-page: 248
  year: 2015
  ident: 10.1016/j.cviu.2019.102897_b96
  article-title: Smpl: A skinned multi-person linear model
  publication-title: ACM Trans. Graph.
  doi: 10.1145/2816795.2818013
– volume: 43
  start-page: 1575
  year: 2011
  ident: 10.1016/j.cviu.2019.102897_b2
  article-title: 2011 compendium of physical activities: a second update of codes and met values
  publication-title: Med. Sci. Sports Exerc.
  doi: 10.1249/MSS.0b013e31821ece12
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b161
– ident: 10.1016/j.cviu.2019.102897_b55
  doi: 10.1109/ICCV.2017.329
– ident: 10.1016/j.cviu.2019.102897_b66
– ident: 10.1016/j.cviu.2019.102897_b81
– start-page: 740
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b93
  article-title: Microsoft coco: Common objects in context
– volume: 32
  start-page: 10
  year: 2015
  ident: 10.1016/j.cviu.2019.102897_b94
  article-title: A survey of human pose estimation: the body parts parsing based methods
  publication-title: J. Vis. Commun. Image Represent.
  doi: 10.1016/j.jvcir.2015.06.013
– ident: 10.1016/j.cviu.2019.102897_b40
– start-page: 186
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b185
  article-title: Deep kinematic pose regression
– start-page: 468
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b7
  article-title: Recurrent human pose estimation
– ident: 10.1016/j.cviu.2019.102897_b71
  doi: 10.1109/CVPR.2018.00868
– ident: 10.1016/j.cviu.2019.102897_b158
  doi: 10.1007/978-3-030-01219-9_12
– ident: 10.1016/j.cviu.2019.102897_b151
  doi: 10.1109/ICCV.2017.284
– ident: 10.1016/j.cviu.2019.102897_b118
  doi: 10.1109/ICCV.2017.373
– ident: 10.1016/j.cviu.2019.102897_b167
  doi: 10.1007/978-3-030-01234-2_2
– year: 2017
  ident: 10.1016/j.cviu.2019.102897_b175
– volume: 16
  start-page: 1966
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b47
  article-title: Human pose estimation from monocular images: A comprehensive survey
  publication-title: Sensors
  doi: 10.3390/s16121966
– ident: 10.1016/j.cviu.2019.102897_b23
  doi: 10.23919/APSIPA.2018.8659538
– ident: 10.1016/j.cviu.2019.102897_b160
  doi: 10.1109/ICCV.2017.425
– volume: 61
  start-page: 55
  year: 2005
  ident: 10.1016/j.cviu.2019.102897_b39
  article-title: Pictorial structures for object recognition
  publication-title: Int. J. Compu. Vis.
  doi: 10.1023/B:VISI.0000042934.15159.49
– ident: 10.1016/j.cviu.2019.102897_b127
  doi: 10.1109/CVPR.2018.00237
– start-page: 408
  year: 2005
  ident: 10.1016/j.cviu.2019.102897_b5
  article-title: Scape: shape completion and animation of people
– ident: 10.1016/j.cviu.2019.102897_b117
  doi: 10.1007/978-3-030-01228-1_42
– ident: 10.1016/j.cviu.2019.102897_b103
  doi: 10.1109/ICCV.2017.288
– start-page: 1365
  year: 2009
  ident: 10.1016/j.cviu.2019.102897_b11
  article-title: Poselets: Body part detectors trained using 3d human pose annotations
– ident: 10.1016/j.cviu.2019.102897_b3
  doi: 10.1109/CVPR.2018.00542
– volume: 88
  start-page: 303
  year: 2010
  ident: 10.1016/j.cviu.2019.102897_b34
  article-title: The pascal visual object classes (voc) challenge
  publication-title: Int. J. Comput. Vis.
  doi: 10.1007/s11263-009-0275-4
– ident: 10.1016/j.cviu.2019.102897_b49
  doi: 10.1109/CVPR.2018.00762
– ident: 10.1016/j.cviu.2019.102897_b69
  doi: 10.1109/CVPR.2011.5995318
– ident: 10.1016/j.cviu.2019.102897_b74
  doi: 10.1109/ICIP.2018.8451114
– ident: 10.1016/j.cviu.2019.102897_b95
  doi: 10.1109/CVPR.2015.7298965
– start-page: 561
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b8
  article-title: Keep it smpl: Automatic estimation of 3d human pose and shape from a single image
– volume: 108
  start-page: 4
  year: 2007
  ident: 10.1016/j.cviu.2019.102897_b134
  article-title: Vision-based human motion analysis: An overview
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1016/j.cviu.2006.10.016
– ident: 10.1016/j.cviu.2019.102897_b166
  doi: 10.5244/C.31.14
– ident: 10.1016/j.cviu.2019.102897_b27
  doi: 10.1109/CVPR.2013.391
– volume: 35
  start-page: 2821
  year: 2012
  ident: 10.1016/j.cviu.2019.102897_b147
  article-title: Efficient human pose estimation from single depth images
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2012.241
– ident: 10.1016/j.cviu.2019.102897_b142
  doi: 10.1109/CVPR.2012.6247801
– volume: 34
  start-page: 120
  year: 2015
  ident: 10.1016/j.cviu.2019.102897_b132
  article-title: Dyna: A model of dynamic human shape in motion
  publication-title: ACM Trans. Graph.
  doi: 10.1145/2766993
– start-page: 33
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b137
  article-title: Pose machines: Articulated pose estimation via inference machines
– ident: 10.1016/j.cviu.2019.102897_b157
  doi: 10.1109/CVPR.2019.00120
– ident: 10.1016/j.cviu.2019.102897_b154
  doi: 10.1109/CVPR.2016.308
– ident: 10.1016/j.cviu.2019.102897_b87
  doi: 10.1109/CVPR.2019.01012
– start-page: 484
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b120
  article-title: Neural body fitting: Unifying deep learning and model based human pose and shape estimation
– ident: 10.1016/j.cviu.2019.102897_b174
  doi: 10.1109/CVPR.2016.511
– volume: 87
  start-page: 4
  year: 2010
  ident: 10.1016/j.cviu.2019.102897_b149
  article-title: Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion
  publication-title: Int. J. Comput. Vis.
  doi: 10.1007/s11263-009-0273-6
– volume: 171
  start-page: 118
  year: 2018
  ident: 10.1016/j.cviu.2019.102897_b172
  article-title: Rgb-d-based human motion recognition with deep learning: A survey
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1016/j.cviu.2018.04.007
– start-page: 2980
  year: 2017
  ident: 10.1016/j.cviu.2019.102897_b51
  article-title: Mask r-cnn
– ident: 10.1016/j.cviu.2019.102897_b4
  doi: 10.1109/CVPR.2014.471
– year: 2019
  ident: 10.1016/j.cviu.2019.102897_b100
– start-page: 717
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b12
  article-title: Human pose estimation via convolutional part heatmap regression
– ident: 10.1016/j.cviu.2019.102897_b140
  doi: 10.1109/CVPR.2018.00880
– volume: 34
  start-page: 2282
  year: 2012
  ident: 10.1016/j.cviu.2019.102897_b31
  article-title: Human pose co-estimation and applications
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2012.85
– ident: 10.1016/j.cviu.2019.102897_b141
  doi: 10.1109/CVPR.2017.134
– start-page: 302
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b64
  article-title: Modeep: A deep learning framework using motion features for human pose estimation
– ident: 10.1016/j.cviu.2019.102897_b152
  doi: 10.1109/CVPR.2019.00584
– ident: 10.1016/j.cviu.2019.102897_b77
  doi: 10.1109/CVPR.2019.01225
– volume: 73
  start-page: 428
  year: 1999
  ident: 10.1016/j.cviu.2019.102897_b1
  article-title: Human motion analysis: A review
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1006/cviu.1998.0744
– ident: 10.1016/j.cviu.2019.102897_b14
  doi: 10.1109/CVPR.2016.512
– volume: 40
  start-page: 33
  year: 1975
  ident: 10.1016/j.cviu.2019.102897_b48
  article-title: Generalized procrustes analysis
  publication-title: Psychometrika
  doi: 10.1007/BF02291478
– start-page: 1097
  year: 2012
  ident: 10.1016/j.cviu.2019.102897_b78
  article-title: Imagenet classification with deep convolutional neural networks
– year: 2013
  ident: 10.1016/j.cviu.2019.102897_b63
– volume: 152
  start-page: 1
  year: 2016
  ident: 10.1016/j.cviu.2019.102897_b145
  article-title: 3d human pose estimation: A review of the literature and analysis of covariates
  publication-title: Comput. Vis. Image Underst.
  doi: 10.1016/j.cviu.2016.09.002
– ident: 10.1016/j.cviu.2019.102897_b123
  doi: 10.1109/CVPR.2017.395
– ident: 10.1016/j.cviu.2019.102897_b156
  doi: 10.1007/978-3-030-01219-9_21
– ident: 10.1016/j.cviu.2019.102897_b24
  doi: 10.1109/CVPR.2016.510
– ident: 10.1016/j.cviu.2019.102897_b25
  doi: 10.1109/CVPR.2017.601
– ident: 10.1016/j.cviu.2019.102897_b97
  doi: 10.1109/CVPR.2018.00546
– volume: 110
  start-page: 70
  year: 2014
  ident: 10.1016/j.cviu.2019.102897_b15
  article-title: Automatic and efficient human pose estimation for sign language videos
  publication-title: Int. J. Comput. Vis.
  doi: 10.1007/s11263-013-0672-6
– ident: 10.1016/j.cviu.2019.102897_b43
  doi: 10.1109/CVPR.2013.429
– ident: 10.1016/j.cviu.2019.102897_b61
  doi: 10.1109/CVPR.2017.495
– ident: 10.1016/j.cviu.2019.102897_b35
  doi: 10.1007/978-3-030-01225-0_27
– ident: 10.1016/j.cviu.2019.102897_b106
  doi: 10.1109/3DV.2018.00024
– ident: 10.1016/j.cviu.2019.102897_b165
  doi: 10.1109/CVPR.2014.214
– volume: 35
  start-page: 2878
  year: 2013
  ident: 10.1016/j.cviu.2019.102897_b180
  article-title: Articulated human detection with flexible mixtures of parts
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2012.261
– start-page: 241
  year: 2001
  ident: 10.1016/j.cviu.2019.102897_b108
– year: 2016
  ident: 10.1016/j.cviu.2019.102897_b159
StartPage 102897
SubjectTerms Deep learning
Human pose estimation
Survey
Title Monocular human pose estimation: A survey of deep learning-based methods
URI https://dx.doi.org/10.1016/j.cviu.2019.102897
Volume 192