Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data

Aparna Balagopalan, Gillian K. Hadfield, David Madras, David H. Yang, Marzyeh Ghassemi, Dylan Hadfield-Menell

Files

thesis.pdf (1.61 MB)

Date

2023

Abstract

Acknowledgments: We would like to thank D. Simon (University of Southern California School of Law), T. Lyon (University of Southern California School of Law), R. Zemel (University of Toronto), E. Creager (University of Toronto), and five anonymous reviewers for their helpful comments and reviews. We also thank the participating annotators for their responses.
Funding: We acknowledge the Schwartz Reisman Institute for Technology and Society for funding this research. A.B. was funded in part by an Amazon Science PhD Fellowship at the MIT Science Hub. D.M. was supported by an NSERC Alexander Graham Bell Canada Graduate Scholarship-Doctoral (CGS-D) during a portion of this research. D.H.-M. was funded in part by a gift from the Hirji Wigglesworth Family Foundation and in part by the Bonnie and Marty (1864) Tenenbaum Career Development Chair. G.K.H. was funded in part by the Schwartz Reisman Chair in Technology and Society and a CIFAR AI Chair at the Vector Institute. M.G. was funded in part by the Hermann L. F. von Helmholtz Career Development Professorship at MIT, Microsoft Research, a CIFAR AI Chair at the Vector Institute, a CIFAR Azrieli Global Scholar award, and a Canada Research Council Chair. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. This project was approved by the University of Toronto’s Institutional Research Ethics Board (protocol no. 00037283).
Author contributions: The research was conceived by D.H.-M. and G.K.H.; the study was designed by D.H.-M., G.K.H., D.M., and M.G.; data collection was done by A.B. and D.H.Y.; data interpretation and analysis was done by A.B., D.M., D.H.Y., D.H.-M., G.K.H., and M.G.; and the manuscript was prepared by A.B., D.M., D.H.-M., G.K.H., and M.G.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Code to reproduce the paper’s main findings can be found at https://doi.org/10.5281/zenodo.7782689.

URI

https://demo.dspace.org/handle/10673/1422

Collections

Article
