Towards ethical data management, distribution, and use for artificial intelligence (AI) applications

This poster is part of the OR2020 Virtual Poster Session which takes place in the week of June 1-5. We encourage you to ask questions and engage in discussion on this poster by using the comments feature. Authors will respond to comments during this week.

Authors:

Maria Esteva, Sharon Strover, Soyoung Park, Christopher Rossbach, John Thywissen

Poster description:

Open datasets are at the core of countless AI applications. However, the complexities involved in large data aggregations, transformations, distribution and reuse, and the limited capacity to validate ethical implications embedded in routine data practices, make it difficult to track and prevent breaches. Recognizing that data and the systems that manage it are not neutral but entangled in a chain of decisions, organizational priorities, technical conditions, and social norms, we investigate how ethical data management can be a point of departure for designing and evaluating AI applications. Our research suggests an array of issues and decision-making instances that touch on ethical data management at each lifecycle stage. The findings can inform open repositories’ policies and curation practices towards ethical open data for use in responsible AI. In this poster we describe our research methods using the case of natural hazards data.


About the presenter:

Sharon Strover is a Professor at the School of Journalism, University of Texas at Austin. She also directs the Technology & Information Policy Center. Her research and teaching focus on communication technologies and their policy implications.

2 Replies to “Towards ethical data management, distribution, and use for artificial intelligence (AI) applications”

  1. What are some ethical issues you come across during the integration stage, and do you think they vary from location to location?

  2. Posting for Maria Esteva:
    Our study operates in the context of artificial intelligence (AI), using data management/curation as a point of departure to improve AI applications. Data quality emerged as an ethical consideration (although I think this applies to most cases of data reuse). Data needs to be balanced, clear, transparent, valid, and complete. In general, researchers need to integrate many, many datasets for AI applications. The possibility of integrating data and building sound AI models is influenced by aspects of data quality including completeness, accuracy, validity, interoperability, the use of professional standards in data gathering and curation, the use of sound statistical methods for sampling, the data documentation, its publication and accessibility, etc. Basically, a mix of domain professional standards (data creation and processing) and FAIR principles (curation/publication). The goal of data creators/producers, to investigate a specific research problem, is not necessarily aligned with future uses of the data. How can repositories, data librarians, and curators be better brokers and promote curation standards while also speaking and understanding the domain language and context in which data is produced and consumed? This speaks to our professional code of ethics in terms of bridging data producers' possibilities and goals with those of data consumers, and working with both in a less compartmentalized way to put out quality products. Variations may happen due to the type of data, its provenance, and even the application in which it will be used. Human subjects data and geolocated data pose more pressing ethical concerns. See this project: https://datanutrition.org/ Hope this answers your question.
