When Is It Okay to Use Data for AI?

Developing AI requires a lot of data and, in many cases, this data comes from third parties. But organizations willing to share data for computational uses have lacked easy-to-use licenses for distributing it. Many common licenses, such as the Creative Commons licenses, were developed without consideration for how data could be used for machine learning. The absence of model data sharing agreements has deterred many data owners who would otherwise be eager to share their data, thus hindering AI development. To address this problem, Microsoft has published three model data use agreements that specify whether and how data can be used for AI development.

The first model data use agreement, the Open Use of Data Agreement (O-UDA), is similar to the existing Creative Commons No Rights Reserved (CC0) license: it places no restrictions on the use of the data or of any outputs (products developed with the use of this data). It also clarifies that an AI model created from the data is considered an output and is thus unrestricted as well.

The second model data use agreement, the Computational Use of Data Agreement (C-UDA), allows a data holder to share data for computational purposes only. Because it allows data to be used for machine learning, C-UDA provides a mechanism to share data for AI development in situations where datasets contain copyrighted works. For example, a local newspaper may want to enable AI developers to use its corpus of articles but not allow them to otherwise use or distribute the content. The C-UDA creates an opportunity for a data holder to share data without giving up its rights to limit access and use for non-computational purposes.

The third model data use agreement, the Data Use Agreement for Open AI Model Development (DUA-OAI), is similar to C-UDA in that it allows for data sharing for computational use only, but with the key distinction that the resulting AI model trained on this data must be made publicly available under an open-source license. DUA-OAI is designed to facilitate data sharing in situations where data holders wish to share data to advance AI development, but only if the outputs of this sharing are available to all, rather than proprietary.

These model agreements are a valuable contribution to the public debate about how to make it easier for organizations to share data for AI development. Federal agencies, such as OMB and NIST, should work with the private sector and other stakeholders, including state and local governments, to evaluate how to incorporate these types of agreements into future government data releases and to standardize the licenses used to release data for AI.

Image: Joi.
