Open Source Large Language Models: An Ethical Reflection
Author: Isis Hazewindus
Since the introduction of GPT-3.5 to the greater public by means of the ChatGPT chatbot, Large Language Models (LLMs) have taken flight in people's daily lives. As these things go, other similar models soon popped up alongside ChatGPT, which is owned by OpenAI. Some of these, such as Bard and PaLM from Google or Claude from Anthropic, are proprietary models like ChatGPT: they are owned by the company that made them, and technical specifics like the source code, training data and Reinforcement Learning from Human Feedback (RLHF) data are walled off from outside scrutiny. Other models are open source, some of them trained on data from proprietary models that was made public. An example is Meta's LLaMA model, whose weights were released to researchers and subsequently leaked on 4chan in March 2023, resulting in the Stanford Alpaca model, amongst others. Hugging Face, a hosting platform, is currently the place to be to find almost any open source LLM out there. LLM leaderboards, which track, rank and evaluate models on qualities such as performance benchmarks and multitask language understanding, can be found on Hugging Face as well as on LMSYS.org.
The open source developments around LLMs are followed with curiosity, as open source software generally has a more positive image than large corporate software shrouded in mystery. As seen in our July 2023 blogpost, Large Language Models (and ChatGPT in particular) come with a broad range of ethical issues, ranging from controversial and untrustworthy output, to privacy and copyright problems, to more "external" issues concerning climate and the unfair treatment of workers. As LLMs are well on their way to greatly impact the way in which we arrange society, "responsible" models that mitigate the risk of deepening inequality and unfair judgements towards already disadvantaged groups are keenly anticipated by governments and educational institutions, among many others.
This blogpost aims to shed some light on what open source means in the field of Large Language Models, and to explore whether open source models are delivering the "responsible AI" the world would like to see.
Definition of Open Source
Software distributed under an open source license can be used, studied, changed and redistributed by anyone, together with its source code; such a license places few or no restrictions on the use and distribution of the software, as its goal is free access and all-round accessibility.
Open source software has its roots in the Free Software Movement of 1983, whose philosophy is based on guaranteeing that software users can use, study and share software freely, promoting collaboration between programmers and users. In 1998 the Open Source Initiative (OSI) was founded to carry these ideas forward under the "open source" banner. The OSI is a public benefit corporation that educates the public about open source and advocates for the benefits of software that is open to everyone. Its website explains the definition of open source, citing free redistribution and access to the source code (preferably free of charge) as the first two criteria.
The accessibility of open source software brings certain advantages, among which are transparency, better security, flexibility and freedom from vendor lock-in. Because the source code is open to anyone, oversight of security is greater, and fixes and updates can be contributed by anyone, making the software less susceptible to long-standing vulnerabilities or leaks caused by inaction on the part of companies. Moreover, because the source code is openly accessible, users can see for themselves how the software "works" and can thus verify whether a product claim is true. This transparency might also help mitigate bias in software. As a result, the software is easy to adapt to personal needs, and public trust in it can be greatly improved, no small luxury at a time when distrust of governments and companies over opaque use of algorithms and AI is rising. Open source Large Language Models may thus provide a valuable alternative to their closed-off commercial counterparts, if they do turn out to be more open and transparent.
Open source Large Language Models (LLMs)
As mentioned in the introduction, most open source models are fine-tuned versions of existing models. Fine-tuning an LLM here means taking a smaller pre-trained model and training it further, often on outputs of a larger model, which improves the smaller model's performance at a fraction of the cost of training a large model from scratch. Open source LLMs like Vicuna (LMSYS) and Alpaca (Stanford) are fine-tuned from Meta's LLaMA, the former on conversations shared via ShareGPT, while Dolly V2 (Databricks) is fine-tuned from a Pythia model (EleutherAI). Other open source models, such as BLOOM (BigScience), are pre-trained from scratch on a Transformer architecture, similar to the GPT models from OpenAI.
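To make the mechanics concrete, the sketch below shows roughly what such fine-tuning looks like in code: a small open base model is trained further on prompt/response pairs, which in an "imitation" setup would be generated by a larger model. It is a minimal illustration using the Hugging Face transformers and datasets libraries; the base model name, toy data and hyperparameters are assumptions made for the example, not the recipe used by any of the models named above.

```python
# Minimal "imitation" fine-tuning sketch: train a small open model further on
# prompt/response pairs (for example, pairs generated by a larger model).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

base_model = "EleutherAI/pythia-410m"            # small base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Toy imitation data: prompts paired with answers produced by a stronger model.
pairs = [
    {"prompt": "Explain open source in one sentence.",
     "response": "Open source software can be freely used, studied, changed and shared."},
]

def tokenize(example):
    # Format the pair as a single training text and predict every token of it.
    text = f"### Instruction:\n{example['prompt']}\n### Response:\n{example['response']}"
    tokens = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

train_data = Dataset.from_list(pairs).map(tokenize, remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="imitation-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_data,
)
trainer.train()
```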
Fine-tuned models appear to approach the capabilities of large proprietary models such as ChatGPT, but still have some shortcomings (and share most of the ethical issues of their proprietary counterparts). In a May 2023 paper, one of the biggest shortcomings of fine-tuned models (also called 'imitation models') is said to be their poor performance when it comes to broad imitation. Smaller, fine-tuned models came close to the accuracy of big models when trained to perform one specific task, but quickly failed to improve when asked to perform a broad spectrum of tasks. According to the paper, creating a model that performs well across a broad spectrum still requires a massive amount of training data. This underperformance seems to be confirmed on platforms such as Reddit, where developers have been struggling to get open source models to work as well in practice as benchmarks claim they do. A related problem is that models trained in other languages can underperform simply because, for some languages, there is not enough online data available to reach the necessary amount of training material (a problem that Swedish researchers encountered, but managed to mitigate by using data from other North Germanic languages such as Danish and Norwegian). However, considering the current rate of progress in LLM development, these problems might be tackled in the (very) near future, as also suggested in a leaked Google document from earlier this year.
Another current downside of open source models is their user interface (UI). ChatGPT and Bard have user-friendly interfaces and provide APIs, which makes interacting with these models through a chatbot easy for anyone. Most open source models, however, do not provide such an interface, and require local installation and fiddling about with frameworks like LangChain to create an application. This, combined with underperformance, may greatly limit the adoption of smaller, open source models by the greater public, who do not have the skills or means to figure out the specifics for themselves.
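For comparison, that "fiddling about" typically starts with something like the snippet below: downloading an open model from Hugging Face and prompting it from a script, rather than typing into a polished chat window. The model name is only an example (any small open source model hosted on Hugging Face could be substituted), and the weights are downloaded on first use, so enough disk space and memory are assumed.

```python
# Running an open source model locally instead of calling a hosted chatbot.
# Requires the transformers library; the model is downloaded on first use.
from transformers import pipeline

generator = pipeline("text-generation", model="databricks/dolly-v2-3b")

prompt = "What does an open source license allow me to do?"
result = generator(prompt, max_new_tokens=100, do_sample=True)
print(result[0]["generated_text"])
```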
Lastly, not all open source models are suited for every purpose. Models like Vicuna and Alpaca carry non-commercial terms and are mainly intended for research purposes, while models like BLOOM come with use restrictions of their own. More on this topic can be found under the section Licensing and responsible use.
Openness is not always what it seems…
In July 2023, researchers from the Dutch Radboud University published a paper on the "openness" of LLMs, concluding that models presented as open source are not always as open as they claim to be. The researchers gathered information about openness on three different topics: Availability (openness of code, LLM and RLHF training data, and weights), Documentation (licensing, code, architecture and preprint, available papers on the model, model cards and datasheets) and Access (packages and an API for users). In their paper, they stress the importance of openness for transparency, quality control and reproducibility of these LLMs, all of which contribute to building trustworthy AI systems. Their research made clear that even though some smaller models provided good quality documentation, most open source models built on other, larger models remain opaque about their training data, simply because that data comes from closed-off models to begin with. RLHF training data also often remains unshared, and peer-reviewed papers are absent for nearly all models, with just some blog posts as reference material.
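Expressed as a data structure, the assessment boils down to a checklist across those three dimensions. The sketch below is my own paraphrase of the categories, with a made-up example model, purely to illustrate how such an openness audit can be made systematic; it is not the paper's exact scoring scheme.

```python
# Rough paraphrase of the openness checklist: three dimensions, each listing
# the features a model may or may not make available.
OPENNESS_CHECKLIST = {
    "availability": ["source code", "LLM training data", "RLHF training data", "model weights"],
    "documentation": ["license", "code documentation", "architecture/preprint", "model card", "datasheet"],
    "access": ["package", "API"],
}

def openness_score(provided: dict) -> float:
    """Fraction of checklist items a given model actually provides."""
    total = sum(len(items) for items in OPENNESS_CHECKLIST.values())
    satisfied = sum(
        len(set(provided.get(dim, [])) & set(items))
        for dim, items in OPENNESS_CHECKLIST.items()
    )
    return satisfied / total

# Hypothetical fine-tuned model: weights and code are shared, training data is not.
example_model = {
    "availability": ["source code", "model weights"],
    "documentation": ["license", "model card"],
    "access": ["package"],
}
print(f"openness: {openness_score(example_model):.0%}")  # -> openness: 45%
```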
As long as LLM development remains largely dependent on the progress of larger, proprietary models, true open source may thus be difficult to achieve. It follows that true openness should be interpreted as something broader than accessible source code alone, especially for AI software. Still, in an interview with De Volkskrant, one of the researchers stressed that using open source models, even with some shortcomings, would be the better option for those wanting to use LLMs more responsibly.
Licensing and responsible use: the question of ethics
As mentioned above, a good number of open source LLMs are not fit for just any use, as they are created under the licensing conditions of the parent model. This means many open source LLMs are distributed under a non-commercial license and may only be used for research and educational purposes, or personal use. In a June 2023 blogpost by the co-founder of GitLab, AI licensing is explained as consisting of five categories: proprietary, non-commercial NDA, non-commercial public, ethical, and open source. The blogpost argues that only licenses that meet the ten criteria for open source software as defined by the Open Source Initiative (see above) are truly open. In this sense, Meta's newly released Llama 2 model, distributed under a license that allows commercial use, might seem open, but is still subject to restrictions and does not fulfill all of the OSI's criteria for open software.
It is also interesting to pause on the distinct category of ethical licensing mentioned in the blog, which is described as follows: "The ethical license category applies to licenses that allow commercial use of the component but includes field of endeavor and/or behavioral use restrictions set by the licensor". Ethical licensing is thus open in one sense, but closed in another, as the restrictions are based on ethical considerations and call for responsible use of the technology. Ethical licenses are grouped under the term Responsible AI License (RAIL).
Earlier this year, a paper was published on the adoption of Responsible AI Licenses for open source LLMs on the Hugging Face platform. The researchers found the Open RAIL license to be the second most used type of licensing, though they were uncertain whether this share is primarily caused by smaller models being fine-tuned on models that are themselves accessible under a RAIL license. Thus far, BigScience, the developer behind the BLOOM LLM, and Stability AI seem to be the biggest players making their LLMs available under an Open RAIL license. On the RAIL website, several licensing documents can be found. Most of these seem to be focused on using the software in such a way that it does not violate legal rights and laws, and are formulated as such (see the End-User License, for example).
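For anyone curious which license a model on Hugging Face is actually published under, the license is exposed as repository metadata and can be checked programmatically. The snippet below is a small sketch using the huggingface_hub package; the repository id is just an example.

```python
# Check a model's license tag on the Hugging Face Hub before reusing it,
# e.g. to see whether it is distributed under a RAIL license.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("bigscience/bloom")  # example repository id

# Licenses show up among the repository tags as "license:<identifier>".
license_tags = [tag for tag in (info.tags or []) if tag.startswith("license:")]
print(license_tags)  # e.g. ['license:bigscience-bloom-rail-1.0']
```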
Another organization trying to foster ethical licensing, although not specifically focused on AI, is the Organisation for Ethical Source. It strives to center ethics in open technologies and supports the development of ethical licensing through its "ethical licensure incubator". Through this initiative, several licenses have been developed, among which are the Hippocratic License 3.0, the Do No Harm license and the At the Root license.
Initiatives like these provide hope that the road to more responsible LLMs, and more responsible software development in general, is opening up. There is one side note to address, however.
The RAIL licenses appear to view "responsible AI" from a mostly legal perspective, while the Organisation for Ethical Source focuses mostly on human rights. Even though ethical issues can also be legal or human rights issues, framing them purely within those bounds sells the breadth of ethics short. True ethics requires more than asking yourself "is this legal?" or "do we comply with international agreements?"; it also requires both developers and users to ask themselves "is this desirable?". Even if you comply with all (legal) requirements, there could still be reasons not to use LLMs for specific purposes, or even as a tool at all. Using and licensing LLMs in an ethical way would therefore require ongoing reflection on topics such as (conflicting) values and the interests of different stakeholders. Consequently, ethical licenses should strive to incorporate ethics in a broad way by encouraging this reflection.
All in all, even though for now the question surrounding "responsible" licensing and responsible AI seems to be whether these are as ethical and responsible as they could or should be, the existence of open source LLMs and of initiatives concerning ethical licensing is a promising and hopeful development. The future is ethical.
Disclaimer: this article was written during August 2023, following our July 2023 blogpost on the ethical aspects of Large Language Models and ChatGPT in particular. Due to the rapid progress in the field of AI, some information may have changed by the time of publication.