Revolutionary innovation: Dolly 2.0 from Databricks brings cost-effective language models to enterprises and startups

Databricks presents Dolly 2.0 – An Open Source Language Model for Enterprises and Startups

Data privacy and compliance are critical issues for businesses today, especially for startups and small to mid-sized companies. As a Data-Privacy-as-a-Service startup, heyData offers an all-in-one platform that helps companies manage their data privacy and compliance requirements efficiently, and we are constantly looking for innovative solutions that support startups, enterprises, and founders in that work.

That is why Dolly 2.0, Databricks' latest language model, is significant. Databricks describes it as the first open-source, instruction-following large language model (LLM) fine-tuned on a human-generated instruction dataset that is licensed for both research and commercial use.

Background

In the world of AI models, Databricks has made a remarkable advance with Dolly 2.0. Its predecessor, Dolly 1.0, was a ChatGPT-like language model that Databricks reported training for less than $30. Dolly 2.0 is based on the EleutherAI Pythia model family and has been fine-tuned on a human-generated instruction dataset that was crowdsourced among Databricks employees and is licensed for research and commercial use. The model is now open source, giving enterprises and startups a cost-effective way to build and customize powerful language models for their conversational applications.

Open source: Free use for companies and startups

Dolly 2.0 is open source, which means organizations can create, customize, and own their language models without paying for API access or sharing data with third parties. This is a significant development for enterprises and startups looking for cost-effective solutions for conversational applications. With Dolly 2.0, they are free to customize and extend the model to meet their specific needs.

Unique instruction dataset for fine-tuning

A distinguishing feature of Dolly 2.0 is its instruction dataset, which was crowdsourced among Databricks employees. The Databricks Dolly 15k dataset, with 15,000 prompt/response pairs, was developed specifically for instruction tuning of large language models and is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This means that anyone can use, modify, or extend it, including for commercial applications. According to Databricks, it is the first open-source, human-generated instruction dataset designed to give large language models the interactivity of ChatGPT. It contains natural, expressive training records representing various behaviors such as brainstorming, content creation, information retrieval, and summarization.
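To make the idea of a prompt/response pair more concrete, here is a minimal Python sketch of how one such record might be turned into a single training string. The field names (instruction, context, response, category) follow the published dataset schema as we understand it. The example record is invented for illustration, and the template wording is an assumption rather than the exact prompt Databricks used.

```python
from typing import Dict

# Assumption: the template wording is illustrative and may differ from the
# prompt Databricks actually used when fine-tuning Dolly 2.0.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_record(record: Dict[str, str]) -> str:
    """Turn one prompt/response record into a single training string."""
    parts = [INTRO, f"### Instruction:\n{record['instruction']}"]
    # The context field is optional; only add it when it has content.
    context = record.get("context", "").strip()
    if context:
        parts.append(f"Input:\n{context}")
    parts.append(f"### Response:\n{record['response']}")
    return "\n\n".join(parts)


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "instruction": "Summarize the text in one sentence.",
    "context": "Dolly 2.0 is an open-source language model released by Databricks.",
    "response": "Databricks released Dolly 2.0 as an open-source language model.",
    "category": "summarization",
}

if __name__ == "__main__":
    print(format_record(example))
```

Records without supporting context, such as a brainstorming prompt, simply skip the "Input:" section, so the same function covers both kinds of task.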

Motivation

The new Dolly 2.0 dataset was motivated by requests from users who wanted to know whether they could use Dolly commercially. The original Dolly 1.0 model was trained on a dataset that the Stanford Alpaca team had generated using the OpenAI API, and the API's terms of use imposed restrictions on commercial use. Databricks therefore decided to create a new dataset that was not "contaminated" and could be used for commercial purposes. To do this, they took inspiration from OpenAI's InstructGPT research and engaged Databricks staff in a competition to generate an original, high-quality dataset covering tasks such as open and closed Q&A, information extraction and summarization from Wikipedia, classification, and creative writing. This underscores Databricks' commitment to the needs of its users and gives companies access to powerful language models for their commercial applications.

Inspiration from InstructGPT

The development of Dolly 2.0 was inspired by OpenAI's research paper on InstructGPT, a language model trained specifically to follow instructions and perform complex tasks. Dolly 2.0 is based on EleutherAI's Pythia family of models and has been fine-tuned on the Databricks Dolly 15k dataset to develop similar instruction-following capabilities. This enables Dolly 2.0 to handle a variety of tasks such as brainstorming, content creation, information retrieval, and summarization, and to provide practical support for users in different application domains.

Advantages for startups and enterprises

Releasing Dolly 2.0 as open source offers numerous benefits for startups and enterprises. Here are some of the most important:

  1. Cost efficiency: Because Dolly 2.0 is open source, startups and enterprises can use the software for free without having to pay expensive licensing fees or subscriptions. This allows them to use resources for other important aspects of their business model.
  2. Flexibility: Being open source, Dolly 2.0 lets users customize and adapt the software to their own needs. Startups and enterprises can tailor Dolly 2.0's functions and features to develop solutions that fit their specific requirements.
  3. Community engagement: The open-source community is known for its collaboration and sharing of knowledge and resources. Because Dolly 2.0 is released as open source, startups and enterprises can benefit from working with the developer community to fix bugs, implement new features, and further improve the software.
  4. Faster innovation: Open source allows startups and enterprises to build on an existing code base to develop innovative solutions faster. By using Dolly 2.0, they can benefit from the work of other developers and build their own innovations on an existing platform.
  5. Interoperability: As open source, Dolly 2.0 can be integrated with various technologies and systems, allowing startups and enterprises to interact with other products and services and extend their functionality.
  6. Transparency and trust: Since Dolly 2.0's source code is open, startups and enterprises can review the code and check that it is secure and trustworthy. This can help build customer and user trust in the software.
  7. Resource sharing: By using open source, startups and enterprises can share resources and exchange ideas with other developers and organizations. This can lead to more efficient use of resources and create synergies for working together on new solutions.
  8. Adaptability: The open-source nature of Dolly 2.0 allows startups and enterprises to adapt the software to new technologies, market requirements, or business models. This allows them to respond quickly and continuously improve their solutions to remain competitive.

In summary, Databricks' Dolly 2.0 gives enterprises an open-source, commercially usable foundation for building instruction-following language models. It enables organizations to create adaptable language models and integrate them into their existing workflows and data processing pipelines. With Dolly 2.0, companies can harness language AI technology to improve their use cases, optimize communication with their target audience, and streamline their business processes.

