Appen's off-the-shelf (OTS) datasets are designed to make the job of training AI and ML models easier and faster.
The latest additions are:
• Scripted speech for Arabic (Egypt), Arabic (Saudi Arabia), Arabic (United Arab Emirates), Central Khmer (Cambodia), Croatian, Greek, Hungarian, Polish, Spanish (Spain), and Turkish
• Image OCR for Simplified Chinese printed text, Thai printed text, and Finnish printed text, including pre-recorded images of billboards, outer packaging, signs, magazines, and menus, for training and updating computer vision OCR models
• Human body movement (China), including annotated videos of people moving, tracked at pixel level, suitable for game development, fitness apps and more
• Baby crying audio (China), including pre-recorded and annotated baby sounds that can be used to train AI models to recognise different crying sounds and alert parents
Appen now offers more than 250 datasets, comprising more than 11,000 hours of audio, 25,000 images and 8.7 million words in 80 languages and multiple dialects.
"AI teams around the world working on projects with tight deadlines and flexible data requirements can benefit from using off-the-shelf datasets," said Appen CTO Wilson Pang.
"OTS datasets shorten time to value and provide access to high-quality data at a lower total cost than using traditional methods. We at Appen take the necessary steps to ensure that all our datasets are ethically sourced and demographically balanced, enabling companies to maintain responsible AI practices by minimising bias in their models and ensuring fair treatment of data annotators. You always know the precise quality of an OTS dataset, which helps build better AI that works in the real world."
Appen senior director of AI specialists Judith Bishop said, "We interact with AI from the moment we wake up to the moment we go to bed – through virtual assistants, chatbots, search engines, social networks, medical devices, smart cars and other applications.
"Language is often the primary interface for many of these compelling AI use cases, so to guarantee a great experience, the model needs to be trained to work for everyone. Appen's commitment to high-quality data and responsible, ethical AI development allows companies purchasing our off-the-shelf datasets to accelerate their AI projects with complete confidence in their data."
Image: Beth via Flickr (CC BY 2.0)