Alongside the new model, Microsoft announced that DeepSpeed, a library designed to make it significantly easier to train large models, is now open source.

Looking more closely at Turing NLG, it boasts 17 billion parameters, making it twice as large as Nvidia's Megatron, which was previously the largest Transformer-based language generation model. Compared to OpenAI's GPT-2, Microsoft's creation has 10 times as many parameters.

If you're unfamiliar with language generation models based on the Transformer architecture, they are built to predict language: specifically, which word will come next in a piece of text. Driven by AI, models like Turing NLG can answer questions directly with complete sentences, summarize a text, and write stories.

"T-NLG has advanced the state of the art in natural language generation, providing new opportunities for Microsoft and our customers. Beyond saving our users time by summarizing documents and emails, T-NLG can enhance experiences with the Microsoft Office suite by offering writing assistance to authors and answering questions that readers may ask about a document," Microsoft AI Research applied scientist Corby Rosset wrote in a blog post yesterday.
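To make next-word prediction concrete, here is a minimal sketch using GPT-2 through the Hugging Face transformers library. GPT-2 stands in only because it is publicly available; T-NLG itself has not been released, and the prompt is purely illustrative:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a small, publicly available Transformer language model.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The scores at the last position rank every candidate next token;
# the highest-scoring one is the model's prediction for the next word.
next_token_id = logits[0, -1].argmax()
print(tokenizer.decode(next_token_id.item()))  # e.g. " Paris"
```

Sampling from these scores repeatedly, one token at a time, is what lets such models write full answers, summaries, and stories rather than single words.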
DeepSpeed
With DeepSpeed, Microsoft is opening its deep learning optimization library to all developers. It is built for developers who want to achieve low-latency, high-throughput inference. The library includes the Zero Redundancy Optimizer (ZeRO), which enables training models with 100 billion parameters or more at scale; DeepSpeed itself was used to train Turing NLG. Both DeepSpeed and ZeRO are now available to developers.
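As a rough illustration of how this looks in practice, here is a minimal sketch that wraps a toy PyTorch model with DeepSpeed's `deepspeed.initialize` entry point and a ZeRO-enabled config. The model, batch size, and learning rate are placeholder values for illustration, not anything Microsoft has published:

```python
import torch
import deepspeed

# A toy model standing in for a large Transformer; any torch.nn.Module works.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# Minimal DeepSpeed config. "zero_optimization" turns on ZeRO, which
# partitions optimizer state across data-parallel workers instead of
# replicating it on every GPU. All values here are illustrative.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize returns an engine that handles distributed data
# parallelism, mixed precision, and ZeRO partitioning behind the
# familiar forward/backward/step calls.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

A script like this is typically run through DeepSpeed's own launcher (e.g. `deepspeed train.py`), which sets up the distributed environment across the available GPUs.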