Google, Cambridge, DeepMind & Alan Turing Institute’s ‘Performer’ Transformer Slashes Compute …


It’s no coincidence that the Transformer neural network architecture is gaining popularity across so many machine learning research fields. Best known for natural language processing (NLP) tasks, Transformers not only enabled OpenAI’s 175 billion parameter language model GPT-3 to deliver SOTA performance, but the power- and potential-packed architecture also helped DeepMind’s AlphaStar bot defeat professional StarCraft players. Researchers have now introduced a way to make Transformers more compute-efficient, scalable and accessible.

While previous learning approaches such as RNNs suffered from vanishing gradient problems, Transformers’ game-changing self-attention mechanism eliminated such issues. As explained in the paper that introduced Transformers, Attention Is All You Need, the architecture is built on a trainable attention mechanism that identifies complex dependencies between the elements of an input sequence.
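For reference, the scaled dot-product attention defined in Attention Is All You Need computes, for query, key and value matrices Q, K and V with key dimension d_k:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V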

Transformers, however, scale quadratically with the number of tokens in an input sequence, making them prohibitively expensive for very long sequences. Even when fed moderate numbers of tokens, Transformers’ gluttonous appetite for computational resources can be difficult for many researchers to satisfy.
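To make that quadratic cost concrete, here is a minimal NumPy sketch of regular softmax attention for a single head (an illustration only, not the authors’ code). The (L, L) score matrix is what makes time and memory grow with the square of the sequence length L:

```python
import numpy as np

def naive_softmax_attention(q, k, v):
    """Regular softmax attention for one head. q, k, v have shape (L, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (L, L) -- the quadratic bottleneck
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (L, d)
```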

A team from Google, the University of Cambridge, DeepMind and the Alan Turing Institute has proposed a new type of Transformer dubbed Performer, based on a Fast Attention Via positive Orthogonal Random features (FAVOR+) backbone mechanism. The team designed Performer to be “capable of provably accurate and practical estimation of regular (softmax) full-rank attention, but of only linear space and time complexity and not relying on any priors such as sparsity or low-rankness.”


Softmax has been a bottleneck burdening the computation of attention-based Transformers. Transformers typically use a learned linear transformation and a softmax function to convert the decoder output into predicted next-token probabilities. The proposed FAVOR+ mechanism instead estimates the softmax and Gaussian kernels with positive orthogonal random features, yielding a robust and unbiased estimate of regular softmax attention. The research confirms that positive features make it possible to efficiently train softmax-based linear Transformers.
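The NumPy sketch below illustrates the positive random feature idea behind FAVOR+ under simplifying assumptions: a single head, i.i.d. Gaussian features rather than the orthogonal, periodically redrawn features used in the paper, and function names of our own choosing rather than the released implementation.

```python
import numpy as np

def positive_random_features(x, w):
    """phi(x) = exp(w·x - ||x||²/2) / sqrt(m), so that E[phi(x)·phi(y)] = exp(x·y)."""
    m = w.shape[0]
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)

def favor_attention(q, k, v, num_features=256, seed=0):
    """Linear-complexity approximation of softmax attention. q, k, v: (L, d)."""
    L, d = q.shape
    w = np.random.default_rng(seed).standard_normal((num_features, d))
    # Scale q and k so the feature map approximates softmax(q kᵀ / sqrt(d)).
    q_prime = positive_random_features(q / d ** 0.25, w)       # (L, m)
    k_prime = positive_random_features(k / d ** 0.25, w)       # (L, m)
    # Reorder the matrix products: no (L, L) attention matrix is ever formed.
    kv = k_prime.T @ v                                         # (m, d)
    numerator = q_prime @ kv                                    # (L, d)
    denominator = q_prime @ k_prime.sum(axis=0)                 # (L,)
    return numerator / denominator[:, None]
```

Because Kᵀ V is computed before multiplying by the query features, the cost is O(L·m·d) rather than O(L²), which is the source of the linear scaling the paper reports.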


Leveraging detailed mathematical theorems, the paper demonstrates that rather than relying solely on more computational resources to boost performance, it is also possible to develop improved and efficient Transformer architectures with significantly lower energy consumption. And because Performers use the same training hyperparameters as regular Transformers, the FAVOR+ mechanism can function as a simple drop-in replacement without much additional tuning.
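As a hypothetical illustration of that drop-in idea, reusing the illustrative attention functions sketched above (again, our own toy names, not the authors’ code), swapping attention implementations inside an otherwise unchanged block might look like this:

```python
class AttentionBlock:
    """Toy Transformer-style block that delegates to an injected attention function."""
    def __init__(self, attention_fn):
        self.attention_fn = attention_fn          # regular or FAVOR+ attention

    def __call__(self, q, k, v):
        return self.attention_fn(q, k, v)         # feed-forward and normalization omitted

baseline  = AttentionBlock(naive_softmax_attention)  # regular softmax attention
performer = AttentionBlock(favor_attention)          # same block, FAVOR+ attention
```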

The team tested Performers on a rich set of tasks ranging from pixel prediction to protein sequence modelling. In the experimental setup, a Performer simply replaced a regular Transformer’s attention component with the FAVOR+ mechanism. On the challenging task of training a 36-layer model on protein sequences, the Performer-based model (Performer-RELU) outperformed the Reformer and Linformer baselines, both of which showed significant drops in accuracy. On the standard ImageNet64 benchmark, a Performer with six layers matched the accuracy of a Reformer with 12 layers, and after optimizations it was also twice as fast as Reformer.

Because Performer-enabled scalable Transformer architectures can handle much longer sequences without constraints on the structure of the attention mechanism while remaining accurate and robust, the researchers believe they could lead to breakthroughs in bioinformatics, where technologies such as language modelling for proteins have already shown strong potential.

The paper Rethinking Attention With Performers is on arXiv.


Reporter: Fangyu Cai | Editor: Michael Sarazen



Synced Report | A Survey of China’s Artificial Intelligence Solutions in Response to the COVID-19 Pandemic — 87 Case Studies from 700+ AI Vendors

This report offers a look at how China has leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle. Along with this report, we also introduced a database covering an additional 1,428 artificial intelligence solutions from 12 pandemic scenarios.




We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

Published at Fri, 02 Oct 2020 20:03:45 +0000

Amesite CEO Dr. Ann Marie Sastry Scheduled to Appear on Fox Business Network’s Mornings With Maria

ANN ARBOR, Mich., Oct. 2, 2020 /PRNewswire/ — Amesite Inc. (Nasdaq: AMST), an artificial intelligence software company providing online learning ecosystems for business, higher education, and K-12, announced today its CEO, Dr. Ann Marie Sastry, is scheduled to appear on Fox Business Network’s Mornings With Maria Monday morning at 6:30 a.m. ET.  

Dr. Sastry will discuss the latest in remote learning and how it is impacting students, parents, teachers, and administrators. She will also explain how artificial intelligence is helping to power Amesite’s next generation online learning platform and why innovation in the EdTech space has become so crucial.

About Amesite Inc.

Amesite is a high tech artificial intelligence software company offering a cloud-based platform and content creation services for K-12, college, university and business education and upskilling. Amesite-offered courses and programs are branded to our customers.  Amesite uses artificial intelligence technologies to provide customized environments for learners, easy-to-manage interfaces for instructors, and greater accessibility for learners in the US education market and beyond.  The Company leverages existing institutional infrastructures, adding mass customization and cutting-edge technology to provide cost-effective, scalable and engaging experiences for learners anywhere.  For more information, visit https://amesite.com.

Forward Looking Statements

This communication contains forward-looking statements (including within the meaning of Section 21E of the Securities Exchange Act of 1934, as amended, and Section 27A of the Securities Act of 1933, as amended) concerning the Company, the Company’s planned online machine learning platform, the Company’s business plans, any future commercialization of the Company’s online learning solutions, potential customers, business objectives and other matters. Forward-looking statements generally include statements that are predictive in nature and depend upon or refer to future events or conditions, and include words such as “may,” “will,” “should,” “would,” “expect,” “plan,” “believe,” “intend,” “look forward,” and other similar expressions among others. Statements that are not historical facts are forward-looking statements. Forward-looking statements are based on current beliefs and assumptions that are subject to risks and uncertainties and are not guarantees of future performance. Actual results could differ materially from those contained in any forward-looking statement. Risks facing the Company and its planned platform are set forth in the Company’s filings with the SEC. Except as required by applicable law, the Company undertakes no obligation to revise or update any forward-looking statement, or to make any other forward-looking statements, whether as a result of new information, future events or otherwise.

Media Contact – Robert Busweiler – [email protected] – 631.379.6454

SOURCE Amesite Inc.

Related Links

https://amesite.com

Published at Fri, 02 Oct 2020 18:33:45 +0000