Introduction

Welcome to the Linaro performance challenge. To participate in this challenge, please register. Registration allows you to select software packages, keep us updated on your progress, and participate in discussions on this website.

Who are ARM and Linaro, and how do you get started?

ARM is a corporation that designs processor architectures, which it then licenses out to be put into silicon, whether that silicon is a CPU chip, a GPU chip, or a System on a Chip (SoC).

Linaro is an association that is interested in having GNU/Linux work well on the ARM architecture.

To find out more about Linaro and its workings you can go to http://www.linaro.org, but to get started faster in working with Linaro’s community, you can go to http://www.linaro.org/engineering/developers. Here you will find probably one of the best “getting started with a free software project” papers that I (maddog) have ever read. It will not tell you the bits and bytes of dealing with code but, (almost) more importantly, it covers the ways of dealing with and contributing to projects.

Rules, time frames and judges

The rules for this competition are available on this site here. You can make entries at any time and the judges and winners will all be announced here.

Universities

Linaro encourages universities to use this opportunity to teach their students both code performance improvement and ARM assembly language. Students are welcome to submit their classwork into the contest as long as the required submission procedures are followed. Good work done in this contest should also look good in a student’s portfolio for a prospective employer, and we encourage that effort.

Linaro is also interested in cooperating with universities that would like to formulate a one- or two-semester course in software performance optimization. Universities interested in this topic should contact us through the “Contact Us” page.

Special Tasks and Prizes

In addition to the main goals of this project (porting code to ARM-64, optimizing code, and reducing the amount of assembly language in modules), there are “special case” prizes to be considered.

Some of Linaro’s partner companies have special boards that may incorporate FPGAs, digital signal processing chips and even massively parallel core CPUs. Optimizing suitable GNU/Linux modules to better utilize these special chips would receive special prizes.

Candidates who wish to submit this type of work should write a BRIEF proposal on the “Contact Us” page, which will then be forwarded to the appropriate member company to gauge its interest.

Candidates who would like to work on a compiler intrinsic should enter a topic in the “Intrinsics” portion of this site to submit a proposal and discuss it with other compiler writers.

Artists who wish to contribute graphics to the contest under a Creative Commons No Derivatives license may do so, and will receive recognition for any work accepted and posted to the site.

Submission Criteria

In order to submit an entry, the candidate should measure the performance of the module before porting and optimizing it, on at least Intel-32, Intel-64, ARM-32 and ARM-64 processors or emulators. In the case of ARM-64 a “pre-optimization” measurement is not required. The performance measured should consist of memory utilization, wall clock time, percentage of CPU used, and other criteria that will be determined over time and listed. The candidate should also list the compiler options used before optimization is attempted.
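One possible way to collect these measurements on a Unix-like system is sketched below in Python. This is an illustration under stated assumptions, not an official contest tool: the function name measure and the keys of the returned dictionary are hypothetical, the command being timed is a placeholder, and os.wait4 is only available on Unix. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once and report wall-clock time, CPU time, CPU percentage
    and peak resident memory. Hypothetical helper; Unix only."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    # os.wait4 waits for this child and returns its own resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    # Record the exit code so the Popen object knows the child was reaped.
    proc.returncode = os.waitstatus_to_exitcode(status)  # Python 3.9+
    cpu = usage.ru_utime + usage.ru_stime  # user + system CPU seconds
    return {
        "exit_code": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        # Can exceed 100 when the program uses several cores.
        "cpu_percent": 100.0 * cpu / wall if wall > 0 else 0.0,
        # Kilobytes on Linux, bytes on macOS.
        "peak_rss": usage.ru_maxrss,
    }


if __name__ == "__main__":
    # Placeholder workload; replace with the module's own test command.
    result = measure([sys.executable, "-c", "sum(range(1_000_000))"])
    for key, value in result.items():
        print(f"{key}: {value}")
```

A single run is noisy, so in practice it is reasonable to repeat the measurement several times with the same compiler options and data test-sets and report, for example, the median.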

After the optimization is performed, the candidate should perform the same measurements and compute the percentage improvements in speed, efficiency and memory utilization. For non-embedded use (i.e. modules that tend to run on desktop or server systems) reasonable tradeoffs in memory versus speed and efficiency will be encouraged. Performance on single-core and multi-core processors should also be measured, to show improvements due to increases in parallelism.
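The percentage calculations can be sketched as follows. This is a minimal illustration, assuming every metric is “lower is better” (seconds, bytes of memory, CPU seconds); the function names are hypothetical and the numbers in the usage section are made up for demonstration only.

```python
def percent_improvement(before, after):
    """Percentage reduction of a lower-is-better metric.
    A negative result means the metric got worse."""
    if before <= 0:
        raise ValueError("the baseline measurement must be positive")
    return 100.0 * (before - after) / before


def parallel_speedup(single_core_seconds, multi_core_seconds):
    """How many times faster the multi-core run is than the single-core run."""
    return single_core_seconds / multi_core_seconds


def parallel_efficiency(single_core_seconds, multi_core_seconds, cores):
    """Speedup divided by core count; 1.0 would be perfect scaling."""
    return parallel_speedup(single_core_seconds, multi_core_seconds) / cores


if __name__ == "__main__":
    # Made-up example values, not real measurements.
    print("wall-clock improvement (%):", percent_improvement(12.0, 9.0))
    print("memory improvement (%):", percent_improvement(2048, 2560))
    print("speedup on 4 cores:", parallel_speedup(12.0, 4.0))
    print("efficiency on 4 cores:", parallel_efficiency(12.0, 4.0, 4))
```

Reporting a negative memory figure next to a positive speed figure makes any memory-versus-speed tradeoff explicit for the judges and the maintainer.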

The candidate should then write up a short description of the work they have done, including any code or complexity reduction (for example, removing assembly language completely and allowing the compiler to generate the code). The candidate should then submit their source code to the maintainer for approval, along with any data test-sets that the candidate has used. Once the maintainer approves the code submission, the candidate should submit the code, the write-up, the data test-sets and the results of the tests to the contest site.

All written work should be submitted under a Creative Commons “Share-alike” license, with the code being submitted under the same license that the code had before.

Steps for Entering the Contest

In order to enter the contest, the potential candidate should register on the contest site at http://performance.linaro.org.

The page will request your full, legal name, email address, postal address and size of your shirt using men’s USA sizes: Small (S), Medium (M), Large (L), Extra-Large (XL), Extra-Extra Large (XXL) and Extra-Extra-Extra Large (XXXL). For larger sizes please contact the contest administrator. This information will be kept private and only used in conjunction with this contest.

The site will also require verification of your email address, so please click on the verification link sent to your supplied email address.

Once registered, the candidate should look for a particular module that they would like to optimize from the list of all modules available at this site. Each candidate will be able to work on one module at a time, but when they finish and submit the module and its associated documentation the candidate will be able to select another module. If the candidate decides to stop working on the module, they should “de-list” the module from their selection.

Experience has shown that some of the modules listed have higher-level fall-back code, and will build and run on an ARM-64 processor with no “porting” work having to be done.  This is fine, and we encourage contestants to compile the code, test it on either the ARM-64 model or on ARM-64 hardware and then mark it as “ported” in the on-line database.

Other modules may contain a lot of code dependent on other modules that have not been ported yet. In this case the module should be “de-listed” until those modules are ported, and work should be directed to porting those modules first.

Finally, there are some modules where the contestant might question the relevance of the code on a 64-bit ARM system. We have found that, other than in a very few instances, even the most “unlikely” code may end up on a 64-bit processor, and therefore should be ported along with the “rest” of the code. Linaro and ARM will do our best to set priorities for the code to be ported.

It is suggested that the candidate then contact the maintainer and discuss the module with them, in order to find out the criteria (coding style, algorithm usage, computer language issues) that the maintainer expects for the module. Following the techniques and practices of the code owner/maintainer will go a long way toward having your code accepted by the maintainers when finished.

Once the module has been ported to ARM-64 and optimized, the candidate should submit all the source code and supporting documentation to the contest following the check-list for module submission (below) and mark that the module has been optimized at the same URL.

At that point the candidate can select another module for optimization.

Every candidate who submits even one completed, acceptable module for the contest will receive an “entry prize” of a very nice Linaro golf shirt and will be entered into a drawing for an all-expenses-paid trip to a Linaro Connect symposium. The more qualifying submissions a candidate enters, the greater their chance of winning a trip to Connect to present the results of their work to the Linaro Connect attendees.

Information and Tools on Performance

Linaro understands that many people do not know much about assembly language programming or the compiler switches of various compilers, and that they may not have ready access to the tool chains (IDEs, compilers, debuggers, code analyzers) or to some of the architectures that are supported by GNU/Linux. Linaro has therefore started a site that has what it considers “best in class” information on these topics. Linaro hopes that over time this information will be condensed into a course in optimization taught by universities, with the base knowledge covered under a Creative Commons license.

Linaro is actively searching for one or more universities that would like to cooperate in this task, and would like to receive emails indicating serious interest at this email address: [email address needed here]

In the meantime there will be wikis at http://performance.linaro.org, currently maintained by Linaro, which accumulate knowledge about optimization tools and techniques, including descriptions of ARM architectures and emulators of various architectures.

This site will also develop examples of one or two modules that have been ported and optimized by Linaro engineers showing the steps and techniques for measuring performance before doing the work, the steps and techniques for measuring performance after the work is done, and calculating the performance improvement as well as a sample submission of documentation on what was done.

Currently few actual ARM-64 development boards exist; therefore it is suggested that development use ARM-64 emulators, which (along with instructions on how to use them) can be found here:

http://www.linaro.org/engineering/engineering-projects/armv8

Over time there will be actual 64-bit systems available over the network for testing your porting and optimization work on an actual ARM 64-bit processor.

The site will also be generating PDFs of performance contest giveaways, released under a Creative Commons No Derivatives license. These PDFs may be downloaded and used to make T-shirts, coffee mugs, beer mugs, stickers for laptops and tablets, and wall posters.

Purpose of this site

In anticipation of ARM’s new 64-bit architecture Linaro reviewed some of the source code of a typical GNU/Linux system and found over 1400 source code modules which had ARM assembly language, and which would need to be ported and tested to work on ARM’s new 64-bit processors.

Linaro also recognized that some of the modules were written a long time ago (by computer standards), when CPUs were single-core rather than multi-core, compilers optimized less well, and RAM was smaller and more expensive, leading to tradeoffs in portability and algorithm selection. In today’s era, it might be better to re-evaluate the use of assembly language and replace it completely with a higher-level language such as “C”. It might also be worthwhile to review algorithms that made sense in an earlier time, but have outlived their usefulness.

In some cases the assembly language that exists in the code was “transposed” from existing assembly language of a different architecture, and did not necessarily utilize the best features of each assembly or machine language architecture. In other cases it might make more sense to create a compiler intrinsic to do certain functions such as identifying the architecture of the machine.

Finally, while the code in the modules may be very efficient and highly portable, the compiler invocations may need review to take advantage of new optimization switches.

Linaro and its associated member companies also see an alarming trend in education of new computer engineering students away from knowledge of the computer architecture and more toward higher-level languages in programming. While Linaro’s members do not necessarily encourage programming in assembly language, they do recognize the value of understanding exactly how the machine and compilers work and organizing a program’s code and data for maximum performance. Therefore Linaro is promoting coursework and white-papers in performance topics for students of computer science and engineering.

In pursuit of this performance goal, Linaro decided to create a long-running performance contest directed at these 1400 modules which focuses on the following performance criteria:

Utilization of Memory

Some code is used in embedded systems, and therefore is sensitive to the amount of memory used. Any code that is “memory sensitive” should not take up significantly more memory in exchange for a performance improvement.

Wall-clock Speed Performance Improvement

This is normally what people think of as a performance improvement: the length of time it takes a program to execute. It applies across several categories, such as embedded, desktop, server and High Performance Computing.

Efficiency of Execution

In today’s world of battery-operated handsets, the efficiency of how many machine cycles are executed, independent of the wall clock time, is also important. If the program can complete its task faster, then the system can go to its idle loop faster and lower the power usage of the CPU, independent of the program taking less wall-clock time to perform its function.

Likewise modern server systems have the ability to turn off memory, disks and even cores if they are not running an application, so if the application finishes faster, the system goes into power saving mode sooner, and can save a considerable amount of electrical power (and cooling) over less efficient code.

Architecture Neutrality

Linaro recognizes that there are architectures in the marketplace other than ARM. Changes to a program should not make it run slower or less efficiently on these other architectures.
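A simple check for this criterion is sketched below. It is an illustration only: the function name find_regressions is hypothetical, the 1% tolerance for measurement noise is an assumption rather than a contest rule, and the timings in the usage section are made up.

```python
def find_regressions(before, after, tolerance_percent=1.0):
    """Return {architecture: percent slowdown} for every architecture whose
    lower-is-better measurement got worse by more than tolerance_percent."""
    worse = {}
    for arch, old in before.items():
        new = after[arch]
        change = 100.0 * (new - old) / old
        if change > tolerance_percent:
            worse[arch] = change
    return worse


if __name__ == "__main__":
    # Made-up wall-clock seconds per architecture, for demonstration only.
    before = {"Intel-32": 14.0, "Intel-64": 10.0, "ARM-32": 20.0}
    after = {"Intel-32": 14.0, "Intel-64": 10.5, "ARM-32": 18.0}
    for arch, slowdown in find_regressions(before, after).items():
        print(f"{arch} got slower by {slowdown:.1f}%")
```

The same function can be applied to memory or CPU-time measurements, since all of them are lower-is-better.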

Code Acceptance by Code Author or Maintainer

Linaro recognizes that various people wrote and/or maintain the code that is in this list of 1400 modules. Linaro requires that contestants work with the authors and maintainers, make them aware of the changes, and make the contestant’s code acceptable to the maintainer(s), matching the coding style of the maintainer(s).  Linaro suggests that this work could be done through the normal code development tools and cycles of the project, and Linaro encourages the contestant to learn these methods.

Algorithm Replacement

Given the aforementioned changes to modern-day computers (larger RAM, multi-core, etc.), as well as the discovery of new algorithms over time, it is possible that a larger re-write of code to get higher performance through algorithm replacement would be appropriate. This is particularly true when porting to the ARM-64 architecture. In this case the up-stream maintainer would definitely need to be consulted, and it is suggested that the contestant get their approval before making the changes.

These criteria (and others added over time) will be used as the central criteria for judging the contest.