Building, collecting, and serving assets

MDN requires static assets (images, JavaScript, CSS, and other files) to augment the HTML generated by Django. Kuma uses a combination of technologies and techniques to build and deliver static assets. Many of the build and development processes are documented in the Installation and Development documents. This document goes into the details.

The three phases of an asset’s life are Building, Collecting, and Serving.

  • Building - Source files are processed to build intermediate and final assets
    • Locale files contain translatable strings extracted from source files, are translated by humans, and are compiled to a binary representation for runtime use.
    • Localization JavaScript files contain translated strings in an executable format to allow in-browser translation of UI strings.
    • CKEditor is packaged with custom plugins for MDN’s use cases.
    • Many JavaScript libraries that are used on MDN are included in Kuma’s repository for version control.
    • JavaScript bundles, assembled by django-pipeline, combine several source files into a single minified JavaScript file.
    • CSS bundles, assembled by django-pipeline or gulp, combine several source Sass files into a single minified CSS file.
  • Collecting - Built assets are collected to the /static folder.
    • Django’s staticfiles provides a framework of Finder and Storage classes for collecting files.
    • django-pipeline augments staticfiles to allow compiling, bundling, and minifying JS and CSS assets in the collection phase.
    • gulp provides alternative tools to staticfiles for local development of assets.
  • Serving - Collected assets are served to visitors in development and production
    • In Django and Jinja templates, the static tag returns the URL of assets collected by staticfiles.
    • The statici18n tag returns the URL of localization JavaScript collected by staticfiles.
    • The pipeline tags javascript and stylesheet return the HTML markup for assets processed and collected by django-pipeline.
    • WhiteNoise serves static assets as part of the Django process.

Extracting and building locale files

Kuma uses Pontoon to translate strings in the user interface, in error messages, and in emails. These are stored in the mdn-l10n repository, and included as a git submodule at locale/. See the localization document for more details about locales.

Puente extracts strings to the Portable Object Template (.pot) files, as specified in the PUENTE configuration. The file locale/templates/LC_MESSAGES/django.pot contains strings from template files and Python code. The file javascript.pot contains strings from JavaScript files. Puente looks for the string parameters of gettext functions, such as gettext(), the common alias _(), and ngettext(). It also parses longer strings in the template tag trans.
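Conceptually, extraction scans source code for calls to these functions and records their literal string arguments. The sketch below does this for Python code only, using the standard ast module. It is a toy stand-in for Puente and Babel, which also handle templates, translator comments, and message contexts, and extract_messages is a hypothetical name.

```python
import ast

# Call names whose literal string arguments count as translatable
# (an assumed set, mirroring the functions named above).
GETTEXT_NAMES = {"gettext", "_", "ngettext"}


def extract_messages(source):
    """Return sorted (line, function, strings) tuples for gettext-style calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both _("x") and translation.gettext("x").
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name not in GETTEXT_NAMES:
            continue
        # Only literal strings can be extracted; variables are skipped.
        strings = tuple(
            arg.value
            for arg in node.args
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
        )
        if strings:
            found.append((node.lineno, name, strings))
    return sorted(found)
```

This is also why translatable text must be passed as a literal: a call like gettext(name) gives the extractor nothing to record.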

Next, the changes are merged into the existing Portable Object (.po) files, such as locale/fr/LC_MESSAGES/django.po, to add new strings and comment out removed strings.

Extracting and merging is done with make localeextract, usually during deployment, when UI strings change. This uses the extract management command provided by Puente, which uses Babel to extract strings and update the catalog. A maintainer pushes the updated catalogs as a new commit to the mdn-l10n repository.

Pontoon detects that the repository has changed, and notifies localization teams that there are new strings. In about 48 hours, the most active teams will translate strings into the top 10 MDN languages. These are applied by updating the locale submodule during the deployment process.

At run time, Machine Object (.mo) files, such as locale/fr/LC_MESSAGES/django.mo, are used by gettext functions, like gettext() and _(), to display the localized strings. These are built with make localecompile when creating the production images or when a developer wants to see updated translations.
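The runtime half of this can be demonstrated with Python's standard gettext module. The sketch below builds a tiny catalog in the binary .mo layout, modeled on the msgfmt.py tool distributed with CPython, so that it can be loaded without any files on disk. compile_mo is a hypothetical helper, not what make localecompile runs, and it omits plural forms and the optional hash table.

```python
import gettext
import io
import struct


def compile_mo(catalog):
    """Serialize a {msgid: msgstr} dict into little-endian GNU .mo bytes."""
    # The empty msgid carries the catalog header; declaring UTF-8 lets
    # gettext decode non-ASCII translations.
    messages = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    messages.update(catalog)
    keys = sorted(messages)
    ids = strs = b""
    entries = []  # (id offset, id length, str offset, str length)
    for key in keys:
        msgid = key.encode("utf-8")
        msgstr = messages[key].encode("utf-8")
        entries.append((len(ids), len(msgid), len(strs), len(msgstr)))
        ids += msgid + b"\0"
        strs += msgstr + b"\0"
    header_size = 7 * 4
    ids_start = header_size + 16 * len(keys)  # after both index tables
    strs_start = ids_start + len(ids)
    id_table = b"".join(
        struct.pack("<II", id_len, ids_start + id_off)
        for id_off, id_len, _str_off, _str_len in entries
    )
    str_table = b"".join(
        struct.pack("<II", str_len, strs_start + str_off)
        for _id_off, _id_len, str_off, str_len in entries
    )
    header = struct.pack(
        "<7I",
        0x950412DE,                   # magic number
        0,                            # file format revision
        len(keys),                    # number of strings
        header_size,                  # offset of the original-string table
        header_size + 8 * len(keys),  # offset of the translation table
        0,                            # hash table size (none)
        0,                            # hash table offset
    )
    return header + id_table + str_table + ids + strs
```

Passing the bytes to gettext.GNUTranslations(io.BytesIO(...)) yields an object whose gettext() returns the translated string, or the original string when the catalog has no entry, mirroring how untranslated UI strings fall back to English.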

Building localization JavaScript

Django includes a JavaScriptCatalog view that provides JavaScript implementations of gettext functions, as well as translations for each locale. It is inefficient to use this view directly, since its output is generated on every request. For efficiency, django-statici18n generates a file for each locale from the JavaScriptCatalog output, so the files can be served as static assets.

The translation catalog files are created with make compilejsi18n from the locale Machine Object .mo files. This calls the compilejsi18n management command provided by django-statici18n. Kuma sets STATICI18N_ROOT to build/locale, and the output files have names like build/locale/jsi18n/de/javascript.js.
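The shape of this step, one generated file per locale under STATICI18N_ROOT, can be pictured with the sketch below. The function names are hypothetical, and render_catalog_js produces a deliberately tiny script rather than the JavaScriptCatalog output that django-statici18n actually writes, which also handles plurals and formats.

```python
import json
from pathlib import Path

STATICI18N_ROOT = Path("build/locale")  # Kuma's value, per the text above


def jsi18n_output_path(locale, root=STATICI18N_ROOT, domain="javascript"):
    """Path of a locale's catalog, e.g. build/locale/jsi18n/de/javascript.js."""
    return root / "jsi18n" / locale / f"{domain}.js"


def render_catalog_js(catalog):
    """A much-simplified catalog script: a lookup table plus gettext()."""
    return (
        "var catalog = " + json.dumps(catalog, ensure_ascii=False, sort_keys=True) + ";\n"
        "function gettext(msgid) { return catalog[msgid] || msgid; }\n"
    )


def write_catalogs(catalogs, root=STATICI18N_ROOT):
    """Write one static JS file per locale and return the paths written."""
    written = []
    for locale, catalog in sorted(catalogs.items()):
        path = jsi18n_output_path(locale, root)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render_catalog_js(catalog), encoding="utf-8")
        written.append(path)
    return written
```

Because the files are plain static assets, they are picked up by the FileSystemFinder like any other file in build/locale.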

Building CKEditor

CKEditor is a complex JavaScript application that provides a WYSIWYG editor for MDN wiki pages. It is packaged with plugins, some from third parties, and some custom to MDN.

The CKEditor build process is documented on the CKEditor document. The built files are checked into the Kuma repository.

Including JS libraries

Third-party JavaScript libraries are included in the Kuma repository, to avoid ambiguity about what versions of libraries are used. Some libraries were added manually, and others with Bower. See Front-end asset dependencies for more details about these libraries.

Some of these libraries are served directly to visitors, while others are included in pipeline JavaScript bundles.

Building pipeline JavaScript bundles

Pipeline JavaScript bundles combine several JavaScript files into a single file, with optional minification. For example, the file static/build/js/main.js is the combination of 10 JavaScript files.

The JS bundles are specified in PIPELINE_JS in the Django settings. The bundles are served differently in “development” and “production” modes. This is roughly controlled by the Django setting DEBUG, which sets further parameters like PIPELINE['PIPELINE_ENABLED'], and the environment variable DJANGO_SETTINGS_MODULE, which switches the Django settings file. See django-pipeline as well as the pipeline tags section for details.

In development, the source files (10 for main.js) are served, so there are 10 <script> elements in the HTML when {% javascript 'main' %} is used in a template. In production, the output bundle is used, so a single <script> element appears in the HTML. The bundle is also processed with UglifyJS, which removes whitespace, replaces variable names with shorter names, and performs other transformations to make the file smaller.
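The difference between the two modes can be modeled in a few lines. This is not django-pipeline's implementation: script_tags and the package contents are hypothetical, although the source_filenames and output_filename keys follow django-pipeline's package format. It also ignores the hashed file names used in production.

```python
from html import escape

# A package in django-pipeline's shape; the file names are hypothetical.
MAIN_PACKAGE = {
    "source_filenames": ("js/libs/jquery.js", "js/components.js", "js/main.js"),
    "output_filename": "build/js/main.js",
}


def script_tags(package, pipeline_enabled, static_url="/static/"):
    """Model of the markup a javascript tag emits in each mode."""
    if pipeline_enabled:
        # Production: one element for the concatenated, minified bundle.
        paths = [package["output_filename"]]
    else:
        # Development: one element per source file, served unbundled.
        paths = list(package["source_filenames"])
    return "\n".join(
        '<script src="{}"></script>'.format(escape(static_url + path))
        for path in paths
    )
```

With pipeline_enabled=False this emits one element per source file; with True it emits a single element for build/js/main.js.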

Building pipeline CSS bundles

Pipeline CSS bundles are conceptually similar to pipeline JS bundles. Some contain multiple source files, such as static/build/styles/dashboards.css, which combines two source files.

Source styles are written in Sass, and compiled to CSS with node-sass. These must be compiled to CSS in both development and production modes. Backend developers tend to use make build-static to build and collect these files, and front-end developers tend to use gulp watch to directly compile them. See Front-end development for more information.

The CSS bundles are specified in PIPELINE_CSS in the Django settings. The bundles are served differently in “development” and “production” modes. This is roughly controlled by the Django setting DEBUG, which sets further parameters like PIPELINE['PIPELINE_ENABLED'], and the environment variable DJANGO_SETTINGS_MODULE, which switches the Django settings file. See django-pipeline as well as the pipeline tags section for details.

In development, the source files (2 for dashboards.css) are used, so there are 2 <link> elements in the HTML when {% stylesheet 'dashboards' %} is used in a template. In production, the output bundle is used, so a single <link> element appears in the HTML. When bundled, CSS is also processed by clean-css, which transforms the CSS to make the output files smaller.

Collecting asset files with staticfiles

Django provides the django.contrib.staticfiles app, widely used in Django projects to standardize where assets are stored, to collect them for development and production, and to use different asset URLs in different environments.

In development mode, the staticfiles app helps identify assets spread across the project, and often allows a rapid development cycle (for example, change a file, refresh the browser, and see the effects of the changed file). For production, the staticfiles app provides the management command collectstatic, which gathers files to the /static folder for efficient file serving.

The Django documents for staticfiles are mostly focused on usage. Additional details are needed to understand how django-pipeline customizes staticfiles.

Configuration

The staticfiles app is configured by Django settings:

STATIC_ROOT
The folder on the file system where assets are collected. For MDN, this is the static folder in the kuma directory.
STATIC_URL
The base URL for static assets. In development, this is http://localhost:8000/static/, and in production it is https://developer.mozilla.org/static/.
STATICFILES_FINDERS

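Dotted paths to classes implementing staticfiles Finder. These determine what files will be collected and served. Kuma uses four finders: FileSystemFinder and AppDirectoriesFinder from staticfiles, and CachedFileFinder and PipelineFinder from django-pipeline. Each is described below.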

STATICFILES_DIRS

A list of folders in the kuma directory that the FileSystemFinder will scan for static assets. For MDN, this includes:

  • assets/static
  • assets/ckeditor4/build (to /static/js/libs/ckeditor4/build)
  • kuma/static
  • kuma/javascript/dist
  • build/locale
  • jinja2/includes/icons

For example, the localization JavaScript build/locale/jsi18n/fr/javascript.js will be collected to static/jsi18n/fr/javascript.js.
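Expressed as settings, the list above might look like the fragment below. The /app root is an assumption taken from the findstatic output later in this document, the ckeditor4 entry uses Django's (prefix, path) form to reproduce the "to /static/js/libs/ckeditor4/build" mapping, and collected_path is a hypothetical helper that reproduces the example just given.

```python
import os

ROOT = "/app"  # assumed project root

# Plain entries keep their relative layout; a (prefix, directory) pair
# places the directory's files under that prefix in STATIC_ROOT.
STATICFILES_DIRS = [
    os.path.join(ROOT, "assets", "static"),
    ("js/libs/ckeditor4/build", os.path.join(ROOT, "assets", "ckeditor4", "build")),
    os.path.join(ROOT, "kuma", "static"),
    os.path.join(ROOT, "kuma", "javascript", "dist"),
    os.path.join(ROOT, "build", "locale"),
    os.path.join(ROOT, "jinja2", "includes", "icons"),
]


def collected_path(source, static_root=os.path.join(ROOT, "static")):
    """Where collectstatic would copy `source`, given STATICFILES_DIRS."""
    for entry in STATICFILES_DIRS:
        prefix, directory = entry if isinstance(entry, tuple) else ("", entry)
        if source.startswith(directory + os.sep):
            relative = os.path.relpath(source, directory)
            return os.path.join(static_root, prefix, relative)
    raise ValueError("not under any STATICFILES_DIRS entry: " + source)
```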

STATICFILES_STORAGE

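The dotted path to a class implementing staticfiles Storage. Storage determines where files are stored and what URLs they have, and provides hooks for modifying files when copying them. Kuma uses three different storages, depending on the context: pipeline.storage.PipelineStorage in testing and the Makefile, pipeline.storage.NonPackagingPipelineStorage in development, and kuma.core.pipeline.storage.ManifestPipelineStorage in production. These are described in the django-pipeline section.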

Finder classes

The staticfiles app uses Finders to locate asset files. Django considers this a private API, so it may change in the future. There are two methods the BaseFinder class expects to be implemented:

  • find(path): Given a short path like css/wiki.css, return the absolute path to the file. This is used by the findstatic management command, and to find files when serving assets in development mode.
  • list(ignore_patterns): Return a list of the files this Finder can find, along with a storage instance for each. The collectstatic management command uses this to gather files.
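A toy finder with those two methods, working over a list of directories, is sketched below. DirectoryFinder is hypothetical: it does not subclass Django's BaseFinder, and list() yields (path, directory) pairs where Django yields (path, storage) pairs. The order of the directories decides which copy of a duplicated file wins, the same first-match rule that findstatic reports.

```python
import fnmatch
import os


class DirectoryFinder:
    """Toy finder with the two methods BaseFinder expects."""

    def __init__(self, directories):
        self.directories = list(directories)

    def find(self, path, all=False):
        """Return the first absolute match, or every match when all=True.

        As with Django's FileSystemFinder, an empty list means "not found".
        """
        matches = []
        for directory in self.directories:
            candidate = os.path.join(directory, path)
            if os.path.isfile(candidate):
                if not all:
                    return candidate
                matches.append(candidate)
        return matches

    def list(self, ignore_patterns=()):
        """Yield (relative_path, directory) for each file to collect."""
        for directory in self.directories:
            for dirpath, _dirnames, filenames in os.walk(directory):
                for filename in filenames:
                    full = os.path.join(dirpath, filename)
                    relative = os.path.relpath(full, directory)
                    if not any(fnmatch.fnmatch(relative, p) for p in ignore_patterns):
                        yield relative, directory
```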

The staticfiles app provides two finders used by Kuma:

  • The FileSystemFinder collects files under the folders specified in the STATICFILES_DIRS setting.
  • The AppDirectoriesFinder collects files in the (optional) static subfolder of any installed app listed in INSTALLED_APPS. This is how Django applications, including ones bundled with Django, distribute JavaScript, CSS, images, and other assets. It isn’t used for Kuma’s apps. Instead, we’ve standardized on kuma/static and other named paths.

The Finders are used by WhiteNoise to determine which file to serve in development mode. The management command findstatic can be used to determine which file is served, such as:

$ ./manage.py findstatic -v2 js/main.js

Found 'js/main.js' here:
  /app/kuma/static/js/main.js
  /app/static/js/main.js
Looking in the following locations:
  /app/kuma/static
  /app/build/locale
  /app/jinja2/includes/icons
  /usr/local/lib/python2.7/site-packages/flat/static
  /usr/local/lib/python2.7/site-packages/django/contrib/admin/static
  /usr/local/lib/python2.7/site-packages/constance/static
  /usr/local/lib/python2.7/site-packages/djcelery/static
  /usr/local/lib/python2.7/site-packages/django_extensions/static
  /usr/local/lib/python2.7/site-packages/rest_framework/static
  /usr/local/lib/python2.7/site-packages/debug_toolbar/static
  /app/static

When multiple files are found, the first is used. In the above example, /app/kuma/static/js/main.js will be served in development for /static/js/main.js.

Storage classes

The staticfiles app uses a Storage class, which extends Django’s Storage class for asset workflows. Django documents how to write a custom storage system, and there are many 3rd-party storage packages for using various cloud providers for file hosting. The configured STATICFILES_STORAGE class is used when collecting files with ./manage.py collectstatic.

Django’s standard Storage classes provide methods like delete(), exists(), and size() for file operations, and methods like listdir() for getting lists of files. There is a wide variety of storage backends with different capabilities, and Django allows most methods to raise NotImplementedError if an operation is not supported or is too expensive.

A staticfiles Storage class extends the standard Storage classes and requires a few more methods, although the exact methods are undocumented. Some are path(name), to turn a relative path into a full path, and url(path), to get the external URL of the file. An optional method, post_process(), can be defined to further process the files; it is a generator that yields tuples of the original path, the processed path, and whether the file was processed.

The default storage, StaticFilesStorage, is based on the standard FileSystemStorage, and copies static files to STATIC_ROOT (the static folder). For the url() method, it prepends the STATIC_URL to the path.
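A minimal sketch of those two methods, assuming STATIC_ROOT and STATIC_URL are passed in rather than read from settings. SimpleStaticStorage is a hypothetical class, not Django's StaticFilesStorage, but like Django's file system storage it combines the base URL and the URL-quoted name with urljoin.

```python
import os
from urllib.parse import quote, urljoin


class SimpleStaticStorage:
    """Toy storage showing path() and url(); not Django's class."""

    def __init__(self, location, base_url):
        self.location = location  # plays the role of STATIC_ROOT
        # urljoin drops the last path segment unless the base ends in "/".
        self.base_url = base_url if base_url.endswith("/") else base_url + "/"

    def path(self, name):
        """Turn a relative asset name into a full filesystem path."""
        return os.path.join(self.location, name)

    def url(self, name):
        """Prepend the base URL (STATIC_URL) to the quoted asset name."""
        return urljoin(self.base_url, quote(name))
```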

ManifestStaticFilesStorage implements the post_process() method to add the MD5 hash of the file’s contents to the filename. This allows these files to be served with very long cache times, since changes will also change the filename. It also requires manipulating the contents so that references to assets within other files, such as a CSS @import statement, are updated to the hashed names. This often requires that source files use relative paths like ../img/logo.svg, so that the tool can find the destination file.

Because of the intense file processing, ManifestStaticFilesStorage doesn’t support the live updates of development mode. It requires DEBUG=False, and that ./manage.py collectstatic is run before running the server, or before a server restart. A map of original to hashed names is stored in staticfiles.json, and is read at server startup to determine the hashed names.
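The renaming scheme can be sketched with hashlib: the first 12 hex digits of the MD5 digest are inserted before the extension, matching the shape of Django's hashed names (for example css/wiki.<hash>.css). The helper names are hypothetical, the sketch skips the rewriting of references inside CSS files described above, and the JSON layout only approximates staticfiles.json rather than reproducing every Django version's format.

```python
import hashlib
import json
import os


def hashed_name(name, content):
    """css/wiki.css -> css/wiki.<first 12 hex digits of MD5>.css"""
    root, ext = os.path.splitext(name)
    digest = hashlib.md5(content).hexdigest()[:12]
    return "{}.{}{}".format(root, digest, ext)


def build_manifest(files):
    """Map original names to hashed names for a {name: bytes} dict."""
    paths = {name: hashed_name(name, data) for name, data in sorted(files.items())}
    return json.dumps({"paths": paths, "version": "1.0"}, indent=2, sort_keys=True)
```

Any change to a file's bytes changes its hashed name, which is what makes the very long cache times safe.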

CachedStaticFilesStorage is similar to ManifestStaticFilesStorage, but stores the filename mapping in the cache. It is slower than reading staticfiles.json, and is used when write access to the filesystem is forbidden.

django-pipeline

The django-pipeline library is used for packaging assets. It provides CSS and JavaScript concatenation and compression, built-in JavaScript template support, and optional data-URI image and font embedding. It does this by extending and overriding the staticfiles app, so that assets are processed with the standard ./manage.py collectstatic command.

Kuma uses django-pipeline to:

  • Compile Sass .scss files to plain CSS with node-sass
  • Combine multiple JS and CSS files into a single file (“bundle”) in production
  • Compress CSS files with clean-css
  • Compress JS files with UglifyJS

Configuration

The django-pipeline app is configured with the dictionary PIPELINE. There are many configuration items, some of which are:

  • PIPELINE_ENABLED: True to concatenate and compress assets (testing and production), and False to skip concatenation and compression.
  • PIPELINE_COLLECTOR_ENABLED: True to collect assets (testing and production), and False to skip collection and leave them in the source locations.
  • COMPILERS: A list of CSS compilers. pipeline’s SASSCompiler is used in testing and production, and kuma.core.pipeline.sass.DebugSassCompiler (which does nothing, deferring to gulp) is used in development.

The Makefile specifies the testing configuration, so commands like make collectstatic run with PIPELINE_ENABLED and PIPELINE_COLLECTOR_ENABLED. However, they are disabled when running the development server.

django-pipeline defines each output as a “package”, which has one or more inputs, one output, and some optional settings and overrides. PIPELINE['JAVASCRIPT'] defines the JavaScript packages, and PIPELINE['STYLESHEETS'] defines the Sass/CSS packages.
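A settings fragment in this shape might look like the following. The keys (PIPELINE_ENABLED, PIPELINE_COLLECTOR_ENABLED, COMPILERS, JAVASCRIPT, STYLESHEETS, source_filenames, output_filename) and the SASSCompiler path are django-pipeline's, but the bundle contents are hypothetical and do not reproduce Kuma's settings.

```python
# Hypothetical excerpt of a PIPELINE setting for testing/production.
PIPELINE = {
    "PIPELINE_ENABLED": True,            # concatenate and compress
    "PIPELINE_COLLECTOR_ENABLED": True,  # collect into STATIC_ROOT
    # Development would list kuma.core.pipeline.sass.DebugSassCompiler here.
    "COMPILERS": ("pipeline.compilers.sass.SASSCompiler",),
    "JAVASCRIPT": {
        "main": {
            "source_filenames": ("js/components.js", "js/main.js"),
            "output_filename": "build/js/main.js",
        },
    },
    "STYLESHEETS": {
        "dashboards": {
            "source_filenames": ("styles/dashboards.scss", "styles/diff.scss"),
            "output_filename": "build/styles/dashboards.css",
        },
    },
}
```

In a template, the package key ('main' or 'dashboards') is the name passed to the javascript and stylesheet tags.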

Finders

Kuma uses two Finders from django-pipeline.

CachedFileFinder strips hashes from filenames to identify the “pre-cached” names for files, by removing the middle element of filenames with three dot-separated parts, such as main.1a2b3c.js. This may have been useful in django-pipeline 1.3 or earlier, but it appears to do nothing now, or could potentially do the wrong thing, such as resolving bootstrap.min.js as bootstrap.js.
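The lookup described above can be reproduced in a few lines, which makes the bootstrap.min.js problem easy to see. precached_name is a hypothetical helper modeled on the described behavior, not django-pipeline's source.

```python
def precached_name(path):
    """Drop the second-to-last dot-separated part (the middle one in a
    three-part name), so main.1a2b3c.js -> main.js.

    Names with fewer than two dots give None: there is no candidate.
    """
    try:
        start, _middle, ext = path.rsplit(".", 2)
    except ValueError:
        return None
    return start + "." + ext
```

The function cannot tell a content hash from a meaningful part such as min, so js/libs/bootstrap.min.js becomes js/libs/bootstrap.js.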

PipelineFinder does nothing if PIPELINE['PIPELINE_ENABLED'] is True (testing and production), and uses the Storage to find files if it is disabled. For Kuma, this means it may find files in the STATIC_ROOT directory. However, since the FileSystemFinder finds most files in kuma/static first, it is doubtful that this Finder ever applies.

Storage

Most of the functionality of django-pipeline is implemented as a Storage class, and Kuma uses three different implementations depending on the environment.

The simplest storage, used during testing and in the Makefile, is pipeline.storage.PipelineStorage, which extends the staticfiles Storage class StaticFilesStorage, with a post_process step that packages JS and CSS into one-file bundles, according to the PIPELINE configuration.
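The packaging step can be approximated as a generator in the style of post_process(). In this sketch, package_post_process is hypothetical, a plain dict stands in for the storage, and compression with UglifyJS or clean-css is left out; django-pipeline's own post_process does more than this.

```python
def package_post_process(packages, files):
    """Concatenate each package's sources into its output file.

    `files` is a {name: text} dict standing in for storage. Like a
    post_process() generator, this yields (name, processed_name, processed)
    tuples, here one per bundle written.
    """
    for package in packages.values():
        output = package["output_filename"]
        files[output] = "\n".join(files[name] for name in package["source_filenames"])
        yield output, output, True
```

collectstatic consumes such a generator after copying files, which is why bundling happens as part of the normal ./manage.py collectstatic run.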

Development uses pipeline.storage.NonPackagingPipelineStorage. This works the same way as PipelineStorage, but avoids creating packages, where several files are combined into one. JavaScript files are served from the source folders, but CSS files need to be compiled from Sass, and are served from the /static folder after collection. When developing style files, a developer either needs to run ./manage.py collectstatic or use gulp to see changes.

In production, kuma.core.pipeline.storage.ManifestPipelineStorage is used. This combines the package processing of PipelineStorage with the hashed assets and staticfiles.json of ManifestStaticFilesStorage. These are generated when the production Docker containers are created.

Compiling and collecting assets with Gulp

An alternate way to compile and collect assets is to use Gulp, as described in Compiling on the host system with gulp. This requires installing node and related packages on the “host” system, rather than relying on the Docker containers, but it matches the preferred workflow of some front-end developers.

The gulp process also compiles Sass sources to CSS, and copies files from /kuma/static to /static, mirroring the process from make collectstatic. However, additional tools such as PostCSS can’t be added to the gulp workflow, as they are in other projects, because make collectstatic is the only process used to generate production assets.

Template tag static

Django provides a template tag static that outputs the URL of the static asset for HTML. Without staticfiles installed, it just adds STATIC_URL to the start of the path. With staticfiles, it calls the url(path) method of the Storage class. In production, with ManifestStaticFilesStorage, it uses staticfiles.json to return URLs with hashes in the name.

For example, here is the HTML that includes the Tumbeast in the 404 page:

<div id="beastainer">
  <img id="beast404le" src="{{ static('img/beast-404_LE.png') }}" alt="">
  <img id="beast404re" src="{{ static('img/beast-404_RE.png') }}" alt="">
  <img class="beast 404" src="{{ static('img/beast-404.png') }}" alt="">
</div>
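In production, a call such as static('img/beast-404.png') in the template above resolves through the manifest. The sketch below models that lookup with a staticfiles.json-style mapping; make_static is hypothetical, and the hashed name in the test data is invented for illustration. The ValueError mirrors the error Django's manifest storage raises for missing entries when strict manifest checking is on.

```python
import json


def make_static(manifest_json, static_url="/static/"):
    """Return a static()-like function backed by a manifest's "paths" map."""
    paths = json.loads(manifest_json)["paths"]

    def static(name):
        try:
            return static_url + paths[name]
        except KeyError:
            raise ValueError("Missing staticfiles manifest entry for %r" % name) from None

    return static
```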

Template tag statici18n

The tag statici18n is provided by django-statici18n. It works like the static tag, outputting the URL of the localization JavaScript. This is included in the <body> of every page via the base template, near the bottom:

<script src="{{ statici18n(request.LANGUAGE_CODE) }}"></script>

Template tags javascript and stylesheet

django-pipeline provides two template tags, {% javascript 'bundle' %} and {% stylesheet 'bundle' %}, that inject <script> and <link> elements into a template.

Bundling is controlled by the setting PIPELINE['PIPELINE_ENABLED'] (False for development, True for production). When bundled, the assets are assumed to be processed and collected, so a single element representing the final asset URL is inserted. When bundling is off, the assets are assumed to still be in the source form, and multiple HTML elements are inserted into the document. These tags look more like Jinja2 statements than HTML, as in this block from the revision dashboard:

{% block js %}
{% javascript 'jquery-ui' %}
{% javascript 'dashboard' %}
{% endblock %}

django-pipeline supports other output formats. For example, the editor-content bundle is processed with the javascript-array template, which converts the URLs to a format that can be injected into a JavaScript array, such as the configuration script:

win.mdn.assets = {
    css: {
        'editor-content': [
            {%- stylesheet 'editor-content' %}
            {%- stylesheet 'editor-locale-%s' % LANG %}
        ],
        'wiki-compat-tables': [{% stylesheet 'wiki-compat-tables' %}]
    },
    js: {
        'syntax-prism': [{% javascript 'syntax-prism' %}],
        'wiki-compat-tables': [{% javascript 'wiki-compat-tables' %}]
    }
};

Serving assets with WhiteNoise

WhiteNoise is a static file serving application, and is an alternative to serving static assets with nginx, Apache, or from Amazon S3. On Kuma, it is used to serve static assets in development as well as production. It made it easy to serve HTML and related assets on the same HTTP/2 connection.

Kuma uses WhiteNoise as a middleware, included as kuma.core.middleware.RestrictedWhiteNoiseMiddleware. This is a wrapper around whitenoise.middleware.WhiteNoiseMiddleware which skips static file serving if Kuma is acting as the attachments / samples host.

In development (DEBUG = True) and testing, WhiteNoise is in “autorefresh” mode, and uses the staticfiles finders. Each web request to /static scans for the file to use, which can be slow, but will catch any changes made to the files.

In production (DEBUG = False), the files in STATIC_ROOT (/static) are indexed when the web server starts up. WhiteNoise also determines the headers, such as caching headers and the CORS header, that will be sent with each file. This makes it very fast to serve static files, but changes after the web server starts will not be noticed.
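The trade-off between the two modes can be modeled with a small class. StaticIndex is hypothetical and not WhiteNoise's API: in autorefresh mode it checks the filesystem on each lookup (real WhiteNoise consults the staticfiles finders), and otherwise it scans STATIC_ROOT once at startup, so files added later are not seen.

```python
import os


class StaticIndex:
    """Toy model of WhiteNoise's two modes; not WhiteNoise itself."""

    def __init__(self, root, autorefresh):
        self.root = root
        self.autorefresh = autorefresh
        # Production-style: build the URL -> file index once, up front.
        self.index = None if autorefresh else self._scan()

    def _scan(self):
        found = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                relative = os.path.relpath(full, self.root).replace(os.sep, "/")
                found["/static/" + relative] = full
        return found

    def lookup(self, url):
        """Return the file that would be served for `url`, or None."""
        if self.autorefresh:
            # Development-style: hit the filesystem on every request.
            candidate = os.path.join(self.root, url[len("/static/"):])
            return candidate if os.path.isfile(candidate) else None
        return self.index.get(url)
```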

WhiteNoise provides its own Storage classes, that can compress and cache static asset files. These are currently unused by Kuma, which uses classes based on those provided by django-pipeline.

Future

  • Ensure files that are not meant for visitors are not collected, to speed up development, collecting, and preparing production images.
  • Remove the CachedFileFinder and PipelineFinder.
  • Remove django-pipeline, using gulp on the server as well before running ./manage.py collectstatic.
  • Add django-webpack-loader or similar to integrate React assets.

History

The staticfiles application was probably part of the Kuma project from the beginning in 2011. In the SCL3 datacenter, one of the first steps of a production push was collecting the static files to a directory on a network drive. This was shared between web servers, so that the new assets were immediately available as the new code was deployed. Because of file hashing, it was possible to keep old versions of assets along with new versions. These files were served by Apache.

In 2013, staticfiles was used to serve assets in the development Vagrant environment instead of Apache, so that collectstatic was not needed to see changes. However, CSS files were converted to Stylus that year, which required compilation for development and deployment.

In 2015, several changes were made to prepare for the move from SCL3 to AWS. One change was to move assets from the /media folder, which is traditionally used for user uploads, to the /kuma/static folder. Another was adopting django-pipeline to compile assets, and WhiteNoise to serve them in production.

In 2017, MDN hosting moved from SCL3 to AWS. Apache was no longer used to serve assets, and WhiteNoise was used in production as well. This dropped the ability to serve old versions of assets, but a CDN with long caching times mitigated issues around deployments. That same year, the CSS sources were converted from Stylus to Sass.

In 2019, the development team decided to adopt new tools such as React and Webpack (ADR-004).