A Brief Look at Tree Shaking in Front-End Development: Rollup vs. Webpack
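Tree shaking is a dead code elimination technique. It is typically applied when code written in languages such as JavaScript, TypeScript, or Dart is bundled into a single file, removing unused code to shrink the output.
Implementing dead code elimination is much harder in a dynamic language than in a static one. The idea of a "tree shaker" dates back to LISP in the 1990s. Its core idea is to represent every execution flow the program can take as a tree of function calls, so that functions that are never called can be removed.
Google's Closure Compiler applied the technique to JavaScript, and Google's dart2js compiler, presented by Bob Nystrom in 2012, applied it to Dart: tree shaking happens while Dart code is compiled to JavaScript. In JavaScript, even if you use only one function from a library, you often have to pull the whole library into your project, so the output contains a lot of code that is never used and is much larger than necessary. Tree shaking lets the output contain only the functions you actually need.
In 2015 Rollup shipped tree shaking and truly brought the concept to the front-end community. Webpack 2 followed with its own implementation, and Webpack 4 refined the approach.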
How Tree Shaking Works
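In compiler theory, dead code elimination (DCE) is an optimization performed by an optimizing compiler: it removes code that has no effect on the program's result. Removing such code reduces program size and avoids irrelevant computation at run time. Dead code includes unreachable code, code whose result is never used, and code that only affects dead variables (variables that are written but never read).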
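The three kinds of dead code listed above can be seen in one small function. This is only an illustrative sketch: the names `doublePrice` and `doublePriceOptimized` are hypothetical, and the comments mark what an optimizer would be free to remove.

```javascript
// Hypothetical example: each commented statement is dead code of a different kind.
function doublePrice(price) {
  let debugTotal = 0;        // dead variable: written below but never read
  debugTotal = price * 100;  // code that only affects a dead variable

  Math.abs(price);           // result is never used and the call has no side effects

  return price * 2;

  console.log('done');       // unreachable: follows an unconditional return
}

// After dead code elimination the function behaves like this one.
function doublePriceOptimized(price) {
  return price * 2;
}

console.log(doublePrice(21) === doublePriceOptimized(21));
```

Both functions return the same value for every input, which is exactly the property that makes the removed statements safe to drop.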
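For a dynamic language like JavaScript, tree shaking used to be fairly hard to implement. Why was Rollup able to bring it to the front-end community? Because of ES6 modules. Earlier module systems such as CommonJS and AMD are dynamic: what a module imports and exports can change at run time. A module is also just an object, so whatever it exports can be reached through property access. For example: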
const a = require(`./${file}.js`); // modules can be loaded dynamically
const { stat, exists, readFile } = require('fs'); // destructure the exported object

var my_lib;
if (Math.random()) {
  my_lib = require('foo');
} else {
  my_lib = require('bar');
}

if (Math.random()) {
  exports.baz = ···;
}
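ES6 modules are different: they have a static module structure. Imports and exports must be determined at compile time and cannot change at run time. This static structure is a perfect fit for tree shaking. In traditional compiled languages, the compiler removes dead code from the AST (abstract syntax tree). In JavaScript, both Rollup and Webpack can do this job.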
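Because import and export lists are fixed by the source text, a bundler can work out which exports are unused without running any code. The sketch below illustrates that idea on a hand-written module table; the table shape and the function `findUnusedExports` are assumptions made for illustration, not how Rollup or Webpack represent modules.

```javascript
// Toy module table. In real ES modules these lists can be read straight from
// the import/export declarations, without executing the modules.
const modules = {
  'math.js': { exports: ['square', 'cube'], imports: [] },
  'index.js': {
    exports: [],
    imports: [{ from: 'math.js', names: ['cube'] }],
  },
};

// For every module, list the exports that no other module imports.
function findUnusedExports(moduleTable) {
  const used = new Map(Object.keys(moduleTable).map((id) => [id, new Set()]));
  for (const mod of Object.values(moduleTable)) {
    for (const { from, names } of mod.imports) {
      for (const name of names) used.get(from).add(name);
    }
  }
  const unused = {};
  for (const [id, mod] of Object.entries(moduleTable)) {
    unused[id] = mod.exports.filter((name) => !used.get(id).has(name));
  }
  return unused;
}

const unusedExports = findUnusedExports(modules);
console.log(unusedExports); // { 'math.js': [ 'square' ], 'index.js': [] }
```

With CommonJS the same table could not be built reliably, since a `require` call or an `exports` assignment may sit behind a run-time condition.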
Rollup
Rollup enables tree shaking by default: the treeshake option defaults to true. It also accepts an object with further options that can be configured as needed.
// src/rollup/types.d.ts
export interface NormalizedInputOptions {
  acorn: Object;
  acornInjectPlugins: Function[];
  cache: false | undefined | RollupCache;
  context: string;
  experimentalCacheExpiry: number;
  external: IsExternal;
  /** @deprecated Use the "inlineDynamicImports" output option instead. */
  inlineDynamicImports: boolean | undefined;
  input: string[] | { [entryAlias: string]: string };
  /** @deprecated Use the "manualChunks" output option instead. */
  manualChunks: ManualChunksOption | undefined;
  moduleContext: (id: string) => string;
  onwarn: WarningHandler;
  perf: boolean;
  plugins: Plugin[];
  preserveEntrySignatures: PreserveEntrySignaturesOption;
  /** @deprecated Use the "preserveModules" output option instead. */
  preserveModules: boolean | undefined;
  preserveSymlinks: boolean;
  shimMissingExports: boolean;
  strictDeprecations: boolean;
  treeshake: false | NormalizedTreeshakingOptions;
}
export interface NormalizedTreeshakingOptions {
  annotations: boolean;
  moduleSideEffects: HasModuleSideEffects;
  propertyReadSideEffects: boolean;
  tryCatchDeoptimization: boolean;
  unknownGlobalSideEffects: boolean;
}
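As a rough illustration of how these options are set, here is a sketch of a Rollup options object. The file paths are hypothetical and the values are only examples; `moduleSideEffects: false` and `propertyReadSideEffects: false` in particular are more aggressive than the defaults and can remove code a project relies on.

```javascript
// Sketch of Rollup options with the treeshake sub-options spelled out.
// In rollup.config.js this object would be the default export.
// The input and output paths are hypothetical.
const rollupOptions = {
  input: 'src/main.js',
  output: { file: 'dist/bundle.js', format: 'es' },
  treeshake: {
    // Honor /*#__PURE__*/ annotations on call expressions.
    annotations: true,
    // Assume a module has no side effects unless something imported from it is used.
    moduleSideEffects: false,
    // Assume that merely reading a property never has side effects.
    propertyReadSideEffects: false,
    // Keep tree shaking conservative inside try blocks.
    tryCatchDeoptimization: true,
    // Assume that accessing an unknown global variable may have side effects.
    unknownGlobalSideEffects: true,
  },
};

console.log(Object.keys(rollupOptions.treeshake));
```

Setting `treeshake: false` instead disables the feature entirely, which corresponds to the `false` branch of the type shown above.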
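The treeshake option mainly affects two places:
During the build phase, when Graph runs its build method, modules that are not needed are filtered out and an AST context is created for the remaining modules.
During compilation, the modules returned by Module's getDependenciesToBeIncluded method are used for the chunks generated later.
In Rollup's source, src/Graph.ts contains a method called includeStatements.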
// src/Graph.ts
export default class Graph {
  ...
  async build(): Promise<void> {
    timeStart('generate module graph', 2);
    await this.generateModuleGraph();
    timeEnd('generate module graph', 2);

    timeStart('sort modules', 2);
    this.phase = BuildPhase.ANALYSE;
    this.sortModules();
    timeEnd('sort modules', 2);

    timeStart('mark included statements', 2);
    this.includeStatements();
    timeEnd('mark included statements', 2);

    this.phase = BuildPhase.GENERATE;
  }
  ...
  private includeStatements() {
    for (const module of [...this.entryModules, ...this.implicitEntryModules]) {
      if (module.preserveSignature !== false) {
        module.includeAllExports();
      } else {
        markModuleAndImpureDependenciesAsExecuted(module);
      }
    }
    if (this.options.treeshake) {
      let treeshakingPass = 1;
      do {
        timeStart(`treeshaking pass ${treeshakingPass}`, 3);
        this.needsTreeshakingPass = false;
        for (const module of this.modules) {
          if (module.isExecuted) module.include();
        }
        timeEnd(`treeshaking pass ${treeshakingPass++}`, 3);
      } while (this.needsTreeshakingPass);
    } else {
      for (const module of this.modules) module.includeAllInBundle();
    }
    for (const externalModule of this.externalModules) externalModule.warnUnusedImports();
    for (const module of this.implicitEntryModules) {
      for (const dependant of module.implicitlyLoadedAfter) {
        if (!(dependant.isEntryPoint || dependant.isIncluded())) {
          error(errImplicitDependantIsNotIncluded(dependant));
        }
      }
    }
  }
  ...
}
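The module.include() and module.includeAllInBundle() calls in the if/else block do something simple. Both delegate to the module's AST: includeAllInBundle() includes everything unconditionally, while include() includes the AST only if shouldBeIncluded reports that it is needed.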
export default class Module {
  ...
  includeAllInBundle() {
    this.ast.include(createInclusionContext(), true);
  }
  ...
  include(): void {
    const context = createInclusionContext();
    if (this.ast.shouldBeIncluded(context)) this.ast.include(context, false);
  }
}
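The do/while loop in includeStatements repeats until a pass includes nothing new. The sketch below imitates that fixed-point idea on a flat list of toy statements. The statement shape (`declares`, `reads`, `hasEffects`) is an assumption made for illustration and is far simpler than Rollup's AST nodes.

```javascript
// Toy statements: `declares` is the name a statement defines, `reads` the names it uses.
const statements = [
  { id: 'a', declares: 'helper', reads: [], hasEffects: false, included: false },
  { id: 'b', declares: 'format', reads: ['helper'], hasEffects: false, included: false },
  { id: 'c', declares: 'unused', reads: [], hasEffects: false, included: false },
  { id: 'd', declares: null, reads: ['format'], hasEffects: true, included: false },
];

function includeStatements(stmts) {
  const neededNames = new Set();
  let passes = 0;
  let needsAnotherPass;
  do {
    needsAnotherPass = false;
    passes++;
    for (const stmt of stmts) {
      if (stmt.included) continue;
      // Include a statement if it has side effects or defines a name that is needed.
      if (stmt.hasEffects || neededNames.has(stmt.declares)) {
        stmt.included = true;
        for (const name of stmt.reads) neededNames.add(name);
        // Newly needed names may affect statements that were already visited.
        needsAnotherPass = true;
      }
    }
  } while (needsAnotherPass);
  return { included: stmts.filter((s) => s.included).map((s) => s.id), passes };
}

const result = includeStatements(statements);
console.log(result.included); // [ 'a', 'b', 'd' ]
```

Statement `c` is never included because nothing with a side effect reaches it. More than one pass is needed because `d` appears after the statements it depends on.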
In Rollup's source, src/Module.ts contains a method called getDependenciesToBeIncluded, which returns the modules that are needed later during code splitting.
// src/Module.ts
export default class Module {
  ...
  getDependenciesToBeIncluded(): Set<Module | ExternalModule> {
if (this.relevantDependencies) return this.relevantDependencies;
const relevantDependencies = new Set<Module | ExternalModule>();
const additionalSideEffectModules = new Set<Module>();
const possibleDependencies = new Set(this.dependencies);
let dependencyVariables = this.imports;
if (this.isEntryPoint || this.includedDynamicImporters.length > 0 || this.namespace.included) {
  dependencyVariables = new Set(dependencyVariables);
  for (const exportName of [...this.getReexports(), ...this.getExports()]) {
    dependencyVariables.add(this.getVariableForExportName(exportName));
  }
}
for (let variable of dependencyVariables) {
  if (variable instanceof SyntheticNamedExportVariable) {
    variable = variable.getBaseVariable();
  } else if (variable instanceof ExportDefaultVariable) {
    const { modules, original } = variable.getOriginalVariableAndDeclarationModules();
    variable = original;
    for (const module of modules) {
      additionalSideEffectModules.add(module);
      possibleDependencies.add(module);
    }
  }
  relevantDependencies.add(variable.module!);
}
if (this.options.treeshake) {
  for (const dependency of possibleDependencies) {
    if (
      !(
        dependency.moduleSideEffects || additionalSideEffectModules.has(dependency as Module)
      ) ||
      relevantDependencies.has(dependency)
    ) {
      continue;
    }
    if (dependency instanceof ExternalModule || dependency.hasEffects()) {
      relevantDependencies.add(dependency);
    } else {
      for (const transitiveDependency of dependency.dependencies) {
        possibleDependencies.add(transitiveDependency);
      }
    }
  }
} else {
  for (const dependency of this.dependencies) {
    relevantDependencies.add(dependency);
  }
}
return (this.relevantDependencies = relevantDependencies);
  }
}
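The treeshake branch of getDependenciesToBeIncluded keeps a dependency only if its variables are used or if it has side effects, and it looks through side-effect-free modules to their own dependencies. Here is a simplified sketch of that traversal. The graph shape and the precomputed `hasEffects` flag are assumptions for illustration; in Rollup, `hasEffects()` is the result of analyzing the module body.

```javascript
// Toy dependency graph. `moduleSideEffects` mirrors the option of the same name.
const graph = {
  entry: { deps: ['utils', 'polyfill', 'barrel', 'pureLib'], moduleSideEffects: true, hasEffects: false },
  utils: { deps: [], moduleSideEffects: true, hasEffects: false },
  polyfill: { deps: [], moduleSideEffects: true, hasEffects: true },
  barrel: { deps: ['logger'], moduleSideEffects: true, hasEffects: false },
  logger: { deps: [], moduleSideEffects: true, hasEffects: true },
  pureLib: { deps: [], moduleSideEffects: false, hasEffects: true },
};

// `usedModules` lists the modules whose variables the importer actually uses.
function relevantDependencies(moduleGraph, id, usedModules) {
  const relevant = new Set(usedModules);
  const possible = new Set(moduleGraph[id].deps);
  // A Set iterator also visits entries added during iteration, so this loop
  // reaches the transitive dependencies queued in the else branch.
  for (const dep of possible) {
    const mod = moduleGraph[dep];
    if (!mod.moduleSideEffects || relevant.has(dep)) continue;
    if (mod.hasEffects) {
      relevant.add(dep);
    } else {
      for (const transitive of mod.deps) possible.add(transitive);
    }
  }
  return [...relevant];
}

const kept = relevantDependencies(graph, 'entry', ['utils']);
console.log(kept); // [ 'utils', 'polyfill', 'logger' ]
```

`barrel` is dropped but its dependency `logger` survives because it has side effects, and `pureLib` is dropped because it is flagged as free of module side effects.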
Webpack
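To tree-shake code with Webpack, the following conditions must be met:
Use production mode. Unused code is only removed when Webpack minifies the output.
Set the optimization option usedExports to true. This tells Webpack to identify the exports it considers unused and to mark them during the initial bundling step.
Use a minifier that supports dead code removal. Such a minifier strips the code that Webpack's marking step has left unused. TerserPlugin supports this.
Here is a basic Webpack configuration that enables tree shaking: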
// Base Webpack Config for Tree Shaking
const TerserPlugin = require('terser-webpack-plugin');

const config = {
  mode: 'production',
  optimization: {
    usedExports: true,
    minimizer: [
      new TerserPlugin({ /* ... */ })
    ]
  }
};
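Take the demo from the official webpack documentation as an example. With usedExports enabled and the output not yet minified, the bundle contains an `unused harmony export` comment that marks the code that is not used.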
// src/index.js
import { cube } from './math.js';

function component() {
  const element = document.createElement('pre');

  element.innerHTML = [
    'Hello webpack!',
    '5 cubed is equal to ' + cube(5)
  ].join('\n\n');

  return element;
}

document.body.appendChild(component());
// dist/bundle.js
/* 1 */
/***/ (function(module, __webpack_exports__, __webpack_require__) {
  'use strict';
  /* unused harmony export square */
  /* harmony export (immutable) */ __webpack_exports__['a'] = cube;
  function square(x) {
    return x * x;
  }

  function cube(x) {
    return x * x * x;
  }
});
terser-webpack-plugin then minifies the code and removes the code that was marked as unused. Next, let's look at how these two steps are implemented.
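In webpack's source (the excerpt below appears to come from lib/optimize/ConcatenatedModule.js), a Set is first defined to hold all unused exports exposed by the root module.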
// Set with all root exposed unused exports
/** @type {Set<string>} */
const unusedExports = new Set();
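Next, the code iterates over the dependencies of rootModule, handles each kind of harmony export dependency (starting with HarmonyExportSpecifierDependency), and adds the names of exports that are not used to unusedExports.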
for (const dep of this.rootModule.dependencies) {
  if (dep instanceof HarmonyExportSpecifierDependency) {
const used = /** @type {string | false } */ (this.rootModule.getUsedName(
  moduleGraph,
  dep.name
));
if (used) {
  const info = moduleToInfoMap.get(this.rootModule);
  if (!exportsMap.has(used)) {
    exportsMap.set(
      used,
      () => `/* binding */ ${info.internalNames.get(dep.id)}`
    );
  }
} else {
  unusedExports.add(dep.name || "namespace");
}
  } else if (dep instanceof HarmonyExportExpressionDependency) {
const used = /** @type {string | false } */ (this.rootModule.getUsedName(
  moduleGraph,
  "default"
));
if (used) {
  const info = moduleToInfoMap.get(this.rootModule);
  if (!exportsMap.has(used)) {
    exportsMap.set(
      used,
      () =>
        `/* default */ ${info.internalNames.get(
          typeof dep.declarationId === "string"
            ? dep.declarationId
            : "__WEBPACK_MODULE_DEFAULT_EXPORT__"
        )}`
    );
  }
} else {
  unusedExports.add("default");
}
  } else if (dep instanceof HarmonyExportImportedSpecifierDependency) {
const exportDefs = getHarmonyExportImportedSpecifierDependencyExports(
  dep,
  moduleGraph
);
for (const def of exportDefs) {
  const importedModule = moduleGraph.getModule(dep);
  const info = moduleToInfoMap.get(importedModule);
  const used = /** @type {string | false } */ (this.rootModule.getUsedName(
    moduleGraph,
    def.name
  ));
  if (used) {
    if (!exportsMap.has(used)) {
      exportsMap.set(used, requestShortener => {
        const finalName = getFinalName(
          moduleGraph,
          info,
          def.ids,
          moduleToInfoMap,
          requestShortener,
          runtimeTemplate,
          false,
          false,
          this.rootModule.buildMeta.strictHarmonyModule,
          true
        );
        return `/* reexport */ ${finalName}`;
      });
    }
  } else {
    unusedExports.add(def.name);
  }
}
  }
}
Both lib/dependencies/HarmonyExportInitFragment.js and lib/dependencies/HarmonyExportExpressionDependency.js contain code that emits these markers.
// lib/dependencies/HarmonyExportInitFragment.js
/**
 * @param {GenerateContext} generateContext context for generate
 * @returns {string|Source} the source code that will be included as initialization code
 */
getContent({ runtimeTemplate, runtimeRequirements }) {
  runtimeRequirements.add(RuntimeGlobals.exports);
  runtimeRequirements.add(RuntimeGlobals.definePropertyGetters);

  const unusedPart =
    this.unusedExports.size > 1
      ? `/* unused harmony exports ${joinIterableWithComma(
          this.unusedExports
        )} */\n`
      : this.unusedExports.size > 0
      ? `/* unused harmony export ${
          this.unusedExports.values().next().value
        } */\n`
      : "";
  const definitions = [];
  for (const [key, value] of this.exportMap) {
    definitions.push(
      `\n/* harmony export */   ${JSON.stringify(
        key
      )}: ${runtimeTemplate.returningFunction(value)}`
    );
  }
  const definePart =
    this.exportMap.size > 0
      ? `/* harmony export */ ${RuntimeGlobals.definePropertyGetters}(${
          this.exportsArgument
        }, {${definitions.join(",")}\n/* harmony export */ });\n`
      : "";
  return `${definePart}${unusedPart}`;
}
// lib/dependencies/HarmonyExportExpressionDependency.js
HarmonyExportExpressionDependency.Template = class HarmonyExportDependencyTemplate extends NullDependency.Template {
  /**
   * @param {Dependency} dependency the dependency for which the template should be applied
   * @param {ReplaceSource} source the current replace source which can be modified
   * @param {DependencyTemplateContext} templateContext the context object
   * @returns {void}
   */
  apply(
    dependency,
    source,
    { module, moduleGraph, runtimeTemplate, runtimeRequirements, initFragments }
  ) {
const dep = /** @type {HarmonyExportExpressionDependency} */ (dependency);
const used = module.getUsedName(moduleGraph, "default");
const { declarationId } = dep;
const exportsName = module.exportsArgument;
if (declarationId) {
  let name;
  if (typeof declarationId === "string") {
    name = declarationId;
  } else {
    name = "__WEBPACK_DEFAULT_EXPORT__";
    source.replace(
      declarationId.range[0],
      declarationId.range[1] - 1,
      `${declarationId.prefix}${name}${declarationId.suffix}`
    );
  }
  if (used) {
    const map = new Map();
    map.set(used, `/* export default binding */ ${name}`);
    initFragments.push(new HarmonyExportInitFragment(exportsName, map));
  }
  source.replace(
    dep.rangeStatement[0],
    dep.range[0] - 1,
    `/* harmony default export */ ${dep.prefix}`
  );
} else {
  let content;
  if (used) {
    runtimeRequirements.add(RuntimeGlobals.exports);
    if (runtimeTemplate.supportsConst()) {
      const name = "__WEBPACK_DEFAULT_EXPORT__";
      content = `/* harmony default export */ const ${name} = `;
      const map = new Map();
      map.set(used, name);
      initFragments.push(new HarmonyExportInitFragment(exportsName, map));
    } else {
      // This is a little bit incorrect as TDZ is not correct, but we can't use const.
      content = `/* harmony default export */ ${exportsName}[${JSON.stringify(
        used
      )}] = `;
    }
  } else {
    content =
      "/* unused harmony default export */ var _unused_webpack_default_export = ";
  }
  if (dep.range) {
    source.replace(
      dep.rangeStatement[0],
      dep.range[0] - 1,
      content + "(" + dep.prefix
    );
    source.replace(dep.range[1], dep.rangeStatement[1] - 0.5, ");");
    return;
  }
  source.replace(dep.rangeStatement[0], dep.rangeStatement[1] - 1, content);
}
  }
};
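I have not yet found code in terser-webpack-plugin or Terser that strips code based on these markers. That is consistent with the markers being plain comments: once an unused export is no longer attached to the exports object, its declaration is unreferenced, and a minifier's ordinary dead code removal can drop it.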
Summary
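I have not fully worked out the code-level logic of tree shaking in Webpack, but the comparison already shows the difference. Rollup analyzes first, finds the code that is needed, and then bundles it. Webpack marks first and removes last, which is closer to the standard DCE procedure.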