regenerate

Generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points.

Regenerate

Regenerate is a Unicode-aware regex generator for JavaScript. It allows you to easily generate ES5-compatible regular expressions based on a given set of Unicode symbols or code points. (This is trickier than you might think, because of how JavaScript deals with astral symbols.)
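To illustrate why astral symbols are tricky: JavaScript strings are sequences of UTF-16 code units, so any code point above U+FFFF is stored as a surrogate pair, and a regular expression without the ES6 u flag sees two separate units. The sketch below is plain JavaScript and does not use Regenerate; toSurrogatePair is a hypothetical helper introduced only for this illustration.

```javascript
// Hypothetical helper (not part of Regenerate): split an astral code point
// into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return [high, low];
}

var pair = toSurrogatePair(0x1D306);
// → [0xD834, 0xDF06]
var symbol = String.fromCharCode(pair[0], pair[1]);
var length = symbol.length;
// → 2

// Without the `u` flag, a character class holds two separate code units,
// so it can never match the two-unit symbol as a whole.
var naive = /^[\uD834\uDF06]$/.test(symbol);
// → false

// Spelling out the surrogate pair as a sequence works in ES5.
var explicit = /^\uD834\uDF06$/.test(symbol);
// → true
```

This is why the output for astral code points takes the form of surrogate sequences joined by alternation rather than a single character class.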

Installation

Via npm:

npm install regenerate

Via Bower:

bower install regenerate

Via Component:

component install mathiasbynens/regenerate

In a browser:

<script src="regenerate.js"></script>

In Node.js, io.js, and RingoJS ≥ v0.8.0:

var regenerate = require('regenerate');

In Narwhal and RingoJS ≤ v0.7.0:

var regenerate = require('regenerate').regenerate;

In Rhino:

load('regenerate.js');

Using an AMD loader like RequireJS:

require(
  {
    'paths': {
      'regenerate': 'path/to/regenerate'
    }
  },
  ['regenerate'],
  function(regenerate) {
    console.log(regenerate);
  }
);

API

regenerate(value1, value2, value3, ...)

The main Regenerate function. Calling this function creates a new set with a chainable API.

var set = regenerate()
  .addRange(0x60, 0x69) // add U+0060 to U+0069
  .remove(0x62, 0x64) // remove U+0062 and U+0064
  .add(0x1D306); // add U+1D306
set.valueOf();
// → [0x60, 0x61, 0x63, 0x65, 0x66, 0x67, 0x68, 0x69, 0x1D306]
set.toString();
// → '[`ace-i]|\\uD834\\uDF06'
set.toRegExp();
// → /[`ace-i]|\uD834\uDF06/

Any arguments passed to regenerate() will be added to the set right away. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

regenerate.prototype.add(value1, value2, value3, ...)

Any arguments passed to add() are added to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

regenerate().add(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate().add(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

It’s also possible to pass in a Regenerate instance. Doing so adds all code points in that instance to the current set.

var set = regenerate(0x1D306, 'A');
regenerate().add('©', 0x2603).add(set).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

Note that the initial call to regenerate() acts like add(). This allows you to create a new Regenerate instance and add some code points to it in one go:

regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

regenerate.prototype.remove(value1, value2, value3, ...)

Any arguments passed to remove() are removed from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

regenerate(0x1D306, 'A', '©', 0x2603).remove('☃').toString();
// → '[A\\xA9]|\\uD834\\uDF06'

It’s also possible to pass in a Regenerate instance. Doing so removes all code points in that instance from the current set.

var set = regenerate('☃');
regenerate(0x1D306, 'A', '©', 0x2603).remove(set).toString();
// → '[A\\xA9]|\\uD834\\uDF06'

regenerate.prototype.addRange(start, end)

Adds a range of code points from start to end (inclusive) to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

regenerate(0x1D306).addRange(0x00, 0xFF).toString();
// → '[\\0-\\xFF]|\\uD834\\uDF06'

regenerate().addRange('A', 'z').toString();
// → '[A-z]'

regenerate.prototype.removeRange(start, end)

Removes a range of code points from start to end (inclusive) from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange('A', 'z') // remove all symbols from `A` to `z`
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange(0x0041, 0x007A) // remove all code points from U+0041 to U+007A
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

regenerate.prototype.intersection(codePoints)

Removes any code points from the set that are not present in both the set and the given codePoints array. codePoints must be an array of numeric code point values, i.e. numbers.

regenerate()
  .addRange(0x00, 0xFF) // add extended ASCII code points
  .intersection([0x61, 0x69]) // remove all code points from the set except for these
  .toString();
// → '[ai]'

Instead of the codePoints array, it’s also possible to pass in a Regenerate instance.

var whitelist = regenerate(0x61, 0x69);

regenerate()
  .addRange(0x00, 0xFF) // add extended ASCII code points
  .intersection(whitelist) // remove all code points from the set except for those in the `whitelist` set
  .toString();
// → '[ai]'

regenerate.prototype.contains(value)

Returns true if the given value is part of the set, and false otherwise. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

var set = regenerate().addRange(0x00, 0xFF);
set.contains('A');
// → true
set.contains(0x1D306);
// → false

regenerate.prototype.clone()

Returns a clone of the current code point set. Any actions performed on the clone won’t mutate the original set.

var setA = regenerate(0x1D306);
var setB = setA.clone().add(0x1F4A9);
setA.toArray();
// → [0x1D306]
setB.toArray();
// → [0x1D306, 0x1F4A9]

regenerate.prototype.toString(options)

Returns a string representing (part of) a regular expression that matches all the symbols mapped to the code points within the set.

regenerate(0x1D306, 0x1F4A9).toString();
// → '\\uD834\\uDF06|\\uD83D\\uDCA9'
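Because the returned string can contain a top-level alternation, it usually needs wrapping before it is embedded in a larger pattern. The following is a minimal sketch in plain JavaScript that hard-codes the output string shown above instead of calling Regenerate; the variable names are illustrative only.

```javascript
// Output of regenerate(0x1D306, 0x1F4A9).toString(), copied from above.
var source = '\\uD834\\uDF06|\\uD83D\\uDCA9';

// Wrap the alternation in a non-capturing group before adding anchors
// or quantifiers.
var grouped = new RegExp('^(?:' + source + ')+$');
var groupedMatchesBoth = grouped.test('\uD834\uDF06\uD83D\uDCA9');
// → true
var groupedRejectsOther = grouped.test('x\uD83D\uDCA9');
// → false

// Without the group, `^` and `$` each bind to only one alternative.
var ungrouped = new RegExp('^' + source + '$');
var ungroupedAcceptsOther = ungrouped.test('x\uD83D\uDCA9');
// → true, which is probably not what was intended
```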

If the bmpOnly property of the optional options object is set to true, the output matches surrogates individually, regardless of whether they’re lone surrogates or just part of a surrogate pair. This simplifies the output, but it is only safe when you’re certain that the strings it will be used on don’t contain any astral symbols.

var highSurrogates = regenerate().addRange(0xD800, 0xDBFF);
highSurrogates.toString();
// → '[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])'
highSurrogates.toString({ 'bmpOnly': true });
// → '[\\uD800-\\uDBFF]'

var lowSurrogates = regenerate().addRange(0xDC00, 0xDFFF);
lowSurrogates.toString();
// → '(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'
lowSurrogates.toString({ 'bmpOnly': true });
// → '[\\uDC00-\\uDFFF]'
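The practical difference between the two outputs shows up when the input does contain an astral symbol. This sketch uses plain JavaScript with the high-surrogate patterns copied from the output above, so it runs without Regenerate; the variable names are illustrative only.

```javascript
// Patterns copied from the highSurrogates.toString() output above.
var strict = new RegExp('[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])');
var bmpOnly = new RegExp('[\\uD800-\\uDBFF]');

var astral = '\uD834\uDF06'; // U+1D306 as a surrogate pair
var lone = '\uD834'; // a lone high surrogate

// Both patterns find the lone high surrogate.
var strictOnLone = strict.test(lone);
// → true
var bmpOnlyOnLone = bmpOnly.test(lone);
// → true

// Only the default output skips the high half of a complete pair.
var strictOnAstral = strict.test(astral);
// → false
var bmpOnlyOnAstral = bmpOnly.test(astral);
// → true
```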

Note that lone low surrogates cannot be matched accurately using regular expressions in JavaScript. Regenerate’s output takes a best-effort approach, but there can be false negatives in this regard.
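One way such a false negative can arise is with two consecutive lone low surrogates. ES5 has no lookbehind, so the pattern has to consume the preceding code unit, which means a low surrogate that directly follows a match is missed. The sketch below uses the low-surrogate pattern copied from the output above; the input string is a made-up example.

```javascript
// Pattern copied from the lowSurrogates.toString() output above.
var lowSurrogates = /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/g;

// Made-up input: the letter `a` followed by two lone low surrogates.
var input = 'a\uDC00\uDC00';

var matches = input.match(lowSurrogates);
var matchCount = matches.length;
// → 1, although the input contains two lone low surrogates
var firstMatchIncludesPrefix = matches[0] === 'a\uDC00';
// → true, the match also consumes the preceding code unit

// For comparison, the `bmpOnly` pattern counts both code units.
var bmpOnlyCount = input.match(/[\uDC00-\uDFFF]/g).length;
// → 2
```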

If the hasUnicodeFlag property of the optional options object is set to true, the output makes use of Unicode code point escapes (\u{…}) where applicable. This simplifies the output at the cost of compatibility and portability, since it means the output can only be used as a pattern in a regular expression with the ES6 u flag enabled.

var set = regenerate().addRange(0x0, 0x10FFFF);

set.toString();
// → '[\\0-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

set.toString({ 'hasUnicodeFlag': true });
// → '[\\0-\\u{10FFFF}]'
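To make the compatibility cost concrete, the sketch below compiles the hasUnicodeFlag output shown above with and without the u flag. It is plain JavaScript that hard-codes the string instead of calling Regenerate, and it assumes an engine with ES6 regular expression support.

```javascript
// Output of set.toString({ 'hasUnicodeFlag': true }), copied from above.
var source = '[\\0-\\u{10FFFF}]';

var withFlag = new RegExp('^' + source + '$', 'u');
var withoutFlag = new RegExp('^' + source + '$');

var astral = '\uD834\uDF06'; // U+1D306 as a surrogate pair

// With the `u` flag, the class covers every code point, and the astral
// symbol counts as a single character.
var matchesWithFlag = withFlag.test(astral);
// → true

// Without the flag, `\u{…}` is not read as a code point escape, and the
// string is seen as two code units, so the anchored pattern fails.
var matchesWithoutFlag = withoutFlag.test(astral);
// → false
```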

regenerate.prototype.toRegExp(flags = '')

Returns a regular expression that matches all the symbols mapped to the code points within the set. Optionally, you can pass flags to be added to the regular expression.


var regex = regenerate(0x1D306, 0x1F4A9).toRegExp();
// → /\uD834\uDF06|\uD83D\uDCA9/
regex.test('\uD834\uDF06'); // U+1D306, written as a surrogate pair
// → true
