`String` object

This article explains the String object.

It covers everything from the basics to advanced techniques, step by step, including pitfalls related to Unicode and regular expressions.

Strings in JavaScript are one of the most frequently used types in everyday development.

Difference Between Primitive Strings and String Objects

Primitive strings (like "hello") behave differently than wrapper objects like new String("hello"). Normally, you should use primitives, and there is little need to use the object form.

// Primitive string
const a = "hello";

// String wrapper object
const b = new String("hello");

console.log(typeof a); // "string"
console.log(typeof b); // "object"
console.log(a === b);  // false: a primitive and a wrapper object are not strictly equal
  • This code shows the difference in type between a primitive and a wrapper, and how they behave with strict comparison. In most cases, avoid using new String() and stick to primitives.
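
As a small illustration of why the wrapper form is best avoided, the sketch below contrasts String(value) called as a function, which returns a primitive, with new String(value), which returns an object that is always truthy. The variable names are illustrative only.

```javascript
// String() called WITHOUT new converts a value to a primitive string.
const fromNumber = String(123);
console.log(typeof fromNumber); // "string"

// A wrapper object is truthy even when it wraps an empty string.
const emptyWrapper = new String("");
console.log(Boolean(""));           // false
console.log(Boolean(emptyWrapper)); // true

// valueOf() recovers the primitive from a wrapper.
const unwrapped = new String("hello").valueOf();
console.log(unwrapped === "hello"); // true

// Methods work on primitives because the engine boxes them temporarily.
console.log("hello".toUpperCase()); // "HELLO"
```

The truthiness difference is a typical source of bugs: a check such as if (value) behaves differently for "" and new String("").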

Ways to Create Strings (Literals and Template Literals)

Template literals are useful for embedding variables and writing multi-line strings. You can insert variables and evaluate expressions intuitively.

const name = "Alice";
const age = 30;

// Template literal
const greeting = `Name: ${name}, Age: ${age + 1}`;

console.log(greeting); // "Name: Alice, Age: 31"
  • Template literals are highly readable and ideal for building complex strings, including multi-line strings.
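
The multi-line case mentioned above could look like the following sketch. The items array and the message text are made up for illustration; the point is that line breaks written inside the backticks become part of the string.

```javascript
const items = ["apple", "banana"];

// Line breaks inside the backticks are kept in the resulting string.
const message = `Shopping list:
- ${items[0]}
- ${items[1]}
Total: ${items.length} items`;

console.log(message);

// The equivalent with ordinary literals needs explicit "\n" and "+".
const sameMessage =
  "Shopping list:\n" +
  "- " + items[0] + "\n" +
  "- " + items[1] + "\n" +
  "Total: " + items.length + " items";

console.log(message === sameMessage); // true
```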

Common Methods (Search and Substring Extraction)

Strings provide methods for searching (indexOf, includes, startsWith, endsWith) and for extracting substrings (slice, substring).

const text = "Hello, world! Hello again.";

// search
console.log(text.indexOf("Hello"));       // 0
console.log(text.indexOf("Hello", 1));    // 14
console.log(text.includes("world"));      // true
console.log(text.startsWith("Hello"));    // true
console.log(text.endsWith("again."));     // true

// slice / substring
console.log(text.slice(7, 12));           // "world"
console.log(text.substring(7, 12));       // "world"
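  • slice and substring are similar, but they handle negative indexes differently. slice interprets negative values as positions from the end, while substring treats them as 0 and swaps its arguments if the start is greater than the end. Prefer slice unless you specifically need the behavior of substring.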
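
To make the difference concrete, here is a minimal sketch comparing the two methods on the same input. The sample word is arbitrary.

```javascript
const word = "JavaScript";

// slice: a negative index counts from the end of the string.
console.log(word.slice(-6));       // "Script"
console.log(word.slice(0, -6));    // "Java"

// substring: negative (or NaN) arguments are treated as 0.
console.log(word.substring(-6));   // "JavaScript"

// substring swaps its arguments when start > end; slice returns "".
console.log(word.substring(4, 0)); // "Java"
console.log(word.slice(4, 0));     // ""
```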

Splitting and Joining (split / join)

It is common to split a string into an array for processing and then join it back.

const csv = "red,green,blue";
const arr = csv.split(","); // ["red","green","blue"]

console.log(arr);
console.log(arr.join(" | ")); // "red | green | blue"
  • A common pattern is to use split to divide a string, process the resulting array with map or filter, and then use join to combine it back.
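
The split, process, join pattern might be sketched as follows. The rawTags input and the cleanup rules (trim, drop empties, lowercase) are assumptions chosen for illustration.

```javascript
const rawTags = " JavaScript, ,strings ,  Unicode ";

const cleaned = rawTags
  .split(",")                       // [" JavaScript", " ", "strings ", "  Unicode "]
  .map((tag) => tag.trim())         // remove surrounding whitespace
  .filter((tag) => tag !== "")      // drop empty entries
  .map((tag) => tag.toLowerCase())  // normalize case
  .join(", ");

console.log(cleaned); // "javascript, strings, unicode"
```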

Replace and Regular Expressions

When the pattern is a string, replace replaces only the first match. To replace all matches, use a regular expression with the g flag, or replaceAll (ES2021). By passing a function as the replacement, you can also perform dynamic replacements.

const s = "foo 1 foo 2";

// replace first only
console.log(s.replace("foo", "bar")); // "bar 1 foo 2"

// replace all using regex
console.log(s.replace(/foo/g, "bar")); // "bar 1 bar 2"

// replace with function
const r = s.replace(/\d+/g, (match) => String(Number(match) * 10));
console.log(r);    // "foo 10 foo 20"
  • With dynamic replacement using a function, you can concisely write code that analyzes and transforms matches.
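
A few related variations are sketched below, assuming a runtime that supports replaceAll (ES2021) and named capture groups (ES2018). The sample strings are illustrative.

```javascript
const path = "a.b.c";

// replaceAll with a string pattern replaces every occurrence,
// and the pattern is taken literally ("." is not a wildcard here).
console.log(path.replaceAll(".", "/")); // "a/b/c"

// Capture groups can be referenced as $1, $2, ... in the replacement string.
const isoDate = "2024-01-31";
const reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(reordered); // "31/01/2024"

// Named groups are passed to a replacement function as the last argument.
const named = isoDate.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  (...args) => {
    const { year, month, day } = args[args.length - 1];
    return `${day}.${month}.${year}`;
  }
);
console.log(named); // "31.01.2024"
```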

Case Conversion and Normalization

For multilingual support and comparison, in addition to toLowerCase and toUpperCase, Unicode normalization (normalize) is also important. This is especially necessary when comparing accented characters.

// Case conversion example:
// "\u00DF" represents the German letter "ß".
// Unicode's default case mapping uppercases "ß" to "SS".
// toUpperCase() follows this mapping regardless of locale.
console.log("\u00DF");
console.log("\u00DF".toUpperCase()); // "SS"

// Unicode normalization example:
// "e\u0301" is "e" + a combining acute accent.
// "\u00e9" is the precomposed character "é".
// These two look the same but are different code point sequences.
const a = "e\u0301";
const b = "\u00e9";

console.log(a === b);   // false: different underlying code points
console.log(a.normalize() === b.normalize()); // true: normalized to the same form
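  • Canonically equivalent sequences, such as a precomposed character and its base letter plus a combining mark, are not equal as-is, so call normalize() on both sides before comparing. The default form (NFC) does not fold compatibility characters such as ligatures; use normalize("NFKC") for those.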
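
The normalization form matters for the result. The following sketch uses the "ﬁ" ligature (U+FB01) as an example of a compatibility character and shows how the forms NFC, NFKC, and NFD treat it and the accented "é".

```javascript
const ligature = "\uFB01"; // "ﬁ" (LATIN SMALL LIGATURE FI)
const plain = "fi";

// NFC (the default) leaves compatibility characters untouched.
console.log(ligature.normalize() === plain);       // false

// NFKC also folds compatibility characters such as ligatures.
console.log(ligature.normalize("NFKC") === plain); // true

// NFD decomposes "é" into "e" + combining accent; NFC composes it again.
const composed = "\u00e9";
console.log(composed.normalize("NFD").length); // 2
console.log(composed.normalize("NFD").normalize("NFC") === composed); // true
```

NFKC discards distinctions that are sometimes meaningful, so it tends to suit search and matching better than text that will be stored or displayed.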

Unicode and Code Points (Handling Surrogate Pairs)

JavaScript strings are sequences of UTF-16 code units, so characters outside the Basic Multilingual Plane, such as most emoji, occupy two code units (a surrogate pair). To iterate by code point rather than by code unit, use Array.from, the spread operator, or for...of.

// Emoji composed of multiple code points:
// "\u{1F469}" = woman, "\u{200D}" = Zero Width Joiner (ZWJ),
// "\u{1F4BB}" = laptop. Combined, they form a single emoji: 👩‍💻
const s = "\u{1F469}\u{200D}\u{1F4BB}";
console.log(s);

// Length in UTF-16 code units (not displayed characters):
// two surrogate pairs (2 + 2) plus the ZWJ (1) give a length of 5.
console.log("Length:", s.length); // Length: 5

// for...of iterates by Unicode code point, not by UTF-16 code unit.
// This loop runs three times: woman, ZWJ, laptop.
for (const ch of s) {
  console.log(ch);
}

// Convert to an array of code points (three elements here):
console.log(Array.from(s));
  • length returns the number of code units, so you may not get the expected count with emojis or combining characters. for...of and Array.from iterate by code point, which handles surrogate pairs but still splits sequences joined by a ZWJ or combining marks. To work with displayed characters (grapheme clusters), use Intl.Segmenter where available, or a specialized library.
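
For counting displayed characters, one option is the built-in Intl.Segmenter. The sketch below assumes a runtime that ships it (recent browsers and Node.js versions do); countGraphemes is a hypothetical helper name, not a standard function.

```javascript
// Hypothetical helper: count user-perceived characters (grapheme clusters).
function countGraphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

const technologist = "\u{1F469}\u{200D}\u{1F4BB}"; // 👩‍💻

console.log(technologist.length);             // 5 UTF-16 code units
console.log(Array.from(technologist).length); // 3 code points
console.log(countGraphemes(technologist));    // 1 grapheme cluster
console.log(countGraphemes("e\u0301"));       // 1 ("e" + combining accent)
```

Grapheme segmentation follows the Unicode data bundled with the runtime, so very new emoji sequences may be split differently on older engines.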

Safe Regular Expression Replacement (When Handling User Input)

If you forget to escape user input when embedding it into a regular expression, it can lead to unexpected behavior and vulnerabilities. Always escape user input before using it in a pattern.

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "a+b";
const pattern = new RegExp(escapeRegex(userInput), "g");
console.log("a+b a+b".replace(pattern, "X")); // "X X"
  • Do not use user strings directly in regular expressions; always escape them before constructing the regex.
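
To see what goes wrong without escaping, the sketch below runs the same input through an unescaped and an escaped pattern. The escapeRegex helper is repeated so the example is self-contained; the input and text values are illustrative.

```javascript
// Same helper as in the article, repeated so the sketch is self-contained.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const input = "a+b";
const text = "a+b aab";

// Unescaped: "+" is a quantifier, so the pattern means "one or more a, then b".
const unsafe = new RegExp(input, "g");
console.log(text.replace(unsafe, "X")); // "a+b X"

// Escaped: the pattern matches the literal characters "a+b".
const safe = new RegExp(escapeRegex(input), "g");
console.log(text.replace(safe, "X"));   // "X aab"

// For plain literal replacement, replaceAll with a string needs no regex at all.
console.log(text.replaceAll(input, "X")); // "X aab"
```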

Performance Tips: Concatenation and Templates

When concatenating many small strings in sequence, collecting them in an array and calling join can be more efficient in some engines. Modern engines also optimize += heavily, so measure before optimizing. Template literals are highly readable and fast enough in most cases.

// concatenation in a loop (simple; fine for small counts)
let s = "";
for (let i = 0; i < 1000; i++) {
  s += i + ",";
}

// array + join (can be faster for many pieces, depending on the engine)
const parts = [];
for (let i = 0; i < 1000; i++) {
  parts.push(i + ",");
}
const s2 = parts.join("");
  • Modern JavaScript engines are highly optimized, so you don’t need to worry about performance for a small number of concatenations. However, if you need to concatenate tens of thousands of times, using join can be more efficient.
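
Since the faster approach depends on the engine and the workload, it may be worth measuring. The following is a rough sketch, assuming a runtime with the global performance.now(); timeIt and COUNT are hypothetical names, and a single run like this is only a coarse indication, not a proper benchmark.

```javascript
// Hypothetical helper: run fn once and report the elapsed time.
function timeIt(label, fn) {
  const start = performance.now();
  const result = fn();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${elapsedMs.toFixed(2)} ms`);
  return result;
}

const COUNT = 100000;

const viaConcat = timeIt("+= in a loop", () => {
  let out = "";
  for (let i = 0; i < COUNT; i++) out += i + ",";
  return out;
});

const viaJoin = timeIt("array + join", () => {
  const parts = [];
  for (let i = 0; i < COUNT; i++) parts.push(i + ",");
  return parts.join("");
});

// Both approaches must build the same string; only the timing differs.
console.log(viaConcat === viaJoin); // true
```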

Useful Practical Techniques: Padding, Trim, and Repeat

trim, padStart, padEnd, and repeat are convenient methods that are especially useful in everyday string processing. They are often used in practical situations such as formatting input values or standardizing output formats.

console.log("  hello  ".trim());       // "hello"
console.log("5".padStart(3, "0"));     // "005"
console.log("x".repeat(5));            // "xxxxx"
  • These methods can be used for normalizing form input or generating fixed-width output.
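
As an example of fixed-width output, the sketch below combines padEnd, padStart, and repeat. formatRow is a hypothetical helper, and the column widths are arbitrary.

```javascript
// trimStart and trimEnd remove whitespace from one side only.
console.log("  hello  ".trimStart()); // "hello  "
console.log("  hello  ".trimEnd());   // "  hello"

// Hypothetical helper: format one row as two fixed-width columns.
function formatRow(label, amount) {
  return label.padEnd(10, ".") + String(amount).padStart(6, " ");
}

console.log(formatRow("Coffee", 350));    // "Coffee....   350"
console.log(formatRow("Sandwich", 1200)); // "Sandwich..  1200"

// repeat is handy for separators.
console.log("-".repeat(16)); // "----------------"
```

Note that padStart and padEnd count UTF-16 code units, so columns containing emoji or combining characters may not line up visually.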

String Comparison (Locale Comparison)

localeCompare is effective for comparing strings according to dictionary order for different languages. You can specify language and sensitivity options (such as case sensitivity).

console.log(
  "\u00E4".localeCompare("z", "de")
); // negative number: in German collation, "ä" sorts before "z"

console.log(
  "a".localeCompare("A", undefined, { sensitivity: "base" })
); // 0
  • For internationalized comparisons, use localeCompare and specify the appropriate locale and options.
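
Locale-aware comparison is most often needed when sorting. The sketch below assumes a runtime with full locale data (the default in current browsers and Node.js) and uses illustrative sample arrays.

```javascript
const words = ["z", "\u00E4", "a"]; // "\u00E4" is "ä"

// Default sort compares UTF-16 code units, so "ä" lands after "z".
console.log([...words].sort()); // ["a", "z", "ä"]

// Sorting with German collation places "ä" next to "a".
const german = [...words].sort((x, y) => x.localeCompare(y, "de"));
console.log(german); // ["a", "ä", "z"]

// Intl.Collator reuses one configured comparator across comparisons;
// numeric: true compares digit runs as numbers.
const collator = new Intl.Collator("en", { numeric: true });
const files = ["item10", "item2", "item1"].sort(collator.compare);
console.log(files); // ["item1", "item2", "item10"]
```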

Practical Example: Converting a CSV Row to an Object (Practical Workflow)

A common use case is parsing a single CSV row into an object using a combination of split, trim, and map. For quoted fields or complex CSV files, use a dedicated CSV parser.

// simple CSV parse (no quotes handling)
function parseCsvLine(line, headers) {
  const values = line.split(",").map(v => v.trim());
  const obj = {};
  headers.forEach((h, i) => obj[h] = values[i] ?? null);
  return obj;
}

const headers = ["name", "age", "city"];
const line = " Alice , 30 , New York ";
console.log(parseCsvLine(line, headers));
// { name: "Alice", age: "30", city: "New York" }
  • This method works for simple CSV, but be aware that it cannot handle cases where a comma is inside a quoted field.
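
The same idea can be extended to several lines whose first line is a header row. This is a minimal sketch under the same limitation (no quoted fields); parseSimpleCsv and the sample data are hypothetical.

```javascript
// Hypothetical helper: parse simple CSV text whose first line is the header row.
// Like the single-line version, it does not handle quoted fields.
function parseSimpleCsv(text) {
  const lines = text
    .split(/\r?\n/)                        // accept LF and CRLF line endings
    .filter((line) => line.trim() !== ""); // skip blank lines
  if (lines.length === 0) return [];

  const splitRow = (line) => line.split(",").map((v) => v.trim());
  const [headerLine, ...rows] = lines;
  const headers = splitRow(headerLine);

  return rows.map((row) => {
    const values = splitRow(row);
    // Missing trailing fields become null.
    return Object.fromEntries(headers.map((h, i) => [h, values[i] ?? null]));
  });
}

const csvText = "name, age, city\nAlice, 30, New York\nBob, 25\n";
console.log(parseSimpleCsv(csvText));
// [
//   { name: "Alice", age: "30", city: "New York" },
//   { name: "Bob", age: "25", city: null }
// ]
```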

Common Pitfalls

There are some easily overlooked specifications and behaviors in JavaScript string handling. To avoid unexpected bugs, it's important to keep the following points in mind.

  • Using new String() can lead to incorrect results with type checking or comparisons. In most cases, primitive string types are sufficient.
  • In Unicode, a single visible character may consist of multiple code units. Therefore, the value returned by length may not match the actual number of displayed characters.
  • When incorporating user input into a regular expression, always escape it first.
  • String.prototype.replace() replaces only the first match when given a string pattern or a regular expression without the g flag. To replace all occurrences, use the g flag or replaceAll().
  • Strings are immutable, so operations always return a new string. It is important to always assign the returned value.
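
The immutability point in the list above is easy to trip over in practice. A minimal sketch, with illustrative variable names:

```javascript
let title = "  draft  ";

// Calling a method does not change the original string.
title.trim();
title.toUpperCase();
console.log(title === "  draft  "); // true (unchanged)

// The result only exists if it is assigned (here, back to the same variable).
title = title.trim().toUpperCase();
console.log(title); // "DRAFT"

// replace follows the same rule: it returns a new string.
const original = "a-b-c";
const replaced = original.replace(/-/g, "+");
console.log(original); // "a-b-c"
console.log(replaced); // "a+b+c"
```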

Summary

Even though JavaScript strings may seem simple, it’s important to understand their characteristics regarding Unicode and immutability. By mastering the basics, you can greatly improve the reliability and readability of your string processing.
