
Regular Expressions

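A regular expression, or RegExp, is a small programming language for finding patterns in data. It can be used to check whether a pattern exists in a string. In JavaScript there are two ways to create a pattern: call the RegExp constructor, or write a regular expression literal, which is a pattern enclosed between two forward slashes and optionally followed by flags.

To declare a string we use single quotes, double quotes, or backticks; to declare a regular expression we use two forward slashes and optional flags. Common flags include g, i, m, s, u, and y.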

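RegExp parameters

A regular expression takes two parameters: a required search pattern and an optional flag.

Pattern

A pattern can be literal text or any form of pattern that has some sort of similarity. For instance, the word "spam" in an email could be a pattern we are interested in looking for, or the format of a phone number might be the pattern we want to find.

Flags

Flags are optional parameters of a regular expression that determine the type of search. Let us look at some of them:

  • g: the global flag; search for the pattern throughout the whole text instead of stopping at the first match
  • i: the case-insensitive flag; match both lowercase and uppercase
  • m: the multiline flag; ^ and $ match at the start and end of each line, not only of the whole string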
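
The effect of these three flags can be seen side by side in the short sketch below. The sample string and the variable names are made up for illustration.

```javascript
// Hypothetical two-line sample; \n separates the lines.
const verse = 'Love is kind.\nlove is patient.'

// No flags: case-sensitive, and ^ only matches at the start of the string.
const noFlags = verse.match(/^love/)

// i: 'Love' at the start of the string now matches.
const insensitive = verse.match(/^love/i)

// g + i + m: every line start is tested, in any letter case.
const allLines = verse.match(/^love/gim)

console.log(noFlags)         // null
console.log(insensitive[0])  // 'Love'
console.log(allLines)        // ['Love', 'love']
```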

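Creating a pattern with the RegExp constructor

Declaring a regular expression without the global and case-insensitive flags.

js
// without flags
let pattern = 'love'
let regEx = new RegExp(pattern)

Declaring a regular expression with the global and case-insensitive flags.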

js
let pattern = 'love'
let flag = 'gi'
let regEx = new RegExp(pattern, flag)

Declaring a regular expression by passing the pattern and the flags directly to the RegExp constructor.

js
let regEx = new RegExp('love','gi')

Creating a pattern without the RegExp constructor

Declaring a regular expression with the global and case-insensitive flags.

js
let regEx= /love/gi

The regular expression literal above is equivalent to the one we created with the RegExp constructor:

js
let regEx= new RegExp('love','gi')
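
One practical difference between the two forms, sketched below: in a string passed to the constructor every backslash has to be doubled, while a literal takes it as written. The constructor is mainly useful when the pattern is only known at run time. The helper name countWord is hypothetical, and it assumes the word contains no regular expression metacharacters.

```javascript
// The same pattern written both ways; note the doubled backslash in the string.
const fromLiteral = /\d+/g
const fromConstructor = new RegExp('\\d+', 'g')

console.log(fromLiteral.source === fromConstructor.source)  // true
console.log(fromConstructor.flags)                          // 'g'

// Hypothetical helper: builds a pattern from a word chosen at run time.
// Assumes `word` has no special characters such as . or *.
function countWord(text, word) {
  const matches = text.match(new RegExp(word, 'gi'))
  return matches === null ? 0 : matches.length
}

const loveCount = countWord('I love JavaScript. Love it!', 'love')
console.log(loveCount)  // 2
```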

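RegExp object methods

Let us look at some methods that work with regular expressions. test() belongs to RegExp objects, while match(), search(), and replace() are string methods that accept a regular expression.

Testing for a match

test(): tests for a match in a string. It returns true or false.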

js
const str = 'I love JavaScript'
const pattern = /love/
const result = pattern.test(str)
console.log(result)
sh
true
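
One thing to watch for, shown in the sketch below: when the pattern carries the global flag, test() remembers where the last match ended in the pattern's lastIndex property, so repeated calls on the same string can alternate between true and false. The variable names are illustrative.

```javascript
const sentence = 'I love JavaScript'
const globalPattern = /love/g

const first = globalPattern.test(sentence)   // match found, lastIndex moves to 6
const indexAfterFirst = globalPattern.lastIndex
const second = globalPattern.test(sentence)  // search resumes at 6, nothing left to find
const third = globalPattern.test(sentence)   // lastIndex was reset to 0, so it matches again

console.log(first, indexAfterFirst, second, third)  // true 6 false true
```

Dropping the g flag avoids this when all that is needed is a yes-or-no answer.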

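Array containing all of the matches

match(): a string method that returns an array of matches, or null if no match is found. Without the global flag, match() returns an array holding the first match and any capture groups, with the additional properties index, input, and groups. With the global flag, it returns an array of every matched substring, without capture groups.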

js
const str = 'I love JavaScript'
const pattern = /love/
const result = str.match(pattern)
console.log(result)
sh
["love", index: 2, input: "I love JavaScript", groups: undefined]
js
const str = 'I love JavaScript'
const pattern = /love/g
const result = str.match(pattern)
console.log(result)
sh
["love"]
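
The capture groups mentioned above only show up in the non-global form. The sketch below uses a made-up date string to show where they land in the result, and one way to guard against the null that match() returns when nothing is found.

```javascript
const note = 'This example was made on January 12, 2020.'

// Three capture groups: month name, day, year.
const datePattern = /(\w+) (\d+), (\d{4})/
const found = note.match(datePattern)

console.log(found[0])     // 'January 12, 2020'
console.log(found[1])     // 'January'
console.log(found[2])     // '12'
console.log(found[3])     // '2020'
console.log(found.index)  // 25

// match() returns null when nothing matches, so fall back to an empty array
// before reading length or looping.
const none = note.match(/\d{5}/g) || []
console.log(none.length)  // 0
```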

search(): a string method that tests for a match in a string. It returns the index of the first match, or -1 if the search fails.

js
const str = 'I love JavaScript'
const pattern = /love/g
const result = str.search(pattern)
console.log(result)
sh
2

Replacing a substring

replace(): a string method that searches a string for a match and replaces the matched substring with a replacement substring. Without the global flag, only the first match is replaced.

js
const txt = 'Python is the most beautiful language that a human being has ever created. \
I recommend python for a first programming language'

const matchReplaced = txt.replace(/Python|python/, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created. I recommend python for a first programming language
js
const txt = 'Python is the most beautiful language that a human being has ever created. \
I recommend python for a first programming language'

const matchReplaced = txt.replace(/Python|python/g, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created. I recommend JavaScript for a first programming language
js
const txt = 'Python is the most beautiful language that a human being has ever created. \
I recommend python for a first programming language'

const matchReplaced = txt.replace(/Python/gi, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created. I recommend JavaScript for a first programming language
js
const txt = '%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing.\
T%he%re i%s n%o%th%ing as m%ore r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing \
p%e%o%ple.\
I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs.\
D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher.'

const matches = txt.replace(/%/g, '')
console.log(matches)
sh
I am teacher and  I love teaching.There is nothing as more rewarding as educating and empowering people.I found teaching more interesting than any other jobs.Does this motivate you to be a teacher.
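
replace() can do more than substitute a fixed string. As a sketch with made-up input: the replacement string may refer to capture groups as $1, $2, and so on, and a function can be passed in place of the replacement string to compute each replacement.

```javascript
// Reorder a date: $1, $2, $3 refer to the three capture groups.
const isoDate = '2020-01-12'
const dayFirst = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1')
console.log(dayFirst)  // '12/01/2020'

// A function receives each match and returns its replacement.
const title = 'i love javascript'
const capitalized = title.replace(/\b\w/g, (letter) => letter.toUpperCase())
console.log(capitalized)  // 'I Love Javascript'
```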

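Regular expression metacharacters

  • []: a set of characters
    • [a-c] means a or b or c
    • [a-z] means any lowercase letter from a to z
    • [A-Z] means any uppercase letter from A to Z
    • [0-3] means 0 or 1 or 2 or 3
    • [0-9] means any digit from 0 to 9
    • [A-Za-z0-9] means any single character from a to z, A to Z, or 0 to 9
  • \: used to escape special characters
    • \d matches a digit (0-9)
    • \D matches any character that is not a digit
  • . : any character except the newline character (\n)
  • ^: starts with
    • /^substring/, e.g. /^love/, matches a sentence that starts with the word love
    • /[^abc]/ means not a, not b, not c
  • $: ends with
    • /substring$/, e.g. /love$/, matches a sentence that ends with the word love
  • *: zero or more times
    • /[a]*/ means a is optional or can occur many times
  • +: one or more times
    • /[a]+/ means a occurs at least once
  • ?: zero or one time
    • /[a]?/ means a occurs zero times or once
  • \b: word boundary, matches the beginning or the end of a word
  • {3}: exactly 3 repetitions of the preceding item
  • {3,}: at least 3 repetitions
  • {3,8}: 3 to 8 repetitions
  • |: either or
    • /apple|banana/ means apple or banana
  • (): capture and group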
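
A few of these metacharacters in action, as a quick sketch with made-up strings. It covers $, \D, \b, and (), which are easiest to understand by seeing what they match.

```javascript
// $ anchors the match to the end of the string.
const endsWithLove = /love$/.test('JavaScript is what I love')
const loveInMiddle = /love$/.test('I love JavaScript')
console.log(endsWithLove, loveInMiddle)  // true false

// \D matches anything that is not a digit; removing those leaves only digits.
const digitsOnly = 'Room 42-B'.replace(/\D/g, '')
console.log(digitsOnly)  // '42'

// \b keeps 'cat' from matching inside 'concat'.
const wholeWords = 'cat concat cat'.match(/\bcat\b/g)
console.log(wholeWords)  // ['cat', 'cat']

// () captures parts of the match for later use.
const pair = 'lang=JavaScript'.match(/(\w+)=(\w+)/)
console.log(pair[1], pair[2])  // lang JavaScript
```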

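Let us use examples to clarify the metacharacters above.

Square brackets

Let us use square brackets to include both lowercase and uppercase letters.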

js
const pattern = '[Aa]pple' // a string pattern, which match() converts to a RegExp; the square brackets mean either A or a
const txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.'
const matches = txt.match(pattern)

console.log(matches)
sh
["Apple", index: 0, input: "Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.", groups: undefined]
js
const pattern = /[Aa]pple/g // the square brackets mean either A or a
const txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.'
const matches = txt.match(pattern)

console.log(matches)
sh
["Apple", "apple"]

If we also want to look for banana, we write the pattern as follows:

js
const pattern = /[Aa]pple|[Bb]anana/g // the square brackets mean A or a and B or b; | means or
const txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away. Banana is easy to eat too.'
const matches = txt.match(pattern)

console.log(matches)
sh
["Apple", "banana", "apple", "banana", "Banana"]

Using the square brackets and the or operator, we managed to extract Apple, apple, Banana, and banana.

The escape character (\) in RegExp

js
const pattern = /\d/g  // \d is a special sequence that matches a digit
const txt = 'This regular expression example was made in January 12,  2020.'
const matches = txt.match(pattern)

console.log(matches)  // ["1", "2", "2", "0", "2", "0"], this is not what we want
js
const pattern = /\d+/g  // \d+ matches one or more consecutive digits
const txt = 'This regular expression example was made in January 12,  2020.'
const matches = txt.match(pattern)

console.log(matches)  // ["12", "2020"], this is what we want
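
The backslash also works in the other direction: placed before a metacharacter, it makes that character match literally. A minimal sketch with a made-up price string:

```javascript
const label = 'Total: $4.50 (was 4x50)'

// Unescaped, . matches any character, so '4x50' matches too.
const loose = label.match(/4.50/g)
console.log(loose)  // ['4.50', '4x50']

// Escaped, \. matches only a real period and \$ a real dollar sign.
const strict = label.match(/\$4\.50/g)
console.log(strict)  // ['$4.50']
```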

One or more times (+)

js
const pattern = /\d+/g  // \d matches a digit, + means one or more times
const txt = 'This regular expression example was made in January 12,  2020.'
const matches = txt.match(pattern)
console.log(matches)  // ["12", "2020"]

Period (.)

js
const pattern = /[a]./g  // the square brackets mean a, and . means any character except a newline
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)

console.log(matches)  // ["an", "an", "an", "a ", "ar"]
js
const pattern = /[a].+/g  // . is any character, + repeats it one or more times
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)

console.log(matches)  // ['and banana are fruits']

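Zero or more times (*)

Zero or many times. The pattern may not occur at all or it may occur many times.

js
const pattern = /[a].*/g  // . is any character, * repeats it zero or more times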
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)

console.log(matches)  // ['and banana are fruits']
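
In the sentence above, .+ and .* give the same result because at least one character always follows the first a. The difference shows up when the repeated item can be absent, as in this sketch with a made-up string:

```javascript
const sample = 'a ab abb'

// b* allows zero b's, so a lone 'a' still matches.
const zeroOrMore = sample.match(/ab*/g)
console.log(zeroOrMore)  // ['a', 'ab', 'abb']

// b+ needs at least one b, so the lone 'a' is skipped.
const oneOrMore = sample.match(/ab+/g)
console.log(oneOrMore)   // ['ab', 'abb']
```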

Zero or one time (?)

Zero or one time. The pattern may not occur at all or it may occur once.

js
const txt = 'I am not sure if there is a convention how to write the word e-mail.\
Some people write it email others may write it as Email or E-mail.'
const pattern = /[Ee]-?mail/g  // ? means the hyphen is optional
const matches = txt.match(pattern)

console.log(matches)  // ["e-mail", "email", "Email", "E-mail"]

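Quantifiers in RegExp

We can use curly brackets to specify how many times the preceding item must repeat, which lets us fix the length of the substring we are looking for. Let us see how to use RegExp quantifiers. Imagine we are interested in substrings that are 4 characters long.

js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /\b\w{4}\b/g  // words of exactly four characters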
const matches = txt.match(pattern)
console.log(matches)  //['This', 'made', '2019']
js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /\b[a-zA-Z]{4}\b/g  // words of exactly four letters, digits excluded
const matches = txt.match(pattern)
console.log(matches)  //['This', 'made']
js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /\d{4}/g  // exactly four digits
const matches = txt.match(pattern)
console.log(matches)  // ['2019']
js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /\d{1,4}/g   // 1 to 4 digits
const matches = txt.match(pattern)
console.log(matches)  // ['6', '2019']
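
The open-ended form {3,} from the metacharacter list is not used in the examples above. A short sketch, with a made-up sample string, shows it next to the bounded form:

```javascript
const sample = 'a aa aaa aaaa aaaaa'

// {3,}: three or more a's in a row.
const atLeastThree = sample.match(/\ba{3,}\b/g)
console.log(atLeastThree)  // ['aaa', 'aaaa', 'aaaaa']

// {2,4}: between two and four a's; \b stops partial matches inside longer runs.
const twoToFour = sample.match(/\ba{2,4}\b/g)
console.log(twoToFour)  // ['aa', 'aaa', 'aaaa']
```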

Caret ^

  • Starts with
js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /^This/ // ^ means starts with
const matches = txt.match(pattern)
console.log(matches)  // ['This', index: 0, input: 'This regular expression example was made in December 6,  2019.', groups: undefined]
  • Negation
js
const txt = 'This regular expression example was made in December 6,  2019.'
const pattern = /[^A-Za-z,. ]+/g  // ^ inside a character set means negation: not A to Z, not a to z, no space, no comma, no period
const matches = txt.match(pattern)
console.log(matches)  // ["6", "2019"]

Exact match

For an exact match, the pattern should start with ^ and end with $.

js
let pattern = /^[A-Z][a-z]{3,12}$/;
let name = 'Asabeneh';
let result = pattern.test(name)

console.log(result) // true
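
To see why both anchors matter, the sketch below runs the same name pattern with and without them. The input strings are made up for illustration.

```javascript
const anchored = /^[A-Z][a-z]{3,12}$/
const unanchored = /[A-Z][a-z]{3,12}/

// Without anchors the pattern is satisfied by any matching substring.
const looseResult = unanchored.test('123Asabeneh456')
// With anchors the whole string has to fit the pattern.
const strictResult = anchored.test('123Asabeneh456')
const exactResult = anchored.test('Asabeneh')
const lowercaseResult = anchored.test('asabeneh')  // no leading capital

console.log(looseResult, strictResult, exactResult, lowercaseResult)  // true false true false
```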

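🌕 You are going far. Keep going! Now you are supercharged with regular expressions. You have the power to extract and clean any kind of text, and you can draw meaning out of unstructured data. You have just completed the day 12 challenges and you are 12 steps ahead on your way to greatness. Now do some exercises for your brain and for your muscle.

💻 Exercises

Exercises: Level 1

  1. Calculate the total annual income of the person from the following text. 'He earns 4000 euro from salary per month, 10000 euro annual bonus, 5500 euro online courses per month.'

  2. The positions of some particles on the horizontal x-axis are -12, -4, -3, and -1 in the negative direction, 0 at the origin, and 4 and 8 in the positive direction. Extract these numbers and find the distance between the two furthest particles.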

js
points = ['-12', '-4', '-3', '-1', '0', '4', '8']
sortedPoints = [-12, -4, -3, -1, 0, 4, 8]
distance = 20

  3. Write a pattern that identifies whether a string is a valid JavaScript variable name.

    sh
    is_valid_variable('first_name') # True
    is_valid_variable('first-name') # False
    is_valid_variable('1first_name') # False
    is_valid_variable('firstname') # True

Exercises: Level 2

  1. Write a function called tenMostFrequentWords that returns the ten most frequent words in a string.

    js
        paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`
        console.log(tenMostFrequentWords(paragraph))
    sh
        [
        {word:'love', count:6},
        {word:'you', count:5},
        {word:'can', count:3},
        {word:'what', count:2},
        {word:'teaching', count:2},
        {word:'not', count:2},
        {word:'else', count:2},
        {word:'do', count:2},
        {word:'I', count:2},
        {word:'which', count:1},
        {word:'to', count:1},
        {word:'the', count:1},
        {word:'something', count:1},
        {word:'if', count:1},
        {word:'give', count:1},
        {word:'develop',count:1},
        {word:'capabilities',count:1},
        {word:'application', count:1},
        {word:'an',count:1},
        {word:'all',count:1},
        {word:'Python',count:1},
        {word:'If',count:1}]
    js
    console.log(tenMostFrequentWords(paragraph, 10))
    sh
    [{word:'love', count:6},
    {word:'you', count:5},
    {word:'can', count:3},
    {word:'what', count:2},
    {word:'teaching', count:2},
    {word:'not', count:2},
    {word:'else', count:2},
    {word:'do', count:2},
    {word:'I', count:2},
    {word:'which', count:1}
    ]

Exercises: Level 3

  1. Write a function that cleans text. Clean the following text. After cleaning, count the three most frequent words in the string.

js
  sentence = `%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?`
  console.log(cleanText(sentence))
sh
 I am a teacher and I love teaching There is nothing as more rewarding as educating and empowering people I found teaching more interesting than any other jobs Does this motivate you to be a teacher

  2. Write a function that finds the most frequent words. After cleaning, count the three most frequent words in the string.

js
 console.log(mostFrequentWords(cleanedText))
sh
 [{word:'I', count:3}, {word:'teaching', count:2}, {word:'teacher', count:2}]

🎉 CONGRATULATIONS! 🎉