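Topic
Regular Expressions
A regular expression (RegExp) is a small programming language for finding patterns in data. We can use it to check whether a pattern exists in a piece of text. In JavaScript there are two ways to create a pattern: call the RegExp constructor, or write the pattern between two forward slashes followed by optional flags.
To declare a string we use single quotes, double quotes, or backticks; to declare a regular expression we use two forward slashes and optional flags. Flags include g, i, m, s, u, and y.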
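RegExp parameters
A regular expression takes two parameters: a required search pattern and an optional flag.
Pattern
A pattern can be literal text or any description of text that shares some similarity. For instance, the word "spam" could be a pattern we want to find in an email, or a phone number format could be the pattern we are looking for.
Flags
Flags are optional parameters of a regular expression that determine the type of search. Let us look at some of the flags:
- g: the global flag, which means looking for the pattern in the whole text instead of stopping at the first match
- i: the case-insensitive flag (matches both lowercase and uppercase)
- m: multiline, so that ^ and $ also match at the start and end of each line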
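To make the effect of the three flags above concrete, here is a minimal sketch. The sample text and the variable names are made up for illustration only.

```js
// Hypothetical two-line sample text, chosen only to show the flags.
const text = 'Love is patient.\nlove is kind.'

// No flags: only the first case-sensitive match is returned.
const noFlags = text.match(/love/)
console.log(noFlags.index) // 17, the lowercase "love" on the second line

// g: every case-sensitive match in the whole text.
const globalOnly = text.match(/love/g)
console.log(globalOnly) // ['love']

// g + i: every match, ignoring case.
const globalIgnoreCase = text.match(/love/gi)
console.log(globalIgnoreCase) // ['Love', 'love']

// Without m, ^ only matches at the very start of the text.
const firstLineOnly = text.match(/^love/gi)
console.log(firstLineOnly) // ['Love']

// With m, ^ also matches right after each line break.
const everyLineStart = text.match(/^love/gim)
console.log(everyLineStart) // ['Love', 'love']
```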
Creating a pattern with the RegExp constructor
Declaring a regular expression without the global and case-insensitive flags.
js
// without flags
let pattern = 'love'
let regEx = new RegExp(pattern)
Declaring a regular expression with the global and case-insensitive flags.
js
let pattern = 'love'
let flag = 'gi'
let regEx = new RegExp(pattern, flag)
Declaring a regular expression pattern with the RegExp object, writing the pattern and the flags directly inside the RegExp constructor.
js
let regEx = new RegExp('love','gi')
Creating a pattern without the RegExp constructor
Declaring a regular expression with the global and case-insensitive flags.
js
let regEx= /love/gi
The regular expression above is the same as the one we created with the RegExp constructor:
js
let regEx= new RegExp('love','gi')
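The two forms are interchangeable for a fixed pattern, but the constructor is the practical choice when the pattern is only known at run time. The sketch below compares both forms and builds a pattern from a variable; the variable names and sample strings are illustrative assumptions.

```js
// A literal and a constructor call that describe the same pattern.
const literal = /love/gi
const constructed = new RegExp('love', 'gi')
console.log(literal.source === constructed.source) // true
console.log(literal.flags === constructed.flags)   // true

// The constructor accepts a string, so the pattern can come from a variable.
const word = 'JavaScript' // hypothetical run-time value
const dynamic = new RegExp(word, 'i')
console.log(dynamic.test('I love javascript')) // true

// Inside a string, a backslash must be doubled: '\\d+' is the pattern \d+.
const digitsFromString = new RegExp('\\d+', 'g')
const numbers = '12 and 2020'.match(digitsFromString)
console.log(numbers) // ['12', '2020']
```

Note that a run-time string may contain characters with a special meaning in a pattern (such as . or +), which would need escaping before being passed to the constructor.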
RegExp object methods
Let us look at some of the RegExp methods.
Testing for a match
test(): tests for a match in a string. It returns true or false.
js
const str = 'I love JavaScript'
const pattern = /love/
const result = pattern.test(str)
console.log(result)
sh
true
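One detail worth knowing when combining test() with the global flag: a global pattern remembers where the last match ended in its lastIndex property, and the next call resumes from there. A small sketch of that behavior, reusing the sentence from the example above (the variable names are illustrative):

```js
const str = 'I love JavaScript'
const globalPattern = /love/g

// First call finds "love" and stores the position just after it.
const first = globalPattern.test(str)
const indexAfterFirst = globalPattern.lastIndex
console.log(first, indexAfterFirst) // true 6

// Second call starts searching at index 6, finds nothing, and resets lastIndex.
const second = globalPattern.test(str)
const indexAfterSecond = globalPattern.lastIndex
console.log(second, indexAfterSecond) // false 0

// Without the g flag, test() always starts from the beginning.
const plainPattern = /love/
console.log(plainPattern.test(str), plainPattern.test(str)) // true true
```

For a simple yes-or-no check, leaving out the g flag avoids this surprise.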
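Array containing all of the matches
match(): returns an array of matches, or null if no match is found. Without the global flag, match() returns only the first match, together with any capture groups and the index, input, and groups properties. With the global flag, it returns every full match but no capture groups.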
js
const str = 'I love JavaScript'
const pattern = /love/
const result = str.match(pattern)
console.log(result)
sh
["love", index: 2, input: "I love JavaScript", groups: undefined]
js
const str = 'I love JavaScript'
const pattern = /love/g
const result = str.match(pattern)
console.log(result)
sh
["love"]
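When we need every match and its capture groups at the same time, the standard matchAll() string method can help. The following is a minimal sketch; the sample sentence and the shape of the result objects are assumptions made for illustration.

```js
const str = 'I love JavaScript and I love Python'
const pattern = /love (\w+)/g // (\w+) captures the word after "love"

// match() with the g flag: full matches only, no capture groups.
const fullMatches = str.match(pattern)
console.log(fullMatches) // ['love JavaScript', 'love Python']

// matchAll() requires the g flag and yields one detailed match per occurrence.
const details = [...str.matchAll(pattern)].map((m) => ({
  whole: m[0],    // the full match
  language: m[1], // the first capture group
  index: m.index  // where the match starts
}))
console.log(details)
```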
search(): tests for a match in a string. It returns the index of the first match, or -1 if the search fails.
js
const str = 'I love JavaScript'
const pattern = /love/g
const result = str.search(pattern)
console.log(result)
sh
2
Replacing a substring
replace(): searches a string for a match and replaces the matched substring with a replacement substring. Without the global flag, only the first match is replaced.
js
const txt = 'Python is the most beautiful language that a human being has ever created.\
I recommend python for a first programming language'
const matchReplaced = txt.replace(/Python|python/, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created.I recommend python for a first programming language
js
const txt = 'Python is the most beautiful language that a human being has ever created.\
I recommend python for a first programming language'
const matchReplaced = txt.replace(/Python|python/g, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created.I recommend JavaScript for a first programming language
js
const txt = 'Python is the most beautiful language that a human being has ever created.\
I recommend python for a first programming language'
const matchReplaced = txt.replace(/Python/gi, 'JavaScript')
console.log(matchReplaced)
sh
JavaScript is the most beautiful language that a human being has ever created.I recommend JavaScript for a first programming language
js
const txt = '%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing.\
T%he%re i%s n%o%th%ing as m%ore r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing \
p%e%o%ple.\
I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs.\
D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher.'
const matches = txt.replace(/%/g, '')
console.log(matches)
sh
I am teacher and  I love teaching.There is nothing as more rewarding as educating and empowering people.I found teaching more interesting than any other jobs.Does this motivate you to be a teacher.
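The replacement does not have to be a fixed string. As a rough sketch of two other options that replace() supports, the special sequence $& inserts the matched text, and a function can compute the replacement for each match. The sample text and variable names below are made up for illustration.

```js
const txt = 'I love Python. I recommend python.'

// $& in the replacement string stands for the matched substring.
const marked = txt.replace(/python/gi, '[$&]')
console.log(marked) // 'I love [Python]. I recommend [python].'

// A replacer function receives each match and returns its replacement.
const shouted = txt.replace(/python/gi, (match) => match.toUpperCase())
console.log(shouted) // 'I love PYTHON. I recommend PYTHON.'

// For a plain substring, replaceAll() does the same job as a g-flag pattern.
const cleaned = 'te%ach%ing'.replaceAll('%', '')
console.log(cleaned) // 'teaching'
```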
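Regular expression metacharacters
- []: a set of characters
  - [a-c] means a or b or c
  - [a-z] means any lowercase letter from a to z
  - [A-Z] means any uppercase letter from A to Z
  - [0-3] means 0 or 1 or 2 or 3
  - [0-9] means any digit from 0 to 9
  - [A-Za-z0-9] means any single character from a to z, A to Z, or 0 to 9
- \: used to escape special characters
  - \d means: match a digit (0-9)
  - \D means: match any character that is not a digit
- .: any character except a newline (\n)
- ^: starts with
  - /^substring/, e.g. /^love/, a sentence that starts with the word love
  - [^abc] means not a, not b, not c
- $: ends with
  - /substring$/, e.g. /love$/, a sentence that ends with the word love
- *: zero or more times
  - /[a]*/ means a is optional or can occur many times
- +: one or more times
  - /[a]+/ means at least once, possibly many times
- ?: zero or one time
  - /[a]?/ means zero times or once
- \b: word boundary, matches at the beginning or the end of a word
- {3}: exactly 3 repetitions of the preceding element
- {3,}: at least 3 repetitions
- {3,8}: 3 to 8 repetitions
- |: or
  - /apple|banana/ means either apple or banana
- (): capture and group
Let us use examples to clarify the metacharacters above.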
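As a quick overview before the detailed examples, the sketch below runs one small check for several of the metacharacters listed above. The test strings are invented for illustration, and each row states the result we expect from test().

```js
// Each row: [pattern, text to test, expected result of pattern.test(text)]
const checks = [
  [/^love/, 'love is kind', true],             // ^ : starts with
  [/love$/, 'all you need is love', true],     // $ : ends with
  [/[^abc]/, 'abc', false],                    // no character other than a, b, c
  [/\bcat\b/, 'concatenate', false],           // "cat" is not a whole word here
  [/colou?r/, 'color', true],                  // ? : the u is optional
  [/a{3}/, 'aa', false],                       // needs three a's in a row
  [/apple|banana/, 'banana split', true],      // | : either alternative
  [/\D/, '2020', false]                        // no non-digit character
]

const results = checks.map(([pattern, text, expected]) => pattern.test(text) === expected)
console.log(results.every(Boolean)) // true

// (): a capture group makes part of the match available on its own.
const [whole, year] = 'made in 2020'.match(/in (\d{4})/)
console.log(whole, year) // 'in 2020' '2020'
```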
Square brackets
Let us use square brackets to include both lowercase and uppercase.
js
const pattern = '[Aa]pple' // the square brackets mean either A or a
const txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor way has been replaced by a banana a day keeps the doctor far far away. '
const matches = txt.match(pattern)
console.log(matches)
sh
["Apple", index: 0, input: "Apple and banana are fruits. An old cliche says an apple a day keeps the doctor way has been replaced by a banana a day keeps the doctor far far away.", groups: undefined]
js
const pattern = /[Aa]pple/g // the square brackets mean either A or a
const txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
const matches = txt.match(pattern)
console.log(matches)
sh
["Apple", "apple"]
If we also want to find banana, we write the pattern as follows:
js
const pattern = /[Aa]pple|[Bb]anana/g // the square brackets mean A or a (B or b); | means or
const txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. Banana is easy to eat too.'
const matches = txt.match(pattern)
console.log(matches)
sh
["Apple", "banana", "apple", "banana", "Banana"]
Using square brackets and the or operator, we managed to extract Apple, apple, Banana, and banana.
Escape character (\) in RegExp
js
const pattern = /\d/g // \d is a special sequence which means a digit
const txt = 'This regular expression example was made in January 12, 2020.'
const matches = txt.match(pattern)
console.log(matches) // ["1", "2", "2", "0", "2", "0"], this is not what we want
js
const pattern = /\d+/g // \d means a digit, + means one or more times
const txt = 'This regular expression example was made in January 12, 2020.'
const matches = txt.match(pattern)
console.log(matches) // ["12", "2020"], this is what we want
One or more times (+)
js
const pattern = /\d+/g // \d means a digit, + means one or more times
const txt = 'This regular expression example was made in January 12, 2020.'
const matches = txt.match(pattern)
console.log(matches) // ["12", "2020"]
Period (.)
js
const pattern = /[a]./g // the square brackets mean a, and . means any character except a newline
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)
console.log(matches) // ["an", "an", "an", "a ", "ar"]
js
const pattern = /[a].+/g // . any character, + one or more times
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)
console.log(matches) // ['and banana are fruits']
Zero or more times (*)
Zero or more times. The pattern may not occur at all, or it may occur many times.
js
const pattern = /[a].*/g // . any character, * zero or more times
const txt = 'Apple and banana are fruits'
const matches = txt.match(pattern)
console.log(matches) // ['and banana are fruits']
Zero or one time (?)
Zero or one time. The pattern may not occur at all, or it may occur once.
js
const txt = 'I am not sure if there is a convention how to write the word e-mail.\
Some people write it email others may write it as Email or E-mail.'
const pattern = /[Ee]-?mail/g // ? means the preceding hyphen is optional
const matches = txt.match(pattern)
console.log(matches) // ["e-mail", "email", "Email", "E-mail"]
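Quantifiers in RegExp
We can use curly brackets to specify the length of the substring we are looking for in a text. Let us see how to use RegExp quantifiers. Imagine we are interested in substrings that are 4 characters long.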
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /\b\w{4}\b/g // words of exactly four characters
const matches = txt.match(pattern)
console.log(matches) //['This', 'made', '2019']
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /\b[a-zA-Z]{4}\b/g // words of exactly four letters, without digits
const matches = txt.match(pattern)
console.log(matches) //['This', 'made']
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /\d{4}/g // a number with exactly four digits
const matches = txt.match(pattern)
console.log(matches) // ['2019']
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /\d{1,4}/g // 1 to 4 digits
const matches = txt.match(pattern)
console.log(matches) // ['6', '2019']
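Quantifiers are what make a format such as a phone number expressible as a pattern. The sketch below assumes a made-up 3-3-4 digit format separated by hyphens; real phone number formats vary, so treat the pattern as an illustration rather than a validator.

```js
// Hypothetical sample text; only the first number has the 3-3-4 shape.
const txt = 'Call 555-123-4567 or 555-9876, not 5551234567.'

// Three digits, a hyphen, three digits, a hyphen, four digits, as a whole word.
const phonePattern = /\b\d{3}-\d{3}-\d{4}\b/g
const phones = txt.match(phonePattern)
console.log(phones) // ['555-123-4567']

// {2,3} accepts two or three repetitions; \b keeps longer runs from matching.
const runs = 'a aa aaa aaaa'.match(/\ba{2,3}\b/g)
console.log(runs) // ['aa', 'aaa']
```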
Caret ^
- Starts with
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /^This/ // ^ means starts with
const matches = txt.match(pattern)
console.log(matches) // ['This']
- Negation
js
const txt = 'This regular expression example was made in December 6, 2019.'
const pattern = /[^A-Za-z,. ]+/g // ^ inside a character set means negation: not A to Z, not a to z, no space, no comma, no period
const matches = txt.match(pattern)
console.log(matches) // ["6", "2019"]
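Because the caret has two different meanings, both can appear in the same pattern. A short sketch, using an invented sample string, that puts the two roles side by side and also shows how to match a literal caret:

```js
const txt = 'December 6, 2019'

// Outside the brackets ^ anchors to the start; inside it negates the set.
// Together: one or more non-digit characters at the start of the text.
const leadingNonDigits = txt.match(/^[^0-9]+/)[0]
console.log(leadingNonDigits) // 'December '

// Anchor only: does the text start with a digit?
const startsWithDigit = /^[0-9]/.test(txt)
console.log(startsWithDigit) // false

// Escaped with a backslash, the caret is just an ordinary character.
const literalCaret = /2\^3/.test('2^3 = 8')
console.log(literalCaret) // true
```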
Exact match
The pattern should start with ^ and end with $.
js
let pattern = /^[A-Z][a-z]{3,12}$/;
let name = 'Asabeneh';
let result = pattern.test(name)
console.log(result) // true
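To see why both anchors are needed, compare the same pattern with and without them. This is a small sketch; the extra test strings are made up for illustration.

```js
const unanchored = /[A-Z][a-z]{3,12}/
const anchored = /^[A-Z][a-z]{3,12}$/

// Without anchors, a match anywhere inside the string is enough.
const looseResult = unanchored.test('123Asabeneh456')
console.log(looseResult) // true

// With ^ and $, the whole string must fit the pattern.
const strictResult = anchored.test('123Asabeneh456')
console.log(strictResult) // false

// One uppercase letter followed by 3 to 12 lowercase letters passes.
console.log(anchored.test('Asabeneh')) // true

// Too short: only two lowercase letters follow the capital.
console.log(anchored.test('Asa')) // false
```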
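🌕 You are going far. Keep going! Now you are supercharged with regular expressions. You have the power to extract and clean any kind of text, and you can get meaning out of unstructured data. You have just completed the day 12 challenges and you are 12 steps ahead on your way to greatness. Now do some exercises for your brain and for your muscles.
💻 Exercises
Exercises: Level 1
Calculate the total annual income of the person from the following text. 'He earns 4000 euro from salary per month, 10000 euro annual bonus, 5500 euro online courses per month.'
The positions of some particles on the horizontal x-axis are -12, -4, -3, and -1 in the negative direction, 0 at the origin, and 4 and 8 in the positive direction. Extract these numbers and find the distance between the two furthest particles.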
js
points = ['-12', '-4', '-3', '-1', '0', '4', '8']
sortedPoints = [-12, -4, -3, -1, 0, 4, 8]
distance = 20
Write a pattern which identifies whether a string is a valid JavaScript variable name.
sh
is_valid_variable('first_name') # True
is_valid_variable('first-name') # False
is_valid_variable('1first_name') # False
is_valid_variable('firstname') # True
Exercises: Level 2
Write a function called tenMostFrequentWords which returns the ten most frequent words in a string.
js
const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`
console.log(tenMostFrequentWords(paragraph))
sh
[ {word:'love', count:6}, {word:'you', count:5}, {word:'can', count:3}, {word:'what', count:2}, {word:'teaching', count:2}, {word:'not', count:2}, {word:'else', count:2}, {word:'do', count:2}, {word:'I', count:2}, {word:'which', count:1}, {word:'to', count:1}, {word:'the', count:1}, {word:'something', count:1}, {word:'if', count:1}, {word:'give', count:1}, {word:'develop',count:1}, {word:'capabilities',count:1}, {word:'application', count:1}, {word:'an',count:1}, {word:'all',count:1}, {word:'Python',count:1}, {word:'If',count:1}]
js
console.log(tenMostFrequentWords(paragraph, 10))
sh
[{word:'love', count:6}, {word:'you', count:5}, {word:'can', count:3}, {word:'what', count:2}, {word:'teaching', count:2}, {word:'not', count:2}, {word:'else', count:2}, {word:'do', count:2}, {word:'I', count:2}, {word:'which', count:1} ]
Exercises: Level 3
1. Write a function which cleans text. Clean the following text. After cleaning, count the three most frequent words in the string.
js
const sentence = `%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?`
console.log(cleanText(sentence))
sh
I am a teacher and I love teaching There is nothing as more rewarding as educating and empowering people I found teaching more interesting than any other jobs Does this motivate you to be a teacher
2. Write a function which finds the most frequent words. After cleaning, count the three most frequent words in the string.
js
console.log(mostFrequentWords(cleanedText))
sh
[{word:'I', count:3}, {word:'teaching', count:2}, {word:'teacher', count:2}]
🎉 Congratulations! 🎉