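HTML Processing: XPath Examples
Regular expressions are often good enough for processing HTML, but some extraction patterns are awkward to write as a regex.
For those cases, yak lets us use XPath instead: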
doc, err := xpath.LoadHTMLDocument(`
<!DOCTYPE html><html lang="en-US">
<head>
<title>Hello,World!</title>
</head>
<body>
<div class="container">
<header>
<!-- Logo -->
<h1>City Gallery</h1>
</header>
<nav>
<ul>
<li><a href="/London">London</a></li>
<li><a href="/Paris">Paris</a></li>
<li><a href="/Tokyo">Tokyo</a></li>
</ul>
</nav>
<article>
<h1>London</h1>
<img src="pic_mountain.jpg" alt="Mountain View" style="width:304px;height:228px;">
<p>London is the capital city of England. It is the most populous city in the United Kingdom, with a metropolitan area of over 13 million inhabitants.</p>
<p>Standing on the River Thames, London has been a major settlement for two millennia, its history going back to its founding by the Romans, who named it Londinium.</p>
</article>
<footer>Copyright © W3Schools.com</footer>
</div>
</body>
</html>
`)
die(err)
// Find every p tag
nodes := xpath.Find(doc, "//p")
for _, node := range nodes {
    // Print the text content of the node
    println(xpath.InnerText(node))
}
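// Find the first li tag only
// Assumption: Query returns a single node plus an error (as htmlquery's Query does), so there is nothing to loop over
node, err := xpath.Query(doc, "//li")
die(err)
// Print the node as HTML, without and with its own enclosing tag
println(xpath.OutputHTML(node)) // <a href="/London">London</a>
println(xpath.OutputHTMLSelf(node)) // <li><a href="/London">London</a></li>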
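
As a small illustration of why XPath can be easier than a regex here, the sketch below selects only the links nested inside the nav element and ignores links elsewhere on the page. It reuses only the calls shown above (xpath.LoadHTMLDocument, xpath.Find, xpath.InnerText). The shortened HTML document and the footer link are made up for this example.

```yak
// Minimal sketch: the HTML below is a hypothetical, shortened page.
doc, err := xpath.LoadHTMLDocument(`
<html>
<body>
<nav>
<ul>
<li><a href="/London">London</a></li>
<li><a href="/Paris">Paris</a></li>
</ul>
</nav>
<footer><a href="/about">About</a></footer>
</body>
</html>
`)
die(err)

// "//nav//a" matches only a tags that are descendants of nav,
// so the link inside footer is not selected.
links := xpath.Find(doc, "//nav//a")
for _, link := range links {
    // Print the visible text of each navigation link
    println(xpath.InnerText(link))
}
```

With this input the loop should print London and Paris. Scoping a regex to "links inside nav only" would need a much more fragile pattern, while the XPath expression states the structural condition directly.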