HTML 处理:XPATH 案例
当然在处理 HTML 的时候,正则很多时候是够用的,但是总有并不那么好写的提取数据的正则。
为了处理这种情况,我们可以在 yak 中使用 xpath 来解决这个问题
doc, err := xpath.LoadHTMLDocument(`<!DOCTYPE html><html lang="en-US"><head><title>Hello,World!</title></head><body><div class="container"><header> <!-- Logo --> <h1>City Gallery</h1></header><nav> <ul> <li><a href="/London">London</a></li> <li><a href="/Paris">Paris</a></li> <li><a href="/Tokyo">Tokyo</a></li> </ul></nav><article> <h1>London</h1> <img src="pic_mountain.jpg" alt="Mountain View" style="width:304px;height:228px;"> <p>London is the capital city of England. It is the most populous city in the United Kingdom, with a metropolitan area of over 13 million inhabitants.</p> <p>Standing on the River Thames, London has been a major settlement for two millennia, its history going back to its founding by the Romans, who named it Londinium.</p></article><footer>Copyright © W3Schools.com</footer></div></body></html>`)die(err)
// 寻找 p 标签的内容nodes := xpath.Find(doc, "//p")for _, node := range nodes { // 打印文本 println(xpath.InnerText(node))}
// 寻找 li 标签(第一个)nodes := xpath.Query(doc, "//li")for _, node := range nodes { // 打印文本 println(xpath.OutputHTML(node)) // <a href="/London">London</a> println(xpath.OutputHTMLSelf(node)) // <li><a href="/London">London</a></li>}