Building a Full-Text Search Application with Docker and Elasticsearch (Part 2)

Elasticsearch has now been loaded with the data from 100 books (roughly 230,000 paragraphs). This section runs some searches against it.

5.0 Simple HTTP Query

First, query Elasticsearch directly at localhost:9200/library/ ... retty. This runs a full-text query for the keyword "Java", and the output should look like this:

    {
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 13,
        "max_score" : 14.259304,
        "hits" : [
          {
            "_index" : "library",
            "_type" : "novel",
            "_id" : "p_GwFWEBaZvLlaAUdQgV",
            "_score" : 14.259304,
            "_source" : {
              "author" : "Charles Darwin",
              "title" : "On the Origin of Species",
              "location" : 1080,
              "text" : "Java, plants of, 375."
            }
          },
          {
            "_index" : "library",
            "_type" : "novel",
            "_id" : "wfKwFWEBaZvLlaAUkjfk",
            "_score" : 10.186235,
            "_source" : {
              "author" : "Edgar Allan Poe",
              "title" : "The Works of Edgar Allan Poe",
              "location" : 827,
              "text" : "After many years spent in foreign travel, I sailed in the year 18-- , from the port of Batavia, in the rich and populous island of Java, on a voyage to the Archipelago of the Sunda islands. I went as passenger--having no other inducement than a kind of nervous restlessness which haunted me as a fiend."
            }
          },
          ...
        ]
      }
    }

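The Elasticsearch HTTP interface is useful for checking that the data was inserted correctly, but exposing it directly to a web application is dangerous. Operational API functions (such as adding and deleting documents) should not be reachable from the client. Instead, write a small Node.js API that receives client requests and forwards them (over the private network) to Elasticsearch.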

5.1 Query Script

This section shows how to send requests to Elasticsearch from the Node.js application. Start by creating a new file, server/search.js.

    const { client, index, type } = require('./connection')

    module.exports = {
      /** Query ES index for the provided term */
      queryTerm (term, offset = 0) {
        const body = {
          from: offset,
          query: {
            match: {
              text: {
                query: term,
                operator: 'and',
                fuzziness: 'auto'
              }
            }
          },
          highlight: { fields: { text: {} } }
        }

        return client.search({ index, type, body })
      }
    }

This module defines a simple search function that runs a match query with the supplied term. The fields are explained below:

from: the pagination offset into the results. Each query returns 10 results by default, so setting from to 10 returns results 10-20.

query: the term to search for.

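operator: how the tokens in the query are combined. This example uses "and", so a paragraph must contain every token in the query to match (the default, "or", matches paragraphs containing any one of them).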

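fuzziness: the tolerance for spelling mistakes (the fuzzy-matching level), expressed as the number of single-character edits allowed. With "auto", Elasticsearch picks the value from the term length: 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters, and 2 edits for longer terms. A higher value allows more corrections; for example, a fuzziness of 1 lets a query for Patricc return Patrick.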

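highlight: returns an extra field with each result containing the matched text marked up in HTML for display.

Try adjusting these parameters to see how the results change. See the Elastic Full-Text Query DSL documentation for more details.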
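To make the paging behaviour of from concrete, here is a minimal sketch that builds the same request body for a given page. The helper names buildSearchBody and pageToOffset and the PAGE_SIZE constant are illustrative and not part of the tutorial's code; the page size of 10 is the Elasticsearch default mentioned above.

```javascript
// Hypothetical helpers for illustration; not part of server/search.js.
const PAGE_SIZE = 10 // Elasticsearch returns 10 hits per query by default

/** Convert a zero-based page number into a `from` offset. */
function pageToOffset (page) {
  return page * PAGE_SIZE
}

/** Build the same body that queryTerm sends, for a given term and offset. */
function buildSearchBody (term, offset = 0) {
  return {
    from: offset,
    query: {
      match: {
        text: { query: term, operator: 'and', fuzziness: 'auto' }
      }
    },
    highlight: { fields: { text: {} } }
  }
}

// Page 0 starts at hit 0, page 1 at hit 10, page 2 at hit 20.
const body = buildSearchBody('java', pageToOffset(2))
console.log(JSON.stringify(body, null, 2))
```

Keeping the body construction in a pure function like this makes it easy to inspect or unit-test the query without a running Elasticsearch instance.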

6. API

This section builds the HTTP API that the front-end code will call.

6.0 API Server

Replace the contents of server/app.js with the following:

    const Koa = require('koa')
    const Router = require('koa-router')
    const joi = require('joi')
    const validate = require('koa-joi-validate')
    const search = require('./search')

    const app = new Koa()
    const router = new Router()

    // Log each request to the console
    app.use(async (ctx, next) => {
      const start = Date.now()
      await next()
      const ms = Date.now() - start
      console.log(`${ctx.method} ${ctx.url} - ${ms}`)
    })

    // Log percolated errors to the console
    app.on('error', err => {
      console.error('Server Error', err)
    })

    // Set permissive CORS header
    app.use(async (ctx, next) => {
      ctx.set('Access-Control-Allow-Origin', '*')
      return next()
    })

    // ADD ENDPOINTS HERE

    const port = process.env.PORT || 3000

    app
      .use(router.routes())
      .use(router.allowedMethods())
      .listen(port, err => {
        if (err) throw err
        console.log(`App Listening on Port ${port}`)
      })

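This code imports the server's dependencies and sets up simple request logging, error handling, and a permissive CORS header for the Koa.js Node API server.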

6.1 Connecting the Endpoint to the Query

This section adds an endpoint to the server so that clients can reach the Elasticsearch query through it.

In server/app.js, insert the following code after the // ADD ENDPOINTS HERE comment:

    /**
     * GET /search
     * Search for a term in the library
     */
    router.get('/search', async (ctx, next) => {
      const { term, offset } = ctx.request.query
      ctx.body = await search.queryTerm(term, offset)
    })

Restart the server with docker-compose up -d --build, then call the endpoint from a browser, for example: localhost:3000/search?term=java.

The response should look something like this:

    {
      "took": 242,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 93,
        "max_score": 13.356944,
        "hits": [{
          "_index": "library",
          "_type": "novel",
          "_id": "eHYHJmEBpQg9B4622421",
          "_score": 13.356944,
          "_source": {
            "author": "Charles Darwin",
            "title": "On the Origin of Species",
            "location": 1080,
            "text": "Java, plants of, 375."
          },
          "highlight": {
            "text": ["Java, plants of, 375."]
          }
        }, {
          "_index": "library",
          "_type": "novel",
          "_id": "2HUHJmEBpQg9B462xdNg",
          "_score": 9.030668,
          "_source": {
            "author": "Unknown Author",
            "title": "The King James Bible",
            "location": 186,
            "text": "10:4 And the sons of Javan; Elishah, and Tarshish, Kittim, and Dodanim."
          },
          "highlight": {
            "text": ["10:4 And the sons of Javan; Elishah, and Tarshish, Kittim, and Dodanim."]
          }
        }
        ...
        ]
      }
    }
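A client usually needs only a few fields from this response. The following is a minimal sketch of flattening the hits into display rows, preferring the highlighted fragment when one is present. The function name summarizeHits and the row shape are assumptions for illustration, not part of the tutorial's code.

```javascript
// Hypothetical helper: flattens a search response into simple display rows.
function summarizeHits (response) {
  return response.hits.hits.map(hit => {
    const highlighted = hit.highlight && hit.highlight.text
    return {
      title: hit._source.title,
      author: hit._source.author,
      location: hit._source.location,
      // Prefer the highlighted fragment; fall back to the raw paragraph text.
      snippet: highlighted ? highlighted[0] : hit._source.text
    }
  })
}

// Trimmed copy of the sample response (first hit only).
const sample = {
  hits: {
    total: 93,
    hits: [{
      _source: {
        author: 'Charles Darwin',
        title: 'On the Origin of Species',
        location: 1080,
        text: 'Java, plants of, 375.'
      },
      highlight: { text: ['Java, plants of, 375.'] }
    }]
  }
}

console.log(summarizeHits(sample))
```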

6.2 Input Validation

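At this point the endpoint is still fragile. The next step is to check the input parameters, detect invalid or missing values, and return an error for them.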
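The app.js file already imports joi and koa-joi-validate for this purpose. As a dependency-free illustration of the kind of checks involved, here is a minimal sketch in plain JavaScript. The function name validateSearchQuery and the specific limits (a 60-character maximum for term, a non-negative integer offset) are assumptions chosen for the example, not rules taken from the tutorial.

```javascript
// Hypothetical validator; the limits below are illustrative assumptions.
const MAX_TERM_LENGTH = 60

function validateSearchQuery (query) {
  const errors = []

  // term must be a non-empty string of bounded length.
  const term = query.term
  if (typeof term !== 'string' || term.trim() === '') {
    errors.push('term is required')
  } else if (term.length > MAX_TERM_LENGTH) {
    errors.push(`term must be at most ${MAX_TERM_LENGTH} characters`)
  }

  // Query-string values arrive as strings, so parse offset explicitly.
  let offset = 0
  if (query.offset !== undefined) {
    offset = Number(query.offset)
    if (!Number.isInteger(offset) || offset < 0) {
      errors.push('offset must be a non-negative integer')
    }
  }

  if (errors.length > 0) return { ok: false, errors }
  return { ok: true, value: { term, offset } }
}

console.log(validateSearchQuery({ term: 'java', offset: '20' }))
console.log(validateSearchQuery({ offset: '-1' }))
```

In the Koa handler, a failed result would typically be turned into a 400 response (by setting ctx.status and ctx.body) before search.queryTerm is called.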