origin

Many of the previous projects used solr/elasticsearch As a full-text search engine , They are comprehensive and powerful , But for smaller projects , Construction and maintenance costs are obviously too high , Especially from a relational database / Data synchronization from document database to full-text retrieval engine is very tedious , And error prone .

I remember a long time ago postgresql Database built-in full text retrieval , Recently, it is found that this database is becoming more and more popular , So I studied it again , The joyful discovery actually supports ef
core, So we did some research on it , And sorting out the experience as follows .

premise

This paper assumes that readers are familiar with entity framework core Basic concepts and basic use of .

objective

establish dotnet core project , use postgres Database and ef core, Implement common full-text retrieval functions , include

* Index fields
* Basic query
* Ranking of query results
* Query results highlight
step 1 - New project and import packages
<Project Sdk="Microsoft.NET.Sdk"> <PropertyGroup> <OutputType>Exe</OutputType>
<TargetFramework>netcoreapp3.1</TargetFramework> </PropertyGroup> <ItemGroup> <
PackageReferenceInclude="EFCore.NamingConventions" Version="1.1.0" /> <
PackageReferenceInclude="Microsoft.Extensions.Logging.Console" Version="3.1.4"
/> <PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version
="3.1.3" /> <PackageReference Include="Microsoft.EntityFrameworkCore.Design"
Version="3.1.3" /> </ItemGroup> </Project>
be careful NamingConventions Package is optional , Its function is to translate table and field names into serpentine shapes , as MyData ->
my_data, It's easier to write sql, Don't use annoying Quotes .

step 2 - establish model and dbcontext
using System.ComponentModel.DataAnnotations; using
System.ComponentModel.DataAnnotations.Schema;using NpgsqlTypes; public class
Article {public int Id { get; set; } [Required] [MaxLength(128)] public string
Title {get; set; } [MaxLength(512)] public string Abst { get; set; } public
NpgsqlTsVector TitleVector {get; set; } public NpgsqlTsVector AbstVector { get;
set; } [NotMapped] public string TitleHL { get; set; } [NotMapped] public string
AbstHL {get; set; } }

book model In TitleVector and AbstVector Separately for storage Title and Abst Segmentation result of field , Convenient for subsequent query . You don't have to worry about the code accidentally changing these fields so that the query goes wrong , Because a trigger will be set later , The contents of these fields are automatically updated every time the data is changed .
using Microsoft.EntityFrameworkCore; public class MyDbContext : DbContext {
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) =>
optionsBuilder .UseNpgsql("
Host=localhost;Database=ft;Username=postgres;Password=123456")
.UseLoggerFactory(PgFtSearch.Program.MyLoggerFactory)
.UseSnakeCaseNamingConvention();protected override void
OnModelCreating(ModelBuilder modelBuilder) {base.OnModelCreating(modelBuilder);
modelBuilder.Entity<Article>().HasIndex(p => p.TitleVector).HasMethod("GIN");
modelBuilder.Entity<Article>().HasIndex(p => p.AbstVector).HasMethod("GIN"); }
public DbSet<Article> Articles { get; set; } }

first UseNpgsql Set which database to connect to , then UseLoggerFactory Used to print logs , Mainly sql sentence .MyLoggerFactory How did it come , Refer to subsequent codes .

GIN Two lines of , Used to tell the database that the two fields are inverted .

step 3 - generate migration And add trigger manually

dotnet ef migrations add Init

then , Generating migration Add trigger manually in file , When adding or modifying data , Automatically modify the contents of index fields , Applications don't have to worry about index synchronization .
migrationBuilder.Sql( @"CREATE TRIGGER article_title_search_vector_update
BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(title_vector, 'pg_catalog.english', title);");
migrationBuilder.Sql(@"CREATE TRIGGER article_abst_search_vector_update BEFORE
INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(abst_vector, 'pg_catalog.english', abst);");
step 4 - Programming
using System; using System.Collections.Generic; using System.Linq; using
Microsoft.EntityFrameworkCore;using Microsoft.Extensions.Logging; namespace
PgFtSearch {class Program { public static readonly ILoggerFactory
MyLoggerFactory= LoggerFactory.Create(builder => { builder.AddConsole(); });
static void Main(string[] args) { using (var db = new MyDbContext()) { if (!
db.Articles.Any()) {var articles = new List<Article>{ new Article{Title="
testing is ok", Abst="this is a test about postgre full text searching"}, new
Article{Title="tested all bugs", Abst="there is no bug exists in this app"} };
db.AddRange(articles); db.SaveChanges(); }var query = "test"; var data =
db.Articles .Where(p=> p.TitleVector.Matches(query) ||
p.AbstVector.Matches(query)) .OrderByDescending(p
=>p.TitleVector.Rank(EF.Functions.ToTsQuery(query)) *2.0 +
p.AbstVector.Rank(EF.Functions.ToTsQuery(query))) .Select(p=>new Article{ Title
= p.Title, Abst = p.Abst, TitleHL =
EF.Functions.ToTsQuery(query).GetResultHeadline(p.Title), AbstHL=
EF.Functions.ToTsQuery(query).GetResultHeadline(p.Abst), });foreach (var article
in data) { Console.WriteLine($"
{article.Title}\t{article.Abst}\t{article.TitleHL}\t{article.AbstHL}"); } } } }
}
first , If there is no data , Insert several test data .

Here's the key , Code data query , The specific functions are :

* use test Keywords in title or abst Query data in field
* Sort query results ,title Field sort weight =2.0, higher than abst Field weight =1.0
* Of search results title and abst Highlight
Finally generated SQL as follows :
SELECT
  a.title AS "Title",
  a.abst AS "Abst",
  ts_headline(a.title, to_tsquery(@__query_0)) AS "TitleHL",
  ts_headline(a.abst, to_tsquery(@__query_0)) AS "AbstHL" FROM articles AS a
WHERE (a.title_vector @@ plainto_tsquery(@__query_0)) OR (a.abst_vector @@
plainto_tsquery(@__query_0)) ORDER BY (ts_rank(a.title_vector, to_tsquery(
@__query_0))::double precision * 2.0) + ts_rank(a.abst_vector, to_tsquery(
@__query_0))::double precision DESC
Here's the code , I believe everyone can understand , Welcome to exchange if you have any questions .

summary

At present, the support of Chinese word segmentation has not been studied , No test performance . But on the whole , It can be used in small and medium-sized projects postgres Replacement of built-in full-text retrieval function of database solr/es Etc , Reduce system complexity , Improve the stability of full-text retrieval function .

Technology
©2019-2020 Toolsou All rights reserved,
Python+OpenCV Detailed explanation of face recognition technology SpringBoot practice ( five ):mybatis-plus In BaseMapper,Iservice and ServiceImpl Google says home office affects work efficiency !2021 Return to offline office in 2010 C Language programming to find a student's grade vue Of v-if And v-show The difference between C Language console games , Make bricks These songs , Programmers, don't listen ! What should I do if I suddenly encounter a question I can't answer during the interview ?python To solve the problem of dictionary writing list in c Linguistic 5 Three common sorting methods