<!-- 
RSS generated by JIRA (1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d) at Thu Feb 08 23:11:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>FOLIO Jira</title>
    <link>https://folio-org.atlassian.net</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>1001.0.0-SNAPSHOT</version>
        <build-number>100246</build-number>
        <build-date>07-02-2024</build-date>
    </build-info>

<item>
            <title>[FOLIO-1246] Implement Postgres Full Text Search functionality</title>
                <link>https://folio-org.atlassian.net/browse/FOLIO-1246</link>
                <project id="10290" key="FOLIO">FOLIO</project>
                    <description>&lt;p&gt;In Durham we discussed that the current approach to searching (which essentially translates CQL queries into complex SQL queries with LIKE/ILIKE/regex and creates B-tree and GIN indexes on particular down-cased/unaccented columns, with support from RMB and CQL2PG) may not be flexible enough to provide the expected level of search functionality and performance. (Note: we do not have concrete requirements here; we are assuming that what is expected matches the functionality of modern OPAC systems.)&lt;/p&gt;

&lt;p&gt;One of the approaches discussed was to use the Postgres built-in &quot;Full Text Search&quot; capability, documented here:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/static/textsearch-intro.html&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;https://www.postgresql.org/docs/current/static/textsearch-intro.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short, the essential pieces of full text search in PG are two functions/data structures:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;tsvector&lt;/li&gt;
	&lt;li&gt;tsquery&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;b&gt;tsvector&lt;/b&gt; (built with the to_tsvector function) takes a &#8220;document&#8221;, which can be a single column, multiple columns from a single table or multiple tables, a view, or the entire JSONB document, and returns a list (vector) of lexemes: normalised tokens (case/accent normalised, with stopwords filtered and roots extracted).&lt;/p&gt;
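&lt;p&gt;A toy Python model may help make these normalisation steps concrete. It is only a sketch under stated assumptions. The stop-word list and the suffix-stripping &quot;stemmer&quot; are hypothetical stand-ins, not PostgreSQL&apos;s &lt;tt&gt;english&lt;/tt&gt; configuration or its Snowball stemmer, and &lt;tt&gt;to_tsvector_sketch&lt;/tt&gt; is an illustrative name. The sketch lower-cases tokens and strips accents, drops stop-words, reduces words to crude roots, and keeps each lexeme&apos;s 1-based word positions.&lt;/p&gt;
&lt;pre&gt;
```python
import unicodedata

# Hypothetical, deliberately tiny stop-word list; PostgreSQL's "english"
# configuration uses a much longer one.
STOP_WORDS = {"a", "an", "and", "be", "not", "of", "on", "or", "the", "to"}

# Crude suffix rules standing in for a real stemmer (PostgreSQL uses Snowball).
SUFFIXES = ("ing", "es", "s")


def normalise(token):
    """Lower-case a token and strip accents (roughly what unaccent does)."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_tsvector_sketch(text):
    """Map each lexeme to its 1-based word positions, as a tsvector does.

    Stop words are dropped but still consume a position, so gaps remain.
    """
    vector = {}
    for position, raw in enumerate(text.split(), start=1):
        token = normalise(raw.strip(".,;:!?"))
        if not token or token in STOP_WORDS:
            continue
        vector.setdefault(stem(token), []).append(position)
    return vector


print(to_tsvector_sketch("Arms and dollars"))
```
&lt;/pre&gt;
&lt;p&gt;For &quot;Arms and dollars&quot; this prints {&apos;arm&apos;: [1], &apos;dollar&apos;: [3]}. Position 2, where the stop-word was, is skipped, just as PostgreSQL returns &apos;arm&apos;:1 &apos;dollar&apos;:3 for to_tsvector(&apos;english&apos;, &apos;arms and dollars&apos;).&lt;/p&gt;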

&lt;p&gt;&lt;b&gt;tsquery&lt;/b&gt; (built with to_tsquery) does the same with the user input, e.g. a list of words to match. &lt;b&gt;tsquery&lt;/b&gt; supports the same booleans as CQL, so the translation is fairly trivial: &amp;amp; for AND, | for OR, ! for NOT, and &amp;lt;-&amp;gt; (followed by) for PROX.&lt;/p&gt;

&lt;p&gt;One can run both to_tsvector and to_tsquery at query time, but this obviously does not scale, so normally the tsvector is computed at indexing time to build a GIN index on the selected data.&lt;br/&gt;
tsvector/tsquery are the most fundamental data structures; PG also includes more advanced search-engine functionality such as ranking and highlighting/snippets. It does not include a specialized facet data structure, so facets need to be implemented through other means.&lt;/p&gt;
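&lt;p&gt;Building the vector at indexing time pays off because a GIN index over a tsvector is an inverted index: it maps each lexeme to the rows that contain it. An AND query then becomes an intersection of posting lists instead of a scan. The Python below is a conceptual sketch of that idea only. It does not reflect how PostgreSQL actually stores GIN entries, and the record ids and vectors are made up.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict


def build_inverted_index(vectors):
    """Map every lexeme to the set of record ids whose tsvector contains it.

    This is the idea behind a GIN index on a tsvector: the vectors are built
    once, at indexing time, instead of being recomputed for every query.
    """
    index = defaultdict(set)
    for record_id, vector in vectors.items():
        for lexeme in vector:
            index[lexeme].add(record_id)
    return index


def match_all(index, lexemes):
    """Return the ids of records containing every lexeme (an AND query)."""
    postings = [index.get(lexeme, set()) for lexeme in lexemes]
    if not postings:
        return set()
    # Intersecting posting lists replaces a scan over every record.
    return set.intersection(*postings)


# Hypothetical, pre-computed vectors (lexeme -> positions) keyed by record id.
vectors = {
    1: {"constitut": [1], "amend": [2]},
    2: {"constitut": [3], "law": [1]},
    3: {"amend": [1], "act": [2]},
}
index = build_inverted_index(vectors)
print(sorted(match_all(index, ["constitut", "amend"])))
```
&lt;/pre&gt;
&lt;p&gt;Only record 1 contains both &apos;constitut&apos; and &apos;amend&apos;, so the query prints [1].&lt;/p&gt;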

&lt;p&gt;Since PG 10, tsvector can take the whole JSONB contents in one go:&lt;br/&gt;
&lt;a href=&quot;https://wiki.postgresql.org/wiki/New_in_postgres_10#Full_Text_Search_support_for_JSON_and_JSONB&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;https://wiki.postgresql.org/wiki/New_in_postgres_10#Full_Text_Search_support_for_JSON_and_JSONB&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This, however, drops the keys and only indexes the &#8220;contents&#8221; of the JSONB, so we would not be able to &#8220;scope&#8221; the searches to a given field. One proposed approach here is to use triggers to do advanced indexing:&lt;br/&gt;
&lt;a href=&quot;https://stackoverflow.com/questions/45680936/how-to-implement-full-text-search-on-complex-nested-jsonb-in-postgresql&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;https://stackoverflow.com/questions/45680936/how-to-implement-full-text-search-on-complex-nested-jsonb-in-postgresql&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="80728">FOLIO-1246</key>
            <summary>Implement Postgres Full Text Search functionality</summary>
                <type id="10006" iconUrl="https://folio-org.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10307?size=medium">Umbrella</type>
                                            <priority id="10002" iconUrl="https://dev.folio.org/assets/jira-priority/jira-p3.svg">P3</priority>
                        <status id="6" iconUrl="https://folio-org.atlassian.net/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Done</resolution>
                                                        <assignee accountid="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d">Jakub Skoczen</assignee>
                                                                <reporter accountid="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d">Jakub Skoczen</reporter>
                                    <labels>
                            <label>inventory</label>
                            <label>performance</label>
                            <label>search_enhancements</label>
                            <label>sprint40</label>
                            <label>sprint41</label>
                            <label>sprint42</label>
                            <label>sprint43</label>
                            <label>sprint44</label>
                            <label>sprint45</label>
                    </labels>
                <created>Tue, 15 May 2018 11:28:43 +0000</created>
                <updated>Thu, 27 Jun 2019 08:59:17 +0000</updated>
                            <resolved>Thu, 27 Jun 2019 08:59:17 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                <comments>
                                                            <comment id="193577" author="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d" created="Tue, 15 May 2018 11:30:24 +0000"  >&lt;p&gt;Guys, one thing that may or may not be a deal breaker for this approach is facets. Here is one approach:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://akorotkov.github.io/blog/2016/06/17/faceted-search/&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;http://akorotkov.github.io/blog/2016/06/17/faceted-search/&lt;/a&gt;&lt;/p&gt;</comment>
                                                            <comment id="193581" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Tue, 15 May 2018 11:38:07 +0000"  >&lt;p&gt;that is how facets are implemented currently in rmb&lt;/p&gt;</comment>
                                                            <comment id="193587" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Thu, 17 May 2018 09:14:50 +0000"  >&lt;p&gt;i think this boils down to requirements: &lt;br/&gt;
it seems like postgres full text search ticks most of the boxes for search functionality. i think we need to understand which needed functionality we currently do not support.&lt;/p&gt;

&lt;p&gt;is there someone who can list the OPAC search features needed in v1 so we can analyze how they work in postgres full text?&lt;/p&gt;

&lt;p&gt;1. would our queries require stop words to be indexed in the title for exact title matches, while still allowing matches on titles without stop words, potentially with a lower rank (hence OR-ing two query phrases)? &lt;br/&gt;
ex. a query for: to be or not to be&lt;/p&gt;

&lt;p&gt;2. matching words but also synonyms, while ranking synonym matches lower in the result set, as they may introduce a lot of noise; the same goes for thesaurus and spelling corrections (these often pollute a result set) &amp;lt;- is this even a requirement?&lt;/p&gt;

&lt;p&gt;3. stemming: once again, we may want to rank the exact word match higher than a stemmed form of the word, especially in title matching. it seems postgres uses the Snowball stemmer, which is a very aggressive stemmer and will occasionally stem words into non-words: &lt;br/&gt;
revival -&amp;gt; reviv&lt;br/&gt;
happy -&amp;gt; happi&lt;br/&gt;
etc...&lt;br/&gt;
in the past i have migrated away from this stemmer to much better ones available in solr (this may be an option in postgres as well)&lt;/p&gt;

&lt;p&gt;4. as mentioned, matching words across multiple fields in the json while still maintaining structure, so that, for example, title matches are ranked above description matches or something of the sort.&lt;/p&gt;

&lt;p&gt;5. facets - will most probably not perform as well as in solr&lt;/p&gt;

&lt;p&gt;6. do we just want the ranking?&lt;/p&gt;

&lt;p&gt;i think all of the above may be solvable or partially solvable&lt;/p&gt;

&lt;p&gt;i guess the important thing is to get some requirements for what we need in v1.&lt;/p&gt;</comment>
                                                            <comment id="193595" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Tue, 22 May 2018 11:36:25 +0000"  >&lt;p&gt;i looked a bit more into this; some additional thoughts:&lt;/p&gt;

&lt;p&gt;1. i would recommend indexing the content as is with the full text feature (may need to be per field)&lt;br/&gt;
&lt;tt&gt;CREATE INDEX idx_fts ON harvard_mod_inventory_storage.instance USING gin (to_tsvector(&apos;non_existant_conf&apos;, jsonb));&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;meaning, create a word vector with positions without using any lexical functionality (do not remove stop words, do not stem, do not add synonyms, etc...)&lt;/p&gt;

&lt;p&gt;at query time i would take the user query and:&lt;br/&gt;
   a. analyze it using the lexical tools (remove stop words, stemming, ...)&lt;br/&gt;
   b. take the query as is&lt;br/&gt;
   c. run an OR between the two and boost up (b)&lt;/p&gt;

&lt;p&gt;motivation:&lt;br/&gt;
stop word removal, stemming, etc. add a lot of noise, and if the index only holds the processed tokens it will be very difficult to return exact matches, since &lt;tt&gt;friendly&lt;/tt&gt; will be stemmed to &lt;tt&gt;friend&lt;/tt&gt;, stop words removed, etc.; in many cases we will not be able to boost the actual word that was queried. i think the above is a good middle ground. we can then add additional query clauses with progressively lower boosts:&lt;br/&gt;
1. exact words entered&lt;br/&gt;
2. without stopwords&lt;br/&gt;
3. stemmed&lt;br/&gt;
4. synonyms / thesaurus (maybe)&lt;/p&gt;
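&lt;p&gt;A minimal Python sketch of this layered-boost idea follows. It assumes an index that keeps raw tokens (stop-words included, nothing stemmed). The stop-word list, the one-rule stemmer, the boost values 4/2/1 and the function names are all hypothetical, and synonyms are left out. Each title takes the boost of the best clause whose words it fully contains, which approximates OR-ing the clauses.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical stop words and a one-rule "stemmer"; illustrative only.
STOP_WORDS = {"a", "the", "on"}


def stem(word):
    """Strip a plural "s" from words longer than three letters."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def query_layers(query):
    """Build query clauses with decreasing (hypothetical) boosts."""
    exact = query.lower().split()
    no_stop = [w for w in exact if w not in STOP_WORDS]
    stemmed = [stem(w) for w in no_stop]
    return [(exact, 4.0), (no_stop, 2.0), (stemmed, 1.0)]


def score(doc_tokens, layers):
    """Boost of the best clause whose words all occur in the raw document."""
    tokens = set(doc_tokens)
    matched = [boost for words, boost in layers if set(words).issubset(tokens)]
    return max(matched, default=0.0)


def search(titles, query):
    """Return matching titles, highest score first (OR of all clauses)."""
    layers = query_layers(query)
    scored = [(score(title.lower().split(), layers), title) for title in titles]
    return [title for s, title in sorted(scored, reverse=True) if s]


# The index keeps raw tokens: no stemming, stop words included.
titles = ["about a boy", "about the boys on the team"]
print(search(titles, "about the boys"))
```
&lt;/pre&gt;
&lt;p&gt;For &quot;about the boys&quot; this ranks &quot;about the boys on the team&quot; first and &quot;about a boy&quot; second. The stemmed clause only matches titles that literally contain &quot;boy&quot;, because the index itself is not stemmed.&lt;/p&gt;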

&lt;p&gt;highlighting of the matching terms is also supported so that this can be displayed in the UI&lt;/p&gt;

&lt;p&gt;note that there is a performance penalty for ranking; i don&apos;t have numbers on that&lt;/p&gt;</comment>
                                                            <comment id="193600" author="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d" created="Wed, 30 May 2018 10:17:10 +0000"  >&lt;p&gt;&lt;a href=&quot;https://folio-org.atlassian.net/secure/ViewProfile.jspa?accountId=712020%3A32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; class=&quot;user-hover&quot; rel=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; data-account-id=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; accountid=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; rel=&quot;noreferrer&quot;&gt;shale99&lt;/a&gt; I am not exactly sure what you propose above. If you build the tsvector GIN index without stemming or synonyms, you will not be able to apply stemming/synonyms at query time (tsquery), since what you search for would not be in the index. Stemming actually reduces the size of the index rather than increasing it: with stemming you only index the stemmed lexeme, not all possible variations. Synonyms obviously do the opposite. It is true that there doesn&apos;t seem to be a way to &quot;boost&quot; non-stemmed, exact matches with PG, but that may or may not be a real problem (see below).&lt;/p&gt;

&lt;p&gt;As far as stop-words are concerned, enabling stop-word removal should help precision and performance, because stop-words rarely add anything but noise. When it comes to exact matches (which, as you note, may be problematic with stop-words enabled), PG has the &lt;b&gt;phraseto_tsquery&lt;/b&gt; function, which generates an expression with proximity operators in place of the stop-words. We would need to test the stop-word functionality with nasty cases like &quot;to be or not to be&quot;.&lt;/p&gt;</comment>
                                                            <comment id="193607" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 10:29:38 +0000"  >&lt;p&gt;i guess what i am saying is: once you index without stop words and with stemming, the originals are lost forever, and a case like &quot;to be or not to be&quot; will never be found. i would also argue that phrase matching with proximity would not be enough to overcome this.&lt;/p&gt;

&lt;p&gt;for example, two titles, &quot;about a boy&quot; and &quot;about the boys on the team&quot;, would both match the query &quot;about a boy&quot;, as the stop words are dropped and words are stemmed, and there is no way to solve this since the indexes don&apos;t contain the unstemmed words or the stop words.&lt;/p&gt;

&lt;p&gt;my suggestion was to index as is, and move the stemming + stop words to query time, so that if someone queries &quot;about a boy&quot;&lt;br/&gt;
you create a query &quot;about a boy&quot; with a high boost OR &quot;about boy&quot; &amp;lt;- this would return the correct result&lt;br/&gt;
you can also query &quot;about the boys&quot; with a high boost OR &quot;about boy&quot; with a lower boost &amp;lt;- this would return the correct results first&lt;/p&gt;

&lt;p&gt;the index does become smaller with stop word removal and stemming, and larger with synonyms; i agree, and i guess that wasn&apos;t clear from my comments.&lt;/p&gt;</comment>
                                                            <comment id="193614" author="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d" created="Wed, 30 May 2018 10:47:17 +0000"  >&lt;blockquote&gt;&lt;p&gt;i guess what i am saying is, once you index without stop words, and stem, the originals are lost forever, a case like &quot;to be or not to be&quot; will never be found. i would also argue that phrase matching with proximity would not be enough to over come this.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;That assumes all the words in &quot;to be or not to be&quot; are considered stop-words, which should be checked.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;for example, two titles: &quot;about a boy&quot; and &quot;about the boys on the team&quot; would both match the query &quot;about a boy&quot; as the stop words are dropped and words are stemmed and there is no way to solve these issues since the indexes dont have the unstemmed + stopwords.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Yes, it would match both. But that is as it should be, and this is where &lt;b&gt;ts_rank&lt;/b&gt; kicks in: the shorter, better match has a higher word frequency relative to its length and thus a higher rank, so &quot;about a boy&quot; would still be the top hit.&lt;/p&gt;
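&lt;p&gt;Whether the shorter title actually wins depends on the normalization argument of ts_rank. According to the PostgreSQL documentation, the default (0) ignores document length, 1 divides the rank by 1 + the logarithm of the length, and 2 divides it by the length. The toy Python ranking below only mimics those flags; it is not PostgreSQL&apos;s formula, and the lexeme lists for the two titles are made up.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def rank(doc_lexemes, query_lexemes, normalization=0):
    """Toy rank: count document lexemes that the query asks for.

    Loosely modelled on ts_rank's normalization flags: 0 ignores document
    length, 1 divides by 1 + log(length), 2 divides by length.
    """
    if not doc_lexemes:
        return 0.0
    hits = sum(1 for lexeme in doc_lexemes if lexeme in query_lexemes)
    length = len(doc_lexemes)
    if normalization == 1:
        return hits / (1 + math.log(length))
    if normalization == 2:
        return hits / length
    return float(hits)


# Hypothetical lexemes after stemming and stop-word removal.
short_title = ["about", "boy"]            # "about a boy"
long_title = ["about", "boy", "team"]     # "about the boys on the team"
query = {"about", "boy"}

for norm in (0, 2):
    print(norm, rank(short_title, query, norm), rank(long_title, query, norm))
```
&lt;/pre&gt;
&lt;p&gt;With normalization 0 both titles score 2.0. With normalization 2 the short title scores 1.0 and the long one about 0.67.&lt;/p&gt;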

&lt;blockquote&gt;&lt;p&gt;my suggestion was to index as is , and move the stemming + stopwords to query time - so that if someone queries &quot;about a boy&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;&lt;p&gt;you create a query &quot;about a boy&quot; with a high boost OR &quot;about boy&quot; &amp;lt;- this would return the correct result&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;&lt;p&gt;you can also query &quot;about the boys&quot; with a high boost OR &quot;about boy&quot; lower boost &amp;lt;- this would return the correct results first&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Yes, you could do that, but it would defeat the original purpose: to remove stop-words as a source of noise for ranking, and to stem as a way to let users find documents where the query keywords appear in a different form or spelling. To get both, you would need to stem and remove stop-words at index time, but also index the original tokens with higher (non-stemmed) and lower (stop-word) boost factors. I am not sure if PG supports that.&lt;/p&gt;</comment>
                                                            <comment id="193621" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 10:56:47 +0000"  >&lt;p&gt;i don&apos;t know how ts_rank works, but i am not sure it ends there: the words may appear in additional fields, so it usually doesn&apos;t come down to a simple field vs field match rank. for example, if one document has the queried title in a few fields (say, the doc discusses the title in question) and another has the incorrect title in the title field, and there is a boost for the title field, will that be enough to push the wrong title up even though it is clearly not what the user is looking for? I don&apos;t know; ranking is quite complex, and i have not researched ts_rank enough to comment.&lt;/p&gt;

&lt;p&gt;I don&apos;t know how EDS looks now, but a few years back, when i was doing a lot of analysis on their search quality, it was evident that they were removing stop words (and maybe stemming) in the index (i think; i don&apos;t know for sure), and their known-item search was just flat out bad because this info was missing from the index. maybe there is an EBSCO rep who can comment on that&lt;/p&gt;</comment>
                                                            <comment id="193626" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 11:13:52 +0000"  >&lt;p&gt;we can start with stop-word removal and stemming and see where that goes; if it&apos;s good enough, then it&apos;s the simplest solution&lt;/p&gt;</comment>
                                                            <comment id="193631" author="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d" created="Wed, 30 May 2018 11:36:02 +0000"  >&lt;p&gt;&lt;b&gt;ts_rank&lt;/b&gt;, at least the basic one, looks at the word frequency in the tsvector. You would need a separate tsvector for each field to differentiate ranking/searching across fields (note: there is one way to &quot;work around&quot; this through &lt;b&gt;setweight&lt;/b&gt;, but I don&apos;t think it is flexible enough for a schema-less system). But assuming you have a separate title index, your &quot;about a boy&quot; example would work as expected, with stop-words and stemming enabled.&lt;/p&gt;

&lt;p&gt;Now, this is where we are getting to the &quot;core of the problem&quot;. Postgres offers a lot of building blocks for full text search but I don&apos;t think it can be considered an out-of-the-box solution (at least as compared to ES).&lt;/p&gt;

&lt;p&gt;Going back to your stemming and stop-words &quot;problem&quot;: one way to attack it with PG is to use the &lt;b&gt;setweight&lt;/b&gt; function. Behind this cryptic name is a function that lets you &quot;label&quot; each lexeme (a normalised token in PG nomenclature) with a weight tag (A, B, C or D); e.g. &quot;about boy&quot; labeled A is [about:A, boy:A]. So let&apos;s assume that we index our records as follows:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;we index the un-stemmed content, with stop-words removed, as label &lt;b&gt;A&lt;/b&gt;&lt;/li&gt;
	&lt;li&gt;we index the stemmed content, with stop-words removed, as label &lt;b&gt;B&lt;/b&gt;&lt;/li&gt;
	&lt;li&gt;we index the stop-words with label &lt;b&gt;C&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;This would require a fairly complex setup of Postgres dictionaries (see &lt;a href=&quot;https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html&lt;/a&gt;) and indexes or triggers, but let&apos;s assume it can be implemented (this of course needs to be verified).&lt;/p&gt;

&lt;p&gt;At query time, we do exactly the same: we run the query through those three dictionaries, give the appropriate label to each subquery, and then OR (||) the results into a single expression.&lt;/p&gt;
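&lt;p&gt;The labelling scheme and weighted ranking described in this comment can be modelled in a few lines of Python. This is a toy under stated assumptions. The stop-word list and one-rule stemmer are hypothetical, and the rank simply adds up the weight of every (lexeme, label) pair shared by record and query, which is not PostgreSQL&apos;s exact formula. Only the default weights {0.1, 0.2, 0.4, 1.0} (in {D, C, B, A} order) come from the ts_rank documentation.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical stop words and one-rule stemmer, standing in for the
# PostgreSQL dictionaries that would have to be configured.
STOP_WORDS = {"a", "the", "on"}

# ts_rank's default weights, in PostgreSQL's {D, C, B, A} order.
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)
LABEL_INDEX = {"D": 0, "C": 1, "B": 2, "A": 3}


def stem(word):
    """Strip a plural "s" from words longer than three letters."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def analyse(text):
    """Return (lexeme, label) pairs using the A/B/C labelling scheme.

    The same function is used for records and queries, mirroring
    "at query time, we do exactly the same".
    """
    words = text.lower().split()
    content = [w for w in words if w not in STOP_WORDS]
    pairs = {(w, "A") for w in content}                      # un-stemmed
    pairs |= {(stem(w), "B") for w in content}               # stemmed
    pairs |= {(w, "C") for w in words if w in STOP_WORDS}    # stop words
    return pairs


def weighted_rank(doc_pairs, query_pairs, weights=DEFAULT_WEIGHTS):
    """Toy rank: add up the weight of every labelled lexeme both sides share."""
    shared = doc_pairs.intersection(query_pairs)
    return sum(weights[LABEL_INDEX[label]] for _, label in shared)


query = analyse("about a boy")
for title in ["about a boy", "about the boys"]:
    print(title, round(weighted_rank(analyse(title), query), 2))
```
&lt;/pre&gt;
&lt;p&gt;For the query &quot;about a boy&quot;, the exact title scores 3.0 and &quot;about the boys&quot; scores 1.8, since only its stemmed (B) lexemes and the un-stemmed &quot;about&quot; match.&lt;/p&gt;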

&lt;p&gt;Now, here comes &lt;b&gt;ts_rank&lt;/b&gt; again: it accepts an array with one weight per label, given in the order {D-weight, C-weight, B-weight, A-weight}. This would allow you to boost exact matches while still being able to match on stemmed words and stop-words. But I agree the SQL code around it will be quite complex.&lt;/p&gt;</comment>
                                                            <comment id="193637" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 18:15:43 +0000"  >&lt;blockquote&gt;&lt;p&gt;But assuming you have a separate title index your &quot;about a boy&quot; example would work as expected, with stop-words and stemming enabled.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;i might be missing something: if i have multiple tsvectors, meaning i match the query against multiple fields, and in one record i matched &quot;about the boys&quot; in the title while in another record i matched &quot;about a boy&quot; in the description, which record wins?&lt;/p&gt;

&lt;p&gt;why get into the complexity though? let&apos;s just index one way, the data as is, query it in a smart way, and let the scoring push the bad results down so people need to actually scroll to find them. maybe we can also set a rank threshold so that really low-ranked results do not get returned&lt;/p&gt;</comment>
                                                            <comment id="193641" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 19:02:40 +0000"  >&lt;p&gt;also, for keyword queries, for example &quot;us foreign affairs&quot;, finding titles matching &quot;foreign affairs in the us&quot; would seem to make sense, so we would want to send multiple queries in any case&lt;/p&gt;

&lt;p&gt;the queries i would send would be&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;a phrase query (if stop words were included in the query keep them as part of the phrase query)&lt;/li&gt;
	&lt;li&gt;a phrase without the stopwords&lt;/li&gt;
	&lt;li&gt;a keyword query without the stopwords&lt;br/&gt;
this gets trickier if someone queries &quot;rowling harry potter&quot;, where we would need to allow keyword matching across multiple fields but of course&lt;/li&gt;
&lt;/ul&gt;
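&lt;p&gt;A small Python sketch of the three query variants listed above follows. It assumes a hypothetical stop-word list and represents each variant as plain data instead of tsquery text. The &lt;tt&gt;matches&lt;/tt&gt; helper (also hypothetical) treats a phrase as adjacent words in order and keywords as words in any order. This shows why &quot;us foreign affairs&quot; only finds &quot;foreign affairs in the us&quot; through the keyword variant.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical stop-word list for this illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def query_variants(user_query):
    """Derive the three queries proposed in this comment from one input.

    Variants are returned as (kind, words) data; turning them into tsquery
    expressions is left out of this sketch.
    """
    words = user_query.lower().split()
    content = [w for w in words if w not in STOP_WORDS]
    return [
        ("phrase", words),      # phrase query, stop words kept
        ("phrase", content),    # phrase without the stop words
        ("keywords", content),  # keyword query without the stop words
    ]


def matches(kind, words, title):
    """Phrase: words must be adjacent and in order. Keywords: any order."""
    tokens = title.lower().split()
    if kind == "keywords":
        return set(words).issubset(tokens)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


title = "foreign affairs in the us"
for kind, words in query_variants("us foreign affairs"):
    print(kind, words, matches(kind, words, title))
```
&lt;/pre&gt;
&lt;p&gt;The last column of the output reads False, False, True: neither phrase variant matches the reordered title, while the keyword variant does.&lt;/p&gt;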


&lt;p&gt;what i am seeing is that when indexing without stopwords the positions are still maintained, for example:&lt;/p&gt;

&lt;p&gt;arms and dollars: while the lexemes are arm and dollar, the positions seem to be arm {0} and dollar {2}, so if i query arms &amp;lt;-&amp;gt; dollars, i will not find it.&lt;/p&gt;
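&lt;p&gt;This is expected behaviour: stop-words are dropped but still occupy a position, and &amp;lt;-&amp;gt; only matches adjacent positions. PostgreSQL&apos;s &lt;b&gt;phraseto_tsquery&lt;/b&gt; accounts for the gap by emitting a distance operator, so phraseto_tsquery(&apos;english&apos;, &apos;arms and dollars&apos;) should produce &apos;arm&apos; &amp;lt;2&amp;gt; &apos;dollar&apos;. The Python sketch below models only that position check. &lt;tt&gt;followed_by&lt;/tt&gt; is a hypothetical helper, and the vector assumes the 1-based positions PostgreSQL reports (arm at 1, dollar at 3), which leave the same gap.&lt;/p&gt;
&lt;pre&gt;
```python
def followed_by(vector, first, second, distance=1):
    """True if `second` occurs exactly `distance` positions after `first`.

    Models tsquery's followed-by check: distance 1 is the plain operator,
    larger values correspond to the distance form that phraseto_tsquery
    emits when stop words sit between two lexemes.
    """
    second_positions = set(vector.get(second, []))
    return any(p + distance in second_positions for p in vector.get(first, []))


# Lexemes and 1-based positions for "arms and dollars" ("and" was dropped).
vector = {"arm": [1], "dollar": [3]}

print(followed_by(vector, "arm", "dollar"))              # adjacent: no match
print(followed_by(vector, "arm", "dollar", distance=2))  # distance 2: match
```
&lt;/pre&gt;
&lt;p&gt;It prints False for distance 1 and True for distance 2.&lt;/p&gt;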
</comment>
                                                            <comment id="193647" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Wed, 30 May 2018 19:09:24 +0000"  >&lt;p&gt;on the technical side - the full text searching seems to be very fast&lt;/p&gt;

&lt;p&gt;on 4.6 million records, the index size on the title:&lt;/p&gt;

&lt;p&gt;okapi_modules=# SELECT  pg_size_pretty(&lt;b&gt;pg_relation_size(&apos;diku_mod_inventory_storage.&lt;font color=&quot;#d04437&quot;&gt;idx_fts_title&lt;/font&gt;&apos;)&lt;/b&gt;);&lt;br/&gt;
 pg_size_pretty&lt;br/&gt;
----------------&lt;br/&gt;
 337 MB&lt;br/&gt;
(1 row)&lt;/p&gt;</comment>
                                                            <comment id="193651" author="557058:b8e64633-1f7c-402d-9caf-9959a5ba5d0d" created="Wed, 30 May 2018 20:05:20 +0000"  >&lt;p&gt;&lt;a href=&quot;https://folio-org.atlassian.net/secure/ViewProfile.jspa?accountId=712020%3A32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; class=&quot;user-hover&quot; rel=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; data-account-id=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; accountid=&quot;712020:32bb56ac-50e7-4787-b4af-ed3089d9401c&quot; rel=&quot;noreferrer&quot;&gt;shale99&lt;/a&gt; My writeup about using &lt;b&gt;setweight&lt;/b&gt; was addressing the problem you mentioned about searching for phrases where stop-words are critical and boosting exact (non-stemmed) matches. I think this is a minority of cases and certainly not something we should be spending time optimising now. I propose a simple approach to evaluate PGFT: use the default &quot;english&quot; configuration (with default stemming and stop-words) for both indexing and querying, and test both normal and phrase search (phraseto_tsquery).&lt;/p&gt;</comment>
                                                            <comment id="193654" author="712020:32bb56ac-50e7-4787-b4af-ed3089d9401c" created="Thu, 31 May 2018 07:13:34 +0000"  >&lt;p&gt;the thing with search, especially here, is the details: these requests will come, and we will need to see if the postgres engine can support them in a sensible way.&lt;/p&gt;

&lt;p&gt;what i have seen so far is that it gives a nice amount of functionality, but some things will take some bending and may become inefficient&lt;/p&gt;

&lt;p&gt;for example:&lt;br/&gt;
1. librarians would probably want titles that &lt;b&gt;start with&lt;/b&gt; their terms boosted up; this may be doable, but i haven&apos;t seen how yet&lt;br/&gt;
2. running searches on the perf env (4.6M records), i see cases where i enter a query and get the stemmed version ranked before the exact query i entered, which may or may not be ok; input welcome:&lt;br/&gt;
for example:&lt;br/&gt;
to_tsquery(&apos;english&apos;,&apos; constitutional &amp;amp; amendment&apos;)&lt;br/&gt;
searches for:&lt;br/&gt;
Index Cond: -&amp;gt; &apos;&apos;&apos;constitut&apos;&apos; &amp;amp; &apos;&apos;amend&apos;&apos;&apos;&lt;br/&gt;
3. i don&apos;t see facet support: if we can facet on the top N returned records, this may be ok, but if we need to facet on the entire result set, this would be a deal breaker&lt;br/&gt;
4. i don&apos;t see out-of-the-box CJK support, meaning tokenization; if we want to normalize to simplified chinese, we need to handle this ourselves&lt;/p&gt;
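&lt;p&gt;Faceting on the top N records, as suggested in point 3, could look like the following Python sketch. The record layout, the &lt;tt&gt;language&lt;/tt&gt; field and &lt;tt&gt;facet_counts&lt;/tt&gt; are hypothetical. The point is that counts taken over the top N ranked records are cheap but partial, because values that occur only further down the ranking are missed.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter


def facet_counts(ranked_records, field, top_n):
    """Count facet values over only the first `top_n` ranked records.

    Cheaper than faceting the entire result set, but the counts are partial:
    values that only occur further down the ranking are missed.
    """
    counts = Counter()
    for record in ranked_records[:top_n]:
        counts.update(record.get(field, []))
    return counts


# Hypothetical search results, already ordered by rank.
results = [
    {"title": "Constitutional amendments", "language": ["eng"]},
    {"title": "Amending the constitution", "language": ["eng", "fre"]},
    {"title": "Verfassung", "language": ["ger"]},
]
print(facet_counts(results, "language", top_n=2).most_common())
```
&lt;/pre&gt;
&lt;p&gt;With top_n=2 the German record is outside the window, so only eng (2) and fre (1) are counted.&lt;/p&gt;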


&lt;p&gt;my preference would be to go with postgres, just because it would be easier to maintain a single data store, and it gives a lot more functionality and is faster than what we are doing now.&lt;br/&gt;
my hesitation is that once we get into the details, it is still not as flexible as solr or ES, and that may become a problem. as a v1 solution, it is better than our current one&lt;/p&gt;
</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10000">
                    <name>Blocks</name>
                                            <outwardlinks description="blocks">
                                        <issuelink>
            <issuekey id="10074">UXPROD-745</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10646">UXPROD-1045</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10649">UXPROD-1048</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10650">UXPROD-1049</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="74804">UISE-68</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="74807">UISE-69</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10003">
                    <name>Relates</name>
                                                                <inwardlinks description="relates to">
                                        <issuelink>
            <issuekey id="55649">MODINVSTOR-163</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="37558">CQLPG-37</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="37559">CQLPG-40</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="74834">UISE-80</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10000" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummarycf">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10057" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Development Team</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10144"><![CDATA[Core: Platform]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10019" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|hzyjv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10020" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10024" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>[CHART] Date of First Response</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 15 May 2018 11:38:07 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10025" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>[CHART] Time in Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>