Ticket #1231 (closed unconfirmed defect: fixed)

Opened 6 months ago

Last modified 5 months ago

Tag searches with extended characters give no results

Reported by: bthj
Owned by:
Priority: critical
Milestone: Elgg 1.7
Component: Core
Version: 1.6
Severity: critical
Keywords: case_sensitive, get_metastring_id, search
Cc: brettp, dave

Description

On a site running v1.5, clicking on or searching for tags with non-English characters worked fine. After upgrading to v1.6, only tags with letters from the English alphabet work as expected.

This can be reproduced on the demo site:

This profile, http://demo.elgg.com/pg/profile/bthj, has two tags with international characters, and clicking on them gives no search results. Clicking on the third tag, which has no extended characters, works fine.

Change History

Changed 6 months ago by alapzaj

  • keywords case_sensitive, get_metastring_id, search added

I can report the same problem after upgrading Elgg 1.5 -> 1.6.1. On my side, the problem has two parts:

1. The search tag is converted to lower case in /search/index.php by elgg_strtolower, but get_metastring_id($string, $case_sensitive = true) is case sensitive by default.
2. Non-English letters: the problem is caused by that $case_sensitive parameter. If I set it to false, the search works fine with non-English characters.

My tests: I have 3 objects, each with different tags.

1. Object tag "lorem", test tag "lorem": works fine.
2. Object tag "Lorem", test tag "Lorem": no result by default, but if I remove elgg_strtolower from the list_entities_from_metadata call in /search/index.php, it comes back with 1 result.
3. Object tag "lorém", test tag "lorém": nothing; I need to remove the elgg_strtolower and set the $case_sensitive variable to false to get my object.

If you need it, I can provide a test site or PHP/MySQL logs.
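
The capitalization half of this report can be modelled in a few lines. The sketch below is not Elgg's code: it is a Python stand-in in which a plain dict plays the role of the metastrings table, and the function names only echo the ones mentioned above. It assumes the search path lowercases the tag before the lookup, as described in the comment, and it does not attempt to reproduce the separate failure with non-English letters.

```python
def get_metastring_id(table, string, case_sensitive=True):
    """Return the first id whose stored string matches, or None.

    `table` is a hypothetical {id: string} mapping, not a real database.
    """
    for ms_id, stored in sorted(table.items()):
        if case_sensitive:
            if stored == string:
                return ms_id
        elif stored.casefold() == string.casefold():
            return ms_id
    return None


def search(table, tag, case_sensitive=True):
    # Mirrors the reported flow: lowercase the tag first, then look it up.
    return get_metastring_id(table, tag.lower(), case_sensitive)


# A tag stored as "Lorem" is never found once the term has been lowercased,
# unless the lookup itself ignores case.
stored_tags = {2: "Lorem"}
miss = search(stored_tags, "Lorem")                        # None
hit = search(stored_tags, "Lorem", case_sensitive=False)   # 2
```

Under these assumptions, either dropping the lowercasing step or making the lookup case insensitive lets the "Lorem" object be found again, which matches test 2 above.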

Changed 6 months ago by brettp

  • cc marcus removed

The problem results from casting to BINARY in get_metastring_id(). Exploring better options.
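
The ticket does not say why the BINARY cast breaks non-English tags. One possibility, assumed here purely for illustration, is that the stored string and the search term reach the comparison in different encodings: a byte-for-byte comparison then still matches plain ASCII tags but misses anything with an accented letter. The helper name below is hypothetical, and the choice of UTF-8 versus Latin-1 is an assumption, not something taken from the Elgg source.

```python
def binary_equal(stored: bytes, term: bytes) -> bool:
    """Byte-for-byte comparison, loosely analogous to comparing under BINARY."""
    return stored == term


ascii_tag = "lorem"
accented_tag = "lor\u00e9m"  # "lorém", written with an escape to pin the code point

# ASCII letters have the same bytes in both encodings, so they still match.
ascii_match = binary_equal(ascii_tag.encode("utf-8"), ascii_tag.encode("latin-1"))

# The accented letter is two bytes in UTF-8 and one byte in Latin-1,
# so the byte-wise comparison fails even though the text is identical.
accented_match = binary_equal(
    accented_tag.encode("utf-8"), accented_tag.encode("latin-1")
)
```

If something like this is the cause, it would be consistent with the report that only tags made of English letters keep working.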

Changed 6 months ago by alapzaj

Brettp,

I think converting the given string to lowercase and then doing a case-sensitive MySQL query is also a big problem, because all metadata with capital letters in it will be excluded from the results. BINARY is only a problem if the string contains non-English letters, but dropping the BINARY option will not solve the capitalization problem.

Changed 5 months ago by brettp

  • status changed from new to closed
  • resolution set to fixed

Fixed in [3514].

Changed 5 months ago by alapzaj

Just one more comment on this bug. Sites running 1.6.1 with case sensitivity enabled will create a new metastring if the string contains a UTF-8 character, and ignore the existing ones.
As you can see below, I have a string "konyhaművészet" with id 280. After upgrading to 1.6.1, a user posted a new blog entry tagged with "konyhaművészet". While creating the metadata, the add_metastring function is called, which should check whether the metastring already exists. get_metastring_id returns false because case sensitivity is forced (in the add_metastring function too), so Elgg creates a new record.

The problem is that if someone searches for "konyhaművészet", even after your fix, the search function will return only the old entities tagged with that string (because we have several metastring records for "konyhaművészet", but the query picks up only the first one, due to the "limit 1").

So Elgg admins with this problem need to clean up the database after 1.7 :(

Example queries:

mysql> select * from scsbetametastrings where string='konyhaművészet';
+------+------------------+
| id   | string           |
+------+------------------+
|  280 | Konyhaművészet   |
| 3260 | Konyhaművészet   |
+------+------------------+
2 rows in set (0.00 sec)

mysql> select * from scsbetametadata where value_id='280';
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| id    | entity_guid | name_id | value_id | value_type | owner_guid | access_id | time_created | enabled |
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
|   958 |          43 |      75 |      280 | text       |         20 |         2 |   1248619127 | yes     | 
|  1224 |          74 |     349 |      280 | text       |         20 |         2 |   1248625306 | yes     | 
|  3179 |         373 |     349 |      280 | text       |        214 |         1 |   1249038265 | yes     | 
|  3862 |         598 |     349 |      280 | text       |        582 |         2 |   1249233249 | yes     | 
|  4738 |         839 |     349 |      280 | text       |        377 |         2 |   1249389622 | yes     | 
|  5725 |        1023 |     349 |      280 | text       |        582 |         2 |   1249410640 | yes     | 
|  6451 |        1191 |     349 |      280 | text       |       1113 |         2 |   1249542232 | yes     | 
|  7879 |        1472 |     349 |      280 | text       |        988 |        -2 |   1249827739 | yes     | 
| 26838 |        1752 |     349 |      280 | text       |         47 |         0 |   1255474016 | yes     | 
| 26839 |        1751 |     349 |      280 | text       |          2 |         2 |   1255474016 | yes     | 
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
10 rows in set (0.00 sec)

mysql> select * from scsbetametadata where value_id='3260';
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| id    | entity_guid | name_id | value_id | value_type | owner_guid | access_id | time_created | enabled |
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| 26822 |        1750 |     349 |     3260 | text       |         47 |         0 |   1255473809 | yes     | 
| 26823 |        1749 |     349 |     3260 | text       |          2 |         2 |   1255473809 | yes     | 
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
2 rows in set (0.00 sec)
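
A cleanup along the lines hinted at above could merge the duplicate rows. The following is only a sketch under stated assumptions: it uses Python's sqlite3 with an in-memory database instead of MySQL, unprefixed table names (`metastrings`, `metadata`) that stand in for the prefixed ones in the queries above, a reduced column set, and a hypothetical helper name. The sample rows are a subset of the output shown above.

```python
import sqlite3


def merge_duplicate_metastrings(conn, string):
    """Repoint metadata at the lowest id for `string`, then drop the extra rows.

    Hypothetical helper, not part of Elgg. Returns the id that was kept.
    """
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM metastrings WHERE string = ? ORDER BY id", (string,))]
    if len(ids) < 2:
        return ids[0] if ids else None
    keep, extras = ids[0], ids[1:]
    marks = ",".join("?" * len(extras))
    conn.execute(
        f"UPDATE metadata SET value_id = ? WHERE value_id IN ({marks})",
        [keep, *extras])
    conn.execute(f"DELETE FROM metastrings WHERE id IN ({marks})", extras)
    conn.commit()
    return keep


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metastrings (id INTEGER PRIMARY KEY, string TEXT);
    CREATE TABLE metadata (
        id INTEGER PRIMARY KEY, entity_guid INTEGER, value_id INTEGER);
""")
conn.executemany(
    "INSERT INTO metastrings VALUES (?, ?)",
    [(280, "Konyhaművészet"), (3260, "Konyhaművészet")])
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [(958, 43, 280), (26822, 1750, 3260), (26823, 1749, 3260)])

kept = merge_duplicate_metastrings(conn, "Konyhaművészet")
```

After the merge, a lookup that stops at the first matching id would see all of the tagged entities. On a real site, any other column that stores metastring ids (name_id, for instance) would likely need the same treatment, and the whole operation should be tried on a backup first.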