Ticket #1231 (closed unconfirmed defect: fixed)

Opened 12 months ago

Last modified 11 months ago

Tag searches with extended characters give no results

Reported by: bthj Owned by:
Priority: critical Milestone: Elgg 1.7
Component: Core Version: 1.6
Severity: critical Keywords: case_sensitive, get_metastring_id,search
Cc: brettp, dave

Description

On a site running v1.5, clicking or searching on tags with non-English characters worked fine. After upgrading to v1.6, only tags made up of letters from the English alphabet work as expected.

This can be reproduced on the demo site:

This profile, http://demo.elgg.com/pg/profile/bthj, has two tags with international characters, and clicking on them gives no search results. Clicking on the third tag, which has no extended characters, works fine.

Change History

Changed 12 months ago by alapzaj

  • keywords case_sensitive, get_metastring_id,search added

I can report the same problem after upgrading Elgg 1.5 -> 1.6.1. On my side, the problem has two parts:
1. The search tag is converted to lower case in /search/index.php by elgg_strtolower, but get_metastring_id($string, $case_sensitive = true) is case-sensitive by default.
2. Non-English letters: this is caused by the same $case_sensitive parameter. If I set it to false, the search works fine with non-English characters.

My tests: I have three objects, each with a different tag.
1. Object tag "lorem", search tag "lorem": works fine.
2. Object tag "Lorem", search tag "Lorem": no result by default, but if I remove elgg_strtolower from the list_entities_from_metadata call in /search/index.php, it returns 1 result.
3. Object tag "lorém", search tag "lorém": nothing. I need to both remove elgg_strtolower and set $case_sensitive to false to get my object.
If you need it, I can provide a test site or PHP/MySQL logs.
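The capital-letter part of this report can be modeled with a short Python sketch. It is not Elgg code: `find_metastring_ids` and `search_tag` are hypothetical stand-ins, a dict plays the metastrings table, plain `==` approximates a BINARY (case-sensitive) comparison, and `str.casefold()` approximates a case-insensitive collation (real MySQL collations such as utf8_general_ci also ignore accents, which this sketch does not). It does not reproduce the non-English-character failure, which appears to depend on the BINARY comparison inside MySQL itself.

```python
# Hypothetical model of the tag lookup, not Elgg's implementation.
# Assumptions: a dict stands in for the metastrings table (id -> string),
# == approximates BINARY, casefold() approximates a case-insensitive collation.

def find_metastring_ids(stored, term, case_sensitive=True):
    """Return the ids of stored strings that match term."""
    if case_sensitive:
        return [i for i, s in stored.items() if s == term]
    return [i for i, s in stored.items() if s.casefold() == term.casefold()]


def search_tag(stored, tag, lowercase_first=True, case_sensitive=True):
    # lowercase_first mirrors the search page lowercasing the tag before the lookup.
    term = tag.lower() if lowercase_first else tag
    return find_metastring_ids(stored, term, case_sensitive)


metastrings = {1: "lorem", 2: "Lorem", 3: "lorém"}

print(search_tag(metastrings, "lorem"))                         # [1]
print(search_tag(metastrings, "Lorem"))                         # [1]  (id 2 is never matched)
print(search_tag(metastrings, "Lorem", lowercase_first=False))  # [2]
print(search_tag(metastrings, "Lorem", case_sensitive=False))   # [1, 2]
```

Either skipping the lowercasing or making the comparison case-insensitive lets the "Lorem" object be found, which matches test 2 above.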

Changed 11 months ago by brettp

  • cc marcus removed

The problem results from casting to BINARY in get_metastring_id(). Exploring better options.

Changed 11 months ago by alapzaj

Brettp,

I think converting the given string to lowercase and then running a case-sensitive MySQL query is also a big problem, because all metadata containing capital letters will be excluded from the results. BINARY is only a problem if the string contains non-English letters, but dropping the BINARY option will not solve the uppercase/lowercase problem.

Changed 11 months ago by brettp

  • status changed from new to closed
  • resolution set to fixed

Fixed in [3514].

Changed 11 months ago by alapzaj

Just one more comment on this bug. Sites running 1.6.1 with case sensitivity enabled will create a new metastring whenever the string contains a UTF-8 character, ignoring the one that already exists.
As you can see below, I have a string "konyhaművészet" with id 280. After upgrading to 1.6.1, a user posted a new blog entry tagged "konyhaművészet". While the metadata is created, the add_metastring function is called, which should check whether the metastring already exists. get_metastring returns false because case sensitivity is forced (in add_metastring too), so Elgg creates a new record.

The problem is that if someone searches for "konyhaművészet", even after your fix, the search only returns the old entities tagged with that string: we now have several metastring records for "konyhaművészet", but the query picks up only the first one because of the "limit 1".
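The duplicate-row scenario described above can be sketched in Python. This is a hypothetical in-memory model, not Elgg's add_metastring: a dict stands in for the metastrings table, `buggy_match` is an assumed stand-in for the reported 1.6.1 behavior (the lookup never finds an existing string containing non-ASCII characters), and `get_first_id` imitates a query ending in "limit 1". The entity GUIDs are taken from the query output in this ticket.

```python
# Hypothetical model of get-or-create with a failing lookup, not Elgg code.
metastrings = {}  # stand-in for the metastrings table: id -> string


def exact_match(stored, term):
    return stored == term


def buggy_match(stored, term):
    # Assumed stand-in for the reported 1.6.1 behavior: an existing string
    # containing non-ASCII (UTF-8) characters is never found.
    return stored.isascii() and stored == term


def get_first_id(term, matches):
    # Like a query ending in "limit 1": only the first matching id is returned.
    for row_id, stored in metastrings.items():
        if matches(stored, term):
            return row_id
    return None


def get_all_ids(term, matches):
    # Returns every matching id, so duplicate rows are not silently skipped.
    return [row_id for row_id, stored in metastrings.items() if matches(stored, term)]


def add_metastring(term, matches):
    # Get-or-create: insert a new row only when the lookup finds nothing.
    existing = get_first_id(term, matches)
    if existing is not None:
        return existing
    new_id = max(metastrings, default=0) + 1
    metastrings[new_id] = term
    return new_id


tag = "Konyhaművészet"
old_id = add_metastring(tag, exact_match)  # created before the upgrade
new_id = add_metastring(tag, buggy_match)  # lookup misses, so a duplicate row is created

# Metadata rows point at one id or the other (entity_guid -> value_id).
metadata = {43: old_id, 74: old_id, 1750: new_id, 1749: new_id}

first = get_first_id(tag, exact_match)
print(sorted(g for g, v in metadata.items() if v == first))  # [43, 74]

ids = get_all_ids(tag, exact_match)
print(sorted(g for g, v in metadata.items() if v in ids))    # [43, 74, 1749, 1750]
```

With "limit 1" the entities attached to the newer duplicate row are never returned; collecting every matching id finds them all.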

So Elgg admins with this problem will need to clean up the database after 1.7. :(

Example queries:

mysql> select * from scsbetametastrings where string='konyhaművészet';
+------+------------------+
| id   | string           |
+------+------------------+
|  280 | Konyhaművészet   |
| 3260 | Konyhaművészet   |
+------+------------------+
2 rows in set (0.00 sec)

mysql> select * from scsbetametadata where value_id='280';
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| id    | entity_guid | name_id | value_id | value_type | owner_guid | access_id | time_created | enabled |
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
|   958 |          43 |      75 |      280 | text       |         20 |         2 |   1248619127 | yes     | 
|  1224 |          74 |     349 |      280 | text       |         20 |         2 |   1248625306 | yes     | 
|  3179 |         373 |     349 |      280 | text       |        214 |         1 |   1249038265 | yes     | 
|  3862 |         598 |     349 |      280 | text       |        582 |         2 |   1249233249 | yes     | 
|  4738 |         839 |     349 |      280 | text       |        377 |         2 |   1249389622 | yes     | 
|  5725 |        1023 |     349 |      280 | text       |        582 |         2 |   1249410640 | yes     | 
|  6451 |        1191 |     349 |      280 | text       |       1113 |         2 |   1249542232 | yes     | 
|  7879 |        1472 |     349 |      280 | text       |        988 |        -2 |   1249827739 | yes     | 
| 26838 |        1752 |     349 |      280 | text       |         47 |         0 |   1255474016 | yes     | 
| 26839 |        1751 |     349 |      280 | text       |          2 |         2 |   1255474016 | yes     | 
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
10 rows in set (0.00 sec)

mysql> select * from scsbetametadata where value_id='3260';
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| id    | entity_guid | name_id | value_id | value_type | owner_guid | access_id | time_created | enabled |
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
| 26822 |        1750 |     349 |     3260 | text       |         47 |         0 |   1255473809 | yes     | 
| 26823 |        1749 |     349 |     3260 | text       |          2 |         2 |   1255473809 | yes     | 
+-------+-------------+---------+----------+------------+------------+-----------+--------------+---------+
2 rows in set (0.00 sec)
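The database cleanup mentioned above amounts to pointing every metadata row at one canonical metastring per value and dropping the extra rows. The Python sketch below works on in-memory dicts rather than the real tables; `merge_duplicate_metastrings` is a hypothetical helper, and the sample ids come from the query output above. It assumes only byte-identical strings should be merged (values that differ only in case are kept apart, since merging them would change tag values), and that the oldest (lowest) id is kept. Any other column that references metastring ids would need the same remapping.

```python
from collections import defaultdict


def merge_duplicate_metastrings(metastrings, metadata):
    """Hypothetical cleanup: keep one id per identical string.

    metastrings: dict metastring id -> string (stand-in for the metastrings table)
    metadata: dict metadata id -> value_id (stand-in for the metadata table)
    Returns (cleaned metastrings, remapped metadata, sorted removed ids).
    """
    ids_by_string = defaultdict(list)
    for ms_id, value in metastrings.items():
        ids_by_string[value].append(ms_id)

    # Keep the lowest (oldest) id for each string and map the others onto it.
    remap = {}
    for ids in ids_by_string.values():
        keep = min(ids)
        for dup in ids:
            if dup != keep:
                remap[dup] = keep

    new_metadata = {md_id: remap.get(v, v) for md_id, v in metadata.items()}
    new_metastrings = {i: s for i, s in metastrings.items() if i not in remap}
    return new_metastrings, new_metadata, sorted(remap)


metastrings = {280: "Konyhaművészet", 3260: "Konyhaművészet"}
metadata = {958: 280, 1224: 280, 26822: 3260, 26823: 3260}

ms, md, removed = merge_duplicate_metastrings(metastrings, metadata)
print(removed)  # [3260]
print(md)       # {958: 280, 1224: 280, 26822: 280, 26823: 280}
```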