Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usernames allow accented characters but URLs not working (Trac #4190) #4190

Closed
elgg-gitbot opened this issue Feb 16, 2013 · 14 comments
Closed
Labels
Milestone

Comments

@elgg-gitbot
Copy link

Original ticket http://trac.elgg.org/ticket/4190 on 41933393-01-01 by cash, assigned to unknown.

Elgg version: 1.8

Create a user with an umlaut in the username to reproduce this. I'm not sure how we can fix this without sprinkling the code with urlencode()'s. Other option is not allowing these characters in usernames.

@elgg-gitbot
Copy link
Author

trac user fgreinus wrote on 41937478-12-13

I think not allowing those characters in Usernames is the worst option.
There must be a way to fix this.
Where would you suggest to put urlencode? Maybe I could try it live on my site if you suggest some changes ;).

@elgg-gitbot
Copy link
Author

trac user fgreinus wrote on 41937881-10-10

I commited a Quick&Dirty patch at github, that fixed the problem (momentary) for me:

https://github.com/fgreinus/Elgg/commit/7c7d801393cbe6952032416a668621e58ad5dea9

@elgg-gitbot
Copy link
Author

cash wrote on 41940199-05-13

The core problem is that we're producing invalid URLs because of the non-ascii characters. This causes the URL to fail this: filter_var($url, FILTER_VALIDATE_URL)

To include non-ascii characters in a URL, they have to be encoded. As an example, in views/default/page/layouts/content/filter.php, we would need to change

$username = elgg_get_logged_in_user_entity()->username;

to

$username = urlencode(elgg_get_logged_in_user_entity()->username);

@elgg-gitbot
Copy link
Author

cash wrote on 41940203-09-27

Long term solution is probably to add a method to ElggUser:

public function getUsername($encode = false) {
    if ($encode) {
        return $this->username;
    } else {
        return urlencode($this->username);
    }
}

@elgg-gitbot
Copy link
Author

Milestone changed to Elgg 1.8.2 by cash on 41940203-09-27

@elgg-gitbot
Copy link
Author

ewinslow wrote on 41953065-05-27

Why not convert the accented characters to non-accented? umlauts/accents in names makes sense, but I don't think it's necessary for usernames. Should we be allowing cjk characters in usernames as well?

@elgg-gitbot
Copy link
Author

trac user Purus wrote on 41953223-04-12

The problem is not just with english accented characters. The problem comes when using non english languages too.. For example my Tamil language.

For example try to have the values "தமிழ்" in both username and display name.

@elgg-gitbot
Copy link
Author

cash wrote on 41954441-07-31

Evan, we would need to either require usernames to be only ascii characters or allow all national characters. There's no good middle ground. I'll check through the history of what we have allowed. I don't know if we suddenly started allowing more character types with Elgg 1.8 or something else changed to cause these issues.

@elgg-gitbot
Copy link
Author

ewinslow wrote on 41954789-12-08

tl;dr: I vote [a-z0-9.]+ usernames only.

I think we should take the conservative approach right now and only allow alphanumeric usernames. Even Facebook doesn't allow "non-romanized" usernames. I'm also thinking of things like federation which could be affected by this.

I don't think this is ideal, but it closes off a lot of potential problems and we can always open things up later when we have done more research and come up with a viable solution. On the other hand, if we allow all national characters outright, we may end up regretting that down the line with no good way to recover.

@elgg-gitbot
Copy link
Author

brettp wrote on 41955463-08-22

What about [_-@]? Those are fairly common and arguably useful for usernames.

@elgg-gitbot
Copy link
Author

cash wrote on 41959352-03-12

Elgg has supported non-ASCII characters in usernames since 1.0 as far as I can tell. What has changed is that we added elgg_normalize_url() which uses PHP's URL validator. That validator works with URLs that conform to RFC 3986 (Uniform Resource Identifier). That allows for alphanumeric, hyphen, period, underscore, and tilde.

Before 1.8, we were producing invalid URLs according to RFC 3986, but they worked on probably every browser/server.

There is also RFC 3987 (Internationalized Resource Identifiers). Here is a good article that describes the relationship between URIs and IRIs: http://www.w3.org/International/articles/idn-and-iri/

Here is a list of what IRIs allow as characters in the path:

   iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

Anyway, it seems reasonable to modify elgg_normalize_url() to work with IRIs since that is what we've been producing since 1.0. If we want to discussing limiting username character sets, that should be a separate ticket.

@elgg-gitbot
Copy link
Author

ewinslow wrote on 41960266-01-13

Ok, good distinction to make.

@elgg-gitbot
Copy link
Author

trac user Cash Costello wrote on 41961962-02-22

Fixes #4190 accepting full urls with non-ascii characters
Changeset: c529671

@elgg-gitbot
Copy link
Author

trac user fgreinus wrote on 41962488-11-14

Replying to [comment:13 Cash Costello]:

Fixes #4190 accepting full urls with non-ascii characters
Changeset: c529671

Runs perfectly for me, big thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

1 participant