Opened 2 years ago
Last modified 4 months ago
#3481 new Defect
timezone issues make transferring/duplicating installs a hassle
| Reported by: | ewinslow | Owned by: | brettp |
|---|---|---|---|
| Priority: | high | Milestone: | Elgg 1.9.0 |
| Component: | Core | Version: | 1.8 |
| Severity: | major | Keywords: | |
| Cc: | brett@…, steve@… | Difficulty: | moderate |
Description (last modified by brettp)
We've seen it many times in the community, and it just bit me too. Spent way too long trying to figure out something I should have known would be a likely problem. I thought it wouldn't be because I was just transferring from a Pacific server to my localhost, but didn't realize my localhost is actually configured for the Europe/Berlin timezone according to date_default_timezone_get.
I think the dates used to generate the file structure need to be forced to be UTC-based, so they're always consistent no matter where you transfer the files to/from.
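For illustration, a minimal Python sketch (not Elgg code) of why a date-based directory moves with the server timezone. The `Y/m/d/guid/` layout and the name `dated_dir` are assumptions made for the example, and fixed UTC offsets stand in for the Pacific and Europe/Berlin zones so the snippet needs no timezone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used as stand-ins (assumption: no DST handling needed here).
PACIFIC = timezone(timedelta(hours=-8))
BERLIN = timezone(timedelta(hours=1))

# 2012-01-01 03:00:00 UTC, a timestamp close to a day boundary.
TIME_CREATED = 1325386800


def dated_dir(time_created, guid, tz):
    """Build a hypothetical Y/m/d/guid/ path using the given timezone."""
    day = datetime.fromtimestamp(time_created, tz=tz).strftime("%Y/%m/%d")
    return f"{day}/{guid}/"


if __name__ == "__main__":
    # Same entity, same timestamp, different server timezone.
    print(dated_dir(TIME_CREATED, 42, PACIFIC))       # 2011/12/31/42/
    print(dated_dir(TIME_CREATED, 42, BERLIN))        # 2012/01/01/42/
    # Forcing UTC gives one answer regardless of where the files live.
    print(dated_dir(TIME_CREATED, 42, timezone.utc))  # 2012/01/01/42/
```

The Pacific and Berlin results differ, which is the mismatch that shows up after moving a data directory between servers; the UTC version is the same everywhere.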
Change History (25)
comment:1 Changed 2 years ago by cash
comment:2 Changed 2 years ago by ewinslow
Yea... hassle either way... will try to look into it as I really don't want this to be a problem for our site down the road if anything ever changes.
comment:3 Changed 23 months ago by cash
- Difficulty set to moderate
- Milestone changed from Needs Review to Elgg 1.8.x
- Priority changed from normal to high
- Severity changed from minor to major
comment:4 Changed 18 months ago by cash
- Milestone changed from Elgg 1.8.x to Elgg 1.8.3
We need to tackle this one. If anyone thinks we should wait for 1.9, let's discuss.
comment:5 Changed 17 months ago by ewinslow
Bit me again. Definitely should be addressed sooner rather than later, though my current workaround is simple enough -- force the timezone in engine/settings.php with date_default_timezone_set().
comment:6 Changed 17 months ago by ewinslow
Another thought: perhaps our dir structure shouldn't be using dates at all. As I recall, we changed things so that time_created is now editable. That's bad news for folder structures based on time_created. Seems to me we should be generating the folder structure based on GUIDs, which are reliably persistent and not subject to context differences such as timezones, etc.
comment:7 Changed 17 months ago by cash
I agree on using GUIDs. We just need to come up with a scheme for dividing the GUIDs up so that we don't get too many users in a single directory.
comment:8 Changed 17 months ago by ewinslow
md5 the GUID and use the first 3-4 characters?
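A rough Python sketch of this idea, only to make it concrete. The function name `md5_dir` and the default of 3 characters are assumptions, not anything decided in this ticket.

```python
import hashlib


def md5_dir(guid, length=3):
    """Return the first `length` hex characters of md5(GUID) as a dir name."""
    digest = hashlib.md5(str(guid).encode("ascii")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    for guid in (1, 13, 90013):
        print(guid, "->", md5_dir(guid))
```

With 3 hex characters there are at most 16^3 = 4096 top-level directories, and the mapping depends only on the GUID, never on the server timezone.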
comment:9 Changed 17 months ago by cash
It would also be nice if a non-technical admin could figure out what a user's directory was based on the GUID. Maybe the first 10,000 GUIDs go in the first directory and so on.
comment:10 Changed 17 months ago by cash
- Milestone changed from Elgg 1.8.3 to Elgg 1.8.4
I want to get 1.8.3 out quickly because we discovered bugs with the 1.8.0 upgrades. This is more involved - especially with testing.
comment:11 Changed 17 months ago by ewinslow
How about take the last 4 digits, prepending 0's if needed?
- 13 -> /0/0/1/3/
- 90013 -> /0/0/1/3/
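The two examples above can be sketched in Python as follows; `last4_dir` is a hypothetical name used only for this illustration.

```python
def last4_dir(guid):
    """Map a GUID to a path built from its last 4 digits, zero-padded."""
    digits = str(guid).zfill(4)[-4:]
    return "/" + "/".join(digits) + "/"


if __name__ == "__main__":
    print(last4_dir(13))     # /0/0/1/3/
    print(last4_dir(90013))  # /0/0/1/3/
```

Note that GUIDs sharing the same last four digits land in the same directory, as the two examples show.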
comment:12 Changed 16 months ago by mrclay
- Cc steve@… added
@Cash it'd be simple enough to do this:
/0-4999/ /5000-9999/ /10000-14999/ /15000-19999/ ...
There obviously won't be a zero, but I think it makes the directories clearer to read.
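A small Python sketch of the range naming, assuming buckets of 5000 as in the listing above; `range_dir` is a made-up name for the example.

```python
def range_dir(guid, bucket_size=5000):
    """Return a 'lower-upper' directory name for the bucket holding guid."""
    lower = (guid // bucket_size) * bucket_size
    upper = lower + bucket_size - 1
    return f"{lower}-{upper}"


if __name__ == "__main__":
    print(range_dir(13))     # 0-4999
    print(range_dir(5000))   # 5000-9999
    print(range_dir(12000))  # 10000-14999
```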
comment:13 Changed 16 months ago by ewinslow
@mrclay, in that case you wouldn't need the explicit upper bounds, I'd think.
comment:14 Changed 15 months ago by cash
- Milestone changed from Elgg 1.8.4 to Elgg 1.8.5
comment:15 Changed 13 months ago by tomv
Little sad we seem to have to change it again...like we did when we got to, what was it 1.5?..
About naming based on guids: what about upside down based on digits?
13 -> 3/1
5413 -> 3/1/4/5
...
Always consistent, easy to read and never more than 10 items and 10 folders per folder...
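As a quick Python sketch of the reversed-digit scheme (the name `reversed_digit_dir` is hypothetical):

```python
def reversed_digit_dir(guid):
    """Split a GUID into single digits, least significant digit first."""
    return "/".join(reversed(str(guid)))


if __name__ == "__main__":
    print(reversed_digit_dir(13))    # 3/1
    print(reversed_digit_dir(5413))  # 3/1/4/5
```

Each path component is a single digit, so no directory ever has more than 10 subdirectories.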
comment:16 Changed 12 months ago by brettp
- Description modified (diff)
- Owner set to brettp
Looking into this. Ideally what would happen is that filenames themselves would be hashed and distributed across a matrix 5 or so chars deep. A few benefits of this:
- Guarantees even spread of data.
- Removes the unnecessary complexity of a storage matrix based on the owner.
- Inherently more secure because the filenames are saved as hashes.
A con to doing this is that it'd be impossible to know who owns which files by looking at the data directory. Is a non-technical user viewing the files a big enough concern that we should continue implementing an unevenly distributed time-based (guids included) approach?
comment:17 Changed 12 months ago by tomv
- Version changed from 1.7 to 1.8
I don't understand the advantage of hashing the filenames... what if filenames change? Do we have to relocate? What is wrong with using the guid, per digit in reverse order as I suggested above?
comment:18 Changed 12 months ago by brettp
Filenames could be handled a few ways. When you rename a file you're moving it, so we could rehash, or we could use metadata to store the "downloadable" filename (which we do already) and keep the hash the same.
The problem with a guid, time based, or any non-random approach is that it causes uneven distribution of the data. If a single user uploads 2 gigs of data, it's all under that one user's directory. In very large or data-heavy communities who might want to store data across different filesystems, this makes it hard to shard the data directory.
I admit this might be premature optimization, but I'd rather not have to migrate data in 1.8.5, then migrate data again in 1.9.X.
comment:19 Changed 12 months ago by tomv
Ok, I understand, thanks. I thought to use the guid of the file, not of the user...
comment:20 Changed 12 months ago by brettp
Too many systems rely on the predictability of the filenames and don't actually save ElggFile objects, so we can't use hashes.
For 1.8.5 I'm going to make it guid-based in buckets of 5000 with the dir name the lower bound starting at 1 (because GUIDs start at 1). e.g. 1/, 5001/, 10001/, 15001/, etc.
Hashing is the next step and I've opened a ticket for it: #4523.
comment:21 Changed 12 months ago by brettp
Going back on this. The bucket dir names will start at 1, but will hit 5000, 10000, 15000, etc. 5001 and friends were super awkward. I dislike starting at 0 because it makes no sense.
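One way to read this naming is sketched below in Python. This is an assumption about the intended mapping, not the code in the branch: the bucket name is the GUID rounded down to a multiple of 5000, with the first bucket named 1 instead of 0. `bucket_name` is a hypothetical helper.

```python
def bucket_name(guid, bucket_size=5000):
    """Return the lower-bound directory name for the bucket holding guid."""
    if guid < 1:
        raise ValueError("GUIDs start at 1")
    # Round down to the bucket boundary; the first bucket is named 1, not 0.
    return max((guid // bucket_size) * bucket_size, 1)


if __name__ == "__main__":
    for guid in (1, 4999, 5000, 9999, 10000, 15001):
        print(guid, "->", bucket_name(guid))
```

Under this reading the first bucket holds GUIDs 1 to 4999 and every later bucket holds 5000 GUIDs.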
comment:22 Changed 12 months ago by brettp
This is implemented in a branch on my fork: https://github.com/brettp/elgg/tree/data_to_guids
It will break plugins that access files directly, so I'm teetering on whether this should go in a bugfix release. The only one I know of is TidyPics, which isn't officially released for 1.8. Opinions on putting this in 1.8.5?
comment:23 Changed 12 months ago by ewinslow
Let's wait until 1.9. It's only a few weeks away. Too big and scary a change for just a bug fix release IMO.
comment:24 Changed 12 months ago by brettp
- Milestone changed from Elgg 1.8.5 to Elgg 1.9.0
comment:25 Changed 4 months ago by brettp
1.9 PR for review: https://github.com/Elgg/Elgg/pull/491

This means an ugly upgrade that fails sometimes because of either time limits or permissions problems. Ideally, the upgrade should detect those problems and maybe stop the upgrade?
The actual change in the code is simple: gmdate vs date
I think it needs to be done, just not looking forward to actually doing it and dealing with all the problems.