Batch the loading of metadata during entity list rendering (Trac #4290) #4290
Comments
ewinslow wrote: I like the idea of bulk loading metadata, but I'd vote we just batch load all metadata whenever fetching a list of entities (respecting access controls, of course). Memory consumption should not be prohibitive here, and this cuts out the maximum number of queries. It's also far simpler to implement. We could try doing smart things like reusing an entity's metadata if it is already loaded instead of pulling it in twice, but I'd say that's a different ticket.
Milestone changed.
ewinslow wrote: In terms of implementation, we would need a way to set metadata on an entity without triggering a DB request (i.e. disable auto-save). Either that, or we load the metadata into a cache and then reconfigure ElggEntity's metadata getter to hit the cache first and fall back to the DB. I'm partial to the second option because we could do it without changing the API (i.e. in 1.8.x), and such a cache stands to get us some big performance improvements across the board.
trac user mrclay wrote: The second option sounds far better. I think the MetadataCache should sit on the entity (so it gets dumped automatically if the entity is removed from the entity cache) and should also cache failed lookups (basically, if a key is requested and false/null is returned, that should be cached), because these often occur multiple times within a request. I assume that, on save, the saved metadata value(s) will be put back into the read cache?
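The cache-first getter with negative caching discussed above can be sketched roughly like this (a minimal illustration in Python, not Elgg's actual API — class and parameter names are invented):

```python
class MetadataCache:
    """Read-through metadata cache sketch. Misses are cached too: once the
    DB says a key has no value, repeat requests are answered from memory."""

    def __init__(self, db_fetch):
        self._db_fetch = db_fetch   # fallback: db_fetch(guid, name) -> value or None
        self._values = {}           # (guid, name) -> value (None means "known absent")

    def get(self, guid, name):
        key = (guid, name)
        if key in self._values:                # hit, including cached misses
            return self._values[key]
        value = self._db_fetch(guid, name)     # may be None; cached either way
        self._values[key] = value
        return value

    def save(self, guid, name, value):
        # Write-through: the saved value goes straight back into the read
        # cache, so a save followed by a read never touches the DB again.
        self._values[(guid, name)] = value
```

Storing `None` for failed lookups is what makes repeated false/null requests within one page render cost a single query instead of many.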
trac user mrclay wrote: Just spent a few hours on [https://github.com/mrclay/Elgg-leaf/compare/Elgg:1.8...mrclay:4290-md-cache this commit], only to realize that, since the global metadata fetch and delete functions are in the public API and can be called directly, there's no way to invalidate/replace cached values if I store metadata on the entity object. Another case where the global API forces other things to be global :(. But at least the mission is clearer, while …
trac user mrclay wrote: My new work branch on this is [https://github.com/mrclay/Elgg-leaf/compare/Elgg:1.8...mrclay:4290-md-cache-2 here]. Just need to plug it into all public API points for MD read/write... ugh.
trac user mrclay wrote: Made significant progress on [https://github.com/mrclay/Elgg-leaf/compare/Elgg:1.8...mrclay:4290-md-cache-2 my branch] tonight. All fetched metadata is now stored, and delete/disable invalidate as little of the cache as possible. When there's any uncertainty about future DB reads, the cached values are cleared. All (current) tests passing! The next step is prefetching all the metadata from inside …
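The "invalidate as little as possible, clear everything when uncertain" strategy can be sketched as follows (the helper and the flat dict cache are illustrative assumptions, not the branch's real code):

```python
def invalidate(cache, guid=None, name=None):
    """Drop the narrowest possible slice of a {(guid, name): value} cache.

    A targeted delete removes one key; an entity-wide operation removes
    that entity's entries; anything whose scope can't be determined
    (e.g. a bulk delete by arbitrary query) clears the whole cache.
    """
    if guid is not None and name is not None:
        cache.pop((guid, name), None)          # exact key known: drop just it
    elif guid is not None:
        for key in [k for k in cache if k[0] == guid]:
            del cache[key]                     # all metadata of one entity
    else:
        cache.clear()                          # scope uncertain: be safe
```

The full clear in the uncertain case is what keeps the cache trustworthy: a stale entry would silently return wrong metadata, while an evicted one merely costs a re-fetch.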
trac user mrclay wrote: I think this is as coupled as the metadata cache should get to the system. Part of the reasoning is to stop shoehorning functionality into …. When querying to pull all the MD, it will be important not to use a LIMIT clause. If you do, you never know whether there are MD values that were not loaded, so you cannot assert that the cache is accurate for any key. It may be wise, before pulling data, to first issue a query to determine how many values exist for each (guid|name) combination, and the character counts on TEXT values (not sure if this is practical). This may let us decide to leave some data out of the auto-populated cache to save memory. Just being paranoid here.
trac user mrclay wrote: The latest [https://github.com/mrclay/Elgg-leaf/commit/e43dee3ebed42b502488c99851e82b6ccff0ffed commit] has a function to populate the cache based on GUIDs. It only needs two queries, but neither has a LIMIT, and the first uses a GROUP BY to count the bytes of all the values in each entity to ensure it isn't over 1MB (arbitrary for now). Tests included. If this looks good, we can figure out in what situations this function should be called and whether directly or by hook.
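The shape of that two-query preload can be sketched like this (the query callables and row layout are assumptions standing in for the real SQL, not Elgg's schema):

```python
ONE_MB = 1024 * 1024

def preload(guids, size_query, row_query, cap=ONE_MB):
    """Two-query, size-capped cache preload.

    size_query(guids) stands in for the GROUP BY query: it returns
    {guid: total_bytes_of_values}, so entities over the cap can be
    skipped before any values are pulled. row_query(guids) stands in
    for the second, LIMIT-free fetch: every row for the chosen guids
    comes back, so the cache is provably complete for those entities.
    """
    sizes = size_query(guids)                                  # query 1
    chosen = [g for g in guids if sizes.get(g, 0) <= cap]
    cache = {}
    for guid, name, value in row_query(chosen):                # query 2: no LIMIT
        cache.setdefault((guid, name), []).append(value)       # multi-valued keys OK
    return cache
```

Skipping over-cap entities entirely (rather than loading part of their metadata) preserves the invariant from the earlier comment: the cache is either complete for an entity or silent about it.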
ewinslow wrote: Thanks for the work. Left some comments on the commit. Did I send you a big long message explaining my thoughts on the issues you raised? I don't see it in this thread...
ewinslow wrote: The tl;dr is that the memory usage here is almost certainly trivial, not something we should worry about until we have hard evidence there is a problem.
trac user mrclay wrote: Great, I'll refactor this a bit based on your suggestions. Thoughts on where/how to call what will be …?
trac user mrclay wrote: Unless I'm missing something, [https://github.com/mrclay/Elgg-leaf/compare/Elgg:1.8...mrclay:4290-md-cache-2 this branch] completes this ticket. The cache has its own test case, runs with the core unit tests (all passing), the global helper functions have been moved into the cache class, and some tiny bugs are fixed. The cache is integrated in …
ewinslow wrote: Took a look at your branch and made a few extremely minor comments. I assume you've also poked around on a test site as a sanity check? If everything looks good to go, I'll be very pleased to pull this into 1.9.

Also, just for the record, I found that post I was referring to. tl;dr: I'm pretty strongly convinced that the extra memory usage here will be trivial. I had the same concern at first, but after thinking through the numbers, I'm not convinced this is an issue for 99% of Elgg sites. The entire Elgg community site has a metastrings table that is ~50MB, and the average size of a metastring is ~214 bytes. But I'm betting that's inflated by long forum posts and comments, which should be entities, so the average is going to plummet once fewer metadata fields are freetext. Still, let's say 5 pieces of metadata take up ~1KB. That means you could load 5000 of them on a page and still be using only ~1MB of memory, and that would require a page of 250 entities with 20 metadata each. In my experience, a bare-bones Elgg page uses about 12MB.

For the potential performance gains, this seems like a no-brainer tradeoff. We can continue to refine the implementation to make it as memory-efficient as possible, but my thought is that if it provides more responsiveness for end users, everyone will be praising us for making the right choice, and we can deal with any edge-case too-much-memory problems later. Nothing stops us from implementing the naive solution now and refining it when/if it actually becomes a problem. Finally, for the few sites that might actually be negatively impacted, we can always provide an opt-out configuration option in settings.php or something.
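The back-of-envelope estimate in that comment is easy to double-check (the 214-byte average is the figure quoted above; the rest is just the stated worst-case page):

```python
# Memory cost of preloading every metadata value on a very large list page.
avg_value_bytes = 214                    # average metastring size quoted above
entities, md_per_entity = 250, 20        # the stated worst-case page
values = entities * md_per_entity        # total metadata values loaded
total_bytes = values * avg_value_bytes

print(values, total_bytes)               # ~1 MB, vs. ~12 MB for a bare Elgg page
```

Even this deliberately pessimistic page costs roughly a twelfth of Elgg's baseline memory footprint, which is the core of the "trivial memory usage" argument.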
Milestone changed.
trac user mrclay wrote: Rebased to 1.9: #285
trac user mrclay wrote: Fresh PR for 1.8: #377. Should I reopen?
Milestone changed.
…ading for fetched entities
Fixes Elgg#4290: adds preloading volatile metadata cache and unit tests
Original ticket http://trac.elgg.org/ticket/4290 by trac user mrclay, assigned to trac user mrclay.
Elgg version:
When displaying a list of entities, particularly of the same type/subtype, it would be wise to attempt to autoload a metadata key for all entities in the list whenever an unloaded key is requested.

E.g. `elgg_list_entities` is asked to render some blog posts. In the first view rendering, `$entity->foo` was already pre-loaded (by some other mechanism) but `$entity->bar` has not been. This, I think, makes it extremely likely that `bar` is going to be needed for all entities in the list, so they should all be loaded at that time in as few queries as possible.

In practice, something like the context stack would be needed, but holding lists of entities currently being rendered (and a comma-separated list of their GUIDs for use in queries).

What I imagine complicates batch-loading metadata is that some entities could have `bar` as an int, others as an array of strings, etc., but I imagine there's still a big opportunity for reducing queries.