|
Why are you trying to paginate through 8 million rows, 10 at a time (or even 77k)? Seems like you should have some other indexed column to filter that further, because that's an unusable amount of data in a UI.
RandomBlue fucked around with this message at 14:33 on Mar 27, 2022 |
# ? Mar 27, 2022 14:30 |
|
It’s for a webpage datatable so people can look up their invoices. I think we’ve gotten hung up on the 77k and 8 million numbers because those are our totals in the database and I was using them as a baseline for speed. There is further filtering on the query I’m using, so that the total unpaid result set is less than 10k, and the paid result set is closer to 800k.
|
# ? Mar 27, 2022 14:38 |
|
I take it you're ordering the result set properly so paginating works correctly. Do you have an index that includes the ordered columns first and any other filtered columns after, so it can just use the index for ordering? That should allow it to return the first page very quickly. e: Make sure the columns in the index are ordered (ASC/DESC) the way they are in your query. RandomBlue fucked around with this message at 14:46 on Mar 27, 2022 |
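To see the effect in a runnable sketch (SQLite via Python here; the table and column names are invented for illustration, not the poster's actual schema), an index whose columns match the filter and the ORDER BY lets the engine walk the index in order instead of sorting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INT, created_at TEXT)")
conn.executemany(
    "INSERT INTO invoices (customer_id, created_at) VALUES (?, ?)",
    [(i % 50, f"2022-03-{(i % 28) + 1:02d}") for i in range(10_000)])

# Equality-filtered column first, then the ORDER BY column with the same
# direction as the query, so the first page comes straight off the index.
conn.execute("CREATE INDEX ix_inv ON invoices (customer_id, created_at DESC)")

page = conn.execute(
    "SELECT id, created_at FROM invoices WHERE customer_id = ? "
    "ORDER BY created_at DESC LIMIT 10", (7,)).fetchall()

# The plan should show a search on ix_inv with no temp b-tree for sorting.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM invoices WHERE customer_id = ? "
    "ORDER BY created_at DESC LIMIT 10", (7,)).fetchall()
print(plan)
```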
# ? Mar 27, 2022 14:42 |
|
The size of the result doesn't really matter. If it takes 40 min to compute a complex recursive collection of joins over the result when an invoice is given, it will be slow. I.e., the bottleneck isn't that conditional, it's other things in that query. For example, you can fetch every item and its details associated with every invoice (then hide it all in the UI because no one wants to see 1M rows of detail), or you can fetch the dates+totals for each invoice, show 10k rows of data in the UI (that no one wants to see), and fetch the items when the invoices are expanded. What's the output of EXPLAIN?
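That split — one cheap query for the invoice list, a second query only when one invoice is expanded — looks roughly like this (SQLite sketch; all table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, inv_date TEXT, total REAL);
CREATE TABLE invoice_items (id INTEGER PRIMARY KEY, invoice_id INT, descr TEXT, amount REAL);
""")
conn.execute("INSERT INTO invoices VALUES (1, '2022-03-01', 30.0), (2, '2022-03-02', 5.0)")
conn.executemany(
    "INSERT INTO invoice_items (invoice_id, descr, amount) VALUES (?, ?, ?)",
    [(1, 'widget', 10.0), (1, 'gadget', 20.0), (2, 'washer', 5.0)])

# Cheap list query: one row per invoice, no join against the item detail.
summaries = conn.execute(
    "SELECT id, inv_date, total FROM invoices ORDER BY inv_date").fetchall()

# Detail query: run only when one invoice is expanded in the UI.
def items_for(invoice_id):
    return conn.execute(
        "SELECT descr, amount FROM invoice_items WHERE invoice_id = ? ORDER BY id",
        (invoice_id,)).fetchall()
```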
|
# ? Mar 27, 2022 22:57 |
|
Thanks for the hints on what to look at, got it sorted out. There was a self-join in there that was forcing a table scan for some reason, rewrote that and now it's snappy. e: lmao, now I have the opposite problem, the large result set is instant, the small result set takes forever. Why did I ever decide to be a computer toucher. Just-In-Timeberlake fucked around with this message at 14:33 on Mar 29, 2022 |
# ? Mar 29, 2022 13:55 |
|
Hello, smart people. I have a massive wordpress multisite I need to get some info out of, and I don't know how to format a query to do this. From all tables matching 'wp_*_options' (* being up to 3 digits), I need to get the rows where 'option_name' is one of: siteurl, home, blogname, admin_email, template, options_shop_link_url, options_phone_number, options_facebook_link, options_twitter_link, options_youtube_link. Help?
|
# ? Apr 6, 2022 21:42 |
|
pipebomb posted:Hello, smart people. You want a pivot table, I'm iffy on the syntax though.
|
# ? Apr 6, 2022 22:00 |
|
Ok, so this gets me all of the 'option_name' columns in 'wp_*_options'.code:
gently caress, I am so rusty. It's been a long time since I did this poo poo.
|
# ? Apr 6, 2022 23:32 |
|
Hmm. Maybe I should just dump every table that is ‘wp%options’ and then just filter it in a spreadsheet? Like hide all the things I need and delete the rest…the goal is to just toss it into a report anyway.
|
# ? Apr 7, 2022 17:05 |
You'll have to generate dynamic SQL anyway; the language doesn't allow parameterizing table names, as far as I know. If you want a "nice" result, create a temporary table, generate SELECT INTO statements for all the source tables, and store the results in the temporary table. Then run the report on the temp table.
|
|
# ? Apr 7, 2022 21:16 |
|
It's one of those days (for my part)... What? What is the actual table structure? Is option_name a field? Is siteurl a value or a field? You mean you want to SELECT WHERE option_name IN ('siteurl', 'home',...)? Across multiple tables though, you'll need UNION, but are they all part of a proper partition you can query? SQL is a query language and (notoriously) not a reporting language. "Dump these 18 things from these 17 tables and use them to blobnicate the 5 things into Excel keyword Tableau dashboards" isn't really going to be easy. ps vvv Thank you. Yeah based on that it's just going to be a bunch of SELECT statements that look like the WHERE above. There's no wildcard select (afaik). pps If you have backups it might be easier to fetch and scan one of those, depending on the backup format. But yeah "find all tables such that X" is not good when X refers to data in those tables. PhantomOfTheCopier fucked around with this message at 00:17 on Apr 8, 2022 |
# ? Apr 7, 2022 21:26 |
|
To both Phantom and nielsm: Yeah, it's not ideal for what I want to make it do but it seems possible if I can wrap my tiny brain around it. So the db consists of multiple tables, all named with the appropriate site_id. So there is like 'wp_3_mooseknuckle' and 4, 5, 6 - a LOT. In this case, I want to select all the tables named 'wp_%_options', and either: - get all rows whose value in column 'option_name' is one of those listed, export it to csv, toss it into a spreadsheet and say 'these sites are missing these values'. or - likely simpler, which I was pondering and as nielsm said, just dump every table named 'wp_%_options', export it and clean it up. Could I do SELECT INTO using the 'wp%options' wildcard? I'm going to grab a full dump and do it locally, as the thought of even creating a temporary table on the live db makes my penis crawl up into my abdomen.
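The dynamic-SQL approach nielsm described can be sketched like this. On a real WordPress install you'd read table names from MySQL's information_schema; this runnable stand-in uses SQLite's sqlite_master catalog instead, and the table contents are made up:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for a few per-site options tables.
for site in (1, 2, 3):
    conn.execute(f"CREATE TABLE wp_{site}_options (option_name TEXT, option_value TEXT)")
    conn.execute(
        f"INSERT INTO wp_{site}_options VALUES "
        f"('siteurl', 'https://site{site}.example'), ('junk', 'x')")

wanted = ('siteurl', 'home', 'blogname', 'admin_email', 'template')

# Table names can't be bound as parameters, so pull them from the catalog
# and validate each against the expected pattern before splicing them in.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'wp_%_options'")]
tables = [t for t in tables if re.fullmatch(r"wp_\d{1,3}_options", t)]

placeholders = ",".join("?" for _ in wanted)
union = " UNION ALL ".join(
    f"SELECT '{t}' AS src, option_name, option_value FROM {t} "
    f"WHERE option_name IN ({placeholders})" for t in tables)
rows = conn.execute(union, list(wanted) * len(tables)).fetchall()
```

Each result row carries its source table name, so spotting sites that are missing a value is a quick pivot in a spreadsheet afterwards.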
|
# ? Apr 7, 2022 22:08 |
|
what are typical values for postgres connection limits and per-connection dedicated memory? obviously depends on your workload blah blah but iirc those numbers are surprisingly low because PG really wants you to have a pool in front so they don't have to worry about supporting a zillion idling connections
|
# ? Apr 8, 2022 04:53 |
|
What documentation suggests that? All large PG instances I've worked with are/were running without external software pooling or replication management. The only documented "explanation" that's fairly well known is that the packages come ready to run PG... on your phone if need be, with the expectation that values will be configured upward for the hardware and application needs. Just like Apache prefork can run at 1MB/process for a 5000-connection document server, so also can a PG query-only server be configured for lots of short, non-maintenance connections. https://www.postgresql.org/docs/current/runtime-config-resource.html But have you tried just thieving the common config values from all the cloud instances?
|
# ? Apr 9, 2022 01:54 |
|
teen phone cutie posted:Not sure if there's a better thread for this, but I have a performance question re: a personal project I'm working on: What do you people think is the most performant way to do the "unread count" for threads that Something Awful has? Talking about this thing: so i posted in the python thread about this, but now that I have my table schemas for users, threads, and read_count, I'm now looking for a query that will help me join the read count to the thread schema https://forums.somethingawful.com/showthread.php?threadid=3812541&pagenumber=171#post522793982
|
# ? Apr 14, 2022 07:02 |
|
I'm reading the docs you linked and all I can think is that ORMs are a mistake. The SQL you want is something like: code:
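(The code block above didn't survive the forum export, but from the replies the shape of the query was presumably something like the following — an SQLite sketch, with the schema names being guesses at the poster's tables, not the original post's code.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT, post_count INT);
CREATE TABLE read_counts (user_id INT, thread_id INT, posts_read INT);
""")
conn.execute("INSERT INTO threads VALUES (1, 'cats', 10), (2, 'dogs', 4)")
conn.execute("INSERT INTO read_counts VALUES (42, 1, 7)")

# LEFT JOIN so threads the user never opened still show up,
# with the full post count unread.
rows = conn.execute("""
    SELECT t.id, t.title, t.post_count - COALESCE(r.posts_read, 0) AS unread
    FROM threads t
    LEFT JOIN read_counts r ON r.thread_id = t.id AND r.user_id = ?
    ORDER BY t.id
""", (42,)).fetchall()
```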
|
# ? Apr 14, 2022 07:41 |
|
Jabor posted:I'm reading the docs you linked and all I can think is that ORMs are a mistake. you’d do it literally exactly like that lol only thing is instead of column names you usually write in object names, so if that is threadId in your code you’d write AND t.id = r.threadId, and so on. The other way to do it is to map them as collections and then you can do “inner join t.readCounts r” but you also want to be sure that’s lazy so it doesn’t drag in a bunch of individual rows. Paul MaudDib fucked around with this message at 08:04 on Apr 14, 2022 |
# ? Apr 14, 2022 08:02 |
|
SQLAlchemy has a query builder too in addition to the ORM (the SQL expression language, they call it). You'll probably want to use that, but I'm not familiar enough with it to say exactly how to do it. I think you can start reading here though: https://docs.sqlalchemy.org/en/14/core/tutorial.html
|
# ? Apr 14, 2022 13:28 |
|
Jabor posted:I'm reading the docs you linked and all I can think is that ORMs are a mistake. this is great thank you! I ended up translating it to the ORM by doing this: Python code:
|
# ? Apr 14, 2022 17:51 |
|
Does anyone know why, when restoring a database via SSMS to a different database name, SSMS sometimes (it seems to depend on the individual database) doesn't auto-rename the files to restore as? For example, with one of my databases, when restoring I simply change the destination database name and the file names are automatically changed as well: However, with a different database this doesn't happen, and I have to manually rename the files when restoring to a different destination database name:
|
# ? Apr 19, 2022 15:14 |
|
I have a db with a table Messages and Users. We will shortly be running a query very frequently to count the number of unread Messages someone (not one of these Users) has, something simple where we get the count after joining Messages to Users, filtering on User.LocationId is in some list of ids, and Message.Read is false. Messages has a foreign key to Users, but User.LocationId is not a foreign key because it is an Id supplied from an external system. I'm not familiar enough with how the db works when using supplemental indexes with joins in the same query: is it enough to ensure that User.LocationId and Message.Read have individual indexes on their tables to avoid unneeded scanning in this query? Using MS SQL Server.
|
# ? Apr 20, 2022 14:42 |
|
Opulent Ceremony posted:I have a db with a table Messages and Users. We will shortly be running a query very frequently to count the number of unread Messages someone (not one of these Users) has, something simple where we get the count after joining Messages to Users, filtering on User.LocationId is in some list of ids, and Message.Read is false. Messages has a foreign key to Users, but User.LocationId is not a foreign key because it is an Id supplied from an external system. Don't bother indexing on Message.Read, index on Message.UserId. Maybe include Message.Read in that index on UserID. https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-ver15 Edit: if you really only care about unread messages, you could even filter your Message.UserId INCLUDE Read for Read = 0. But that may be overkill. How many millions of records are you expecting?
|
# ? Apr 20, 2022 17:09 |
|
Nth Doctor posted:Don't bother indexing on Message.Read, index on Message.UserId. Maybe include Message.Read in that index on UserID. Message.UserId does have an index on account of it also being a foreign key right now, but you think I should add Message.Read into that index so the db doesn't have to get results from the first index and then match with the Message.Read index? Or can I add a supplemental index (Message.UserId, Message.Read) instead of just Message.Read, and that new index would get used for the join and not the index on just Message.UserId? Though having two indexes, one on Message.UserId and another on (Message.UserId, Message.Read), would probably be redundant, I think. Nth Doctor posted:How many millions of records are you expecting? Less than 1 mil in year 1 probably, but the old solution we are replacing has ~10 mil after 5 years. I'm not considering denormalization for it just yet.
|
# ? Apr 20, 2022 17:42 |
|
Having to shuffle things around in an index instead of just flipping the "this has been read" field every time a user reads a message seems like it'd cause more performance issues than it solves. Have you considered tracking "this is the number of unread messages for this user" instead of counting them all up every time you need them?
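One way to keep such a counter in step with the messages table is database triggers, sketched here in SQLite (table and column names are invented; real systems often maintain the counter in application code or in the same transaction instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, unread_count INT DEFAULT 0);
CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INT, read INT DEFAULT 0);

-- Keep the denormalized counter in step with the messages table.
CREATE TRIGGER msg_ins AFTER INSERT ON messages WHEN NEW.read = 0
BEGIN UPDATE users SET unread_count = unread_count + 1 WHERE id = NEW.user_id; END;

CREATE TRIGGER msg_read AFTER UPDATE OF read ON messages
WHEN OLD.read = 0 AND NEW.read = 1
BEGIN UPDATE users SET unread_count = unread_count - 1 WHERE id = NEW.user_id; END;
""")
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.executemany("INSERT INTO messages (user_id) VALUES (?)", [(1,), (1,), (1,)])
conn.execute("UPDATE messages SET read = 1 WHERE id = 1")

# Reading the count is now a single-row lookup, no join or COUNT(*) needed.
count = conn.execute("SELECT unread_count FROM users WHERE id = 1").fetchone()[0]
```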
|
# ? Apr 20, 2022 19:30 |
|
Jabor posted:Having to shuffle things around in an index instead of just flipping the "this has been read" field every time a user reads a message seems like it'd cause more performance issues than it solves. That's a very good point and one I'll need to consider more carefully if I do want to include the Read column into an index. In the future it is likely we will do some sort of denormalized count for those queries, so probably if I stick with ensuring indexes for Message.UserId and User.LocationId I should be alright until then. I am still curious if there is a simple way to approach supplemental indexes when you know the query will also include joins: is adding the additional columns you want to filter onto the existing index for a foreign key a generally better approach than allowing the db to perform the join and the additional predicate filtering with a second index separately? Or perhaps I'm off the mark on how the db would approach a query like that. Any illumination into this topic would be appreciated.
|
# ? Apr 20, 2022 20:37 |
|
Having a second index just on the column you're using for the predicate is of absolutely zero help. The DB can use the index on the foreign key column to find the entries that correspond to the selected users. And it could use the index on the predicate column to identify the entries for all users that meet the predicate. But there's not actually any way to combine those results into "all entries for the selected users that meet the predicate" - it's easier to just ignore the second index and instead look at every entry from the first index one at a time. (Especially if you're talking about low millions of rows - that's not a huge amount of data to churn through).
|
# ? Apr 20, 2022 20:49 |
|
Opulent Ceremony posted:Message.UserId does have an index on account of it also being a foreign key right now, but you think I should add Message.Read into that index so the db doesn't have to get results from the first index and then match with the Message.Read index? An INCLUDED column (not yelling, just using SQL old man syntax to show it's a keyword) has a copy of the values of the included columns stored in the index itself adjacent to the columns by which it is indexing. Then if you're only grabbing that value, SQL Server doesn't need to jump from the index to the row in your table. So your query would be something like code:
If you do go this route (which, again, is probably overkill even for the INCLUDE) and make it a filtered index WHERE Read = 0, then every message, when it gets added to the message table, will first have an entry in the index, up until the record is marked as read, at which point it can fall out of the index you built just for this query. But honestly: for low millions, don't bother even with the INCLUDE. It's just a handy feature to bring some actual values along for the ride when you're making an index. Nth Doctor fucked around with this message at 21:25 on Apr 20, 2022 |
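The filtered-index idea has close analogues outside SQL Server; here's the same "only unread rows live in the index" behavior as a runnable SQLite sketch using a partial index (names invented, and SQLite has no INCLUDE clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INT, read INT)")
conn.executemany(
    "INSERT INTO messages (user_id, read) VALUES (?, ?)",
    [(i % 100, 0 if i % 10 == 0 else 1) for i in range(5_000)])

# Partial index: only unread rows are stored, so it stays small and a
# message drops out of it the moment it is marked read.
conn.execute("CREATE INDEX ix_unread ON messages (user_id) WHERE read = 0")

count = conn.execute(
    "SELECT COUNT(*) FROM messages WHERE user_id = ? AND read = 0", (0,)).fetchone()[0]

# The literal read = 0 in the query lets the planner prove the partial
# index covers it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM messages WHERE user_id = ? AND read = 0",
    (0,)).fetchall()
```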
# ? Apr 20, 2022 21:17 |
|
I need some basic advice on how to search text columns before I implement anything. I'm going to start a personal project where I'll have a SQL DB that contains a Caption table, and the table will contain a column for CaptionText. The idea is that users will be able to search through the captions of YouTube videos to find specific quotes. It's just a personal project so I don't want to make it as complex as a Google search, but I don't want it to be as basic as a LIKE '%text%' query in case the user can't remember exactly how the quote goes, or in case the auto-generated caption isn't perfect. It seems like I should look into Full-Text Search, but does anyone have any other ideas?
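For the full-text route, SQLite's FTS5 (assuming your build ships with it, as most recent Python builds do) gives stemming and relevance ranking with very little setup; a minimal sketch with made-up captions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE captions USING fts5(video_id UNINDEXED, caption_text, tokenize='porter')")
conn.executemany("INSERT INTO captions VALUES (?, ?)", [
    ("a1", "never gonna give you up"),
    ("b2", "we are giving it away today"),
    ("c3", "completely unrelated words"),
])

# Porter stemming lets 'giving' match 'give'; rank orders by relevance,
# so inexact quotes still find the right video.
rows = conn.execute(
    "SELECT video_id FROM captions WHERE captions MATCH ? ORDER BY rank",
    ("giving",)).fetchall()
```

SQL Server's Full-Text Search or Postgres tsvector columns cover the same ground if the project ends up on one of those.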
|
# ? Apr 20, 2022 21:48 |
|
Nth Doctor posted:Helpful stuff Thank you for this. Jabor posted:The DB can use the index on the foreign key column to find the entries that correspond to the selected users. And it could use the index on the predicate column to identify the entries for all users that meet the predicate. But there's not actually any way to combine those results into "all entries for the selected users that meet the predicate" - it's easier to just ignore the second index and instead look at every entry from the first index one at a time. Does that mean that the only time a supplemental index is of help on a query with joins is if it's worth first filtering a table based on the secondary index, then using the foreign key indexes for the joins, or does that option not exist either? Obviously it would be bad in the case of the query I was talking about, but I'm hoping to understand my (and the db's) options here.
|
# ? Apr 20, 2022 22:02 |
|
Opulent Ceremony posted:That's a very good point and one I'll need to consider more carefully if I do want to include the Read column into an index. In the future it is likely we will do some sort of denormalized count for those queries, so probably if I stick with ensuring indexes for Message.UserId and User.LocationId I should be alright until then. You're asking some specific questions for your specific problem, but if you've got the time and the interest this is probably worth reading a whole-rear end book about.
|
# ? Apr 20, 2022 22:20 |
|
Opulent Ceremony posted:Does that mean that the only time a supplemental index is of help on a query with joins is if it's worth first filtering a table based on the secondary index, then using the foreign key indexes for the joins, or does that option not exist either? Obviously it would be bad in the case of the query I was talking about, but I'm hoping to understand my (and the db's) options here. The answer to this is unfortunately very mushy. There's an element of nondeterminism in how the query planner turns your query into an execution plan. What you hope is to pull together rows from the smallest table and join outwards to progressively larger tables, especially when the columns on which you're joining are either the clustered index (in SQL Server, the physical ordering of records on disk, think flipping to a specific page in a book) or a nonclustered index (think hitting the index of a book to get the page numbers for where the author discusses the topic of farts). The query planner uses statistics and even active server load to guide this creation of the execution plan and sometimes it is just a stupid dumb idiot that tries to cross join a billion rows. There's not a great one-size-fits-all solution to force the server to go in what you the human decide is the best order. Sometimes the query planner will decide to use some wholly unrelated index for whatever reasons it keeps to itself. There's an art to designing your schema, queries, and indexes in a way that supports the query planner generally performing its best, but even then: sometimes it's just moronic and you need to get creative with hints or, as a last resort, poo poo like OPTION(FORCE ORDER).
|
# ? Apr 20, 2022 22:20 |
|
Forgive me if I've misunderstood the question, but regarding unread message counts, have you read the previous few pages of this thread where we went through most of the options?
|
# ? Apr 20, 2022 23:31 |
|
PhantomOfTheCopier posted:Forgive me if I've misunderstood the question, but regarding unread message counts, have you read the previous few pages of this thread where we went through most of the options? I did see that discussion and appreciate it. I was hoping for some conversation specifically about how a db can or can't manage join queries that also have filters on supplementary-indexed columns, and I appreciate the answers people have given me as well.
|
# ? Apr 20, 2022 23:49 |
|
Anyone know why the backups for one of my databases have suddenly shrunk by a lot? This is SQL Server and the option for auto shrink is turned off. I don't see any indication that the data has been affected in any way. Everything is still there.
|
# ? Apr 28, 2022 16:04 |
|
kiwid posted:Anyone know why the backups for one of my databases has suddenly shrunk by a lot? This is SQL Server and the option for auto shrink is turned off. Are those diff backups or full? Compression settings? Though that seems like a lot.
|
# ? Apr 28, 2022 16:16 |
|
Just-In-Timeberlake posted:Are those diff backups or full? Compressed vs not compressed: Edit: Nevermind, I forgot I had removed some debugging tables from this DB. kiwid fucked around with this message at 18:46 on Apr 28, 2022 |
# ? Apr 28, 2022 16:57 |
|
I'm pretty sure the answer is 'no' as I can't find it in the docs, but just in case: does postgres have any built-in functions or constants to get the min/max allowable values for a given numeric type? Like int_max would be 2^31-1.
|
# ? May 6, 2022 16:35 |
|
NihilCredo posted:I'm pretty sure the answer is 'no' as I can't find it in the docs, but just in case: does postgres have any built-in functions or constants to get the min/max allowable values for a given numeric type? Like int_max would be 2^31-1. Yes, it does, but what's important is that columns can be of many different types, up to and including custom types, so at the very least it's necessary to consider "the allowable values for a column". This includes constraints as well as type; take a look at https://www.postgresql.org/docs/current/catalog-pg-constraint.html and https://www.postgresql.org/docs/14/catalog-pg-type.html
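For the built-in integer types at least, the bounds follow directly from the storage widths documented for Postgres (smallint = 2 bytes, integer = 4, bigint = 8, all signed), so they're easy to compute client-side rather than query; a small sketch:

```python
# Bounds of a signed two's-complement integer of the given byte width.
def int_range(bytes_wide):
    bits = bytes_wide * 8
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

PG_INT_RANGES = {
    "smallint": int_range(2),
    "integer": int_range(4),
    "bigint": int_range(8),
}
```

This doesn't cover numeric/decimal or custom types, where the catalogs and constraints mentioned above are the only real source of truth.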
|
# ? May 7, 2022 03:50 |
|
I have table A with the following relations:code:
Something like code:
I don't need the actual data in any of the other tables so we can take some shortcuts in joins as desired. I've tried a variety of ways to do this and I'm stumped on this one. It doesn't have to be all one query so long as it's reasonably efficient (like a couple of queries). Does any of that make sense? Jaded Burnout fucked around with this message at 21:59 on May 9, 2022 |
# ? May 9, 2022 21:47 |
|
Not sure if I entirely understand your explanation but if you have, for the sake of simplicity, 3 tables: Orders Order_Items Shipped_Items You could INNER JOIN Orders and Order_Items, and LEFT OUTER JOIN the Shipped_Items on the Order_Items table. If the item hasn't shipped the value for any column in Shipped_Items will be NULL. It'd be something like this, adjust as necessary for your situation code:
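(The code block above was eaten by the forum export; the join described reads roughly like this runnable SQLite sketch, with invented sample data.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INT, name TEXT);
CREATE TABLE shipped_items (order_item_id INT, shipped_at TEXT);
""")
conn.execute("INSERT INTO orders VALUES (1)")
conn.execute("INSERT INTO order_items VALUES (10, 1, 'widget'), (11, 1, 'gadget')")
conn.execute("INSERT INTO shipped_items VALUES (10, '2022-05-09')")

# LEFT OUTER JOIN keeps every order item; shipped_at comes back NULL
# for items that have no shipping record yet.
rows = conn.execute("""
    SELECT oi.name, s.shipped_at
    FROM orders o
    INNER JOIN order_items oi ON oi.order_id = o.id
    LEFT OUTER JOIN shipped_items s ON s.order_item_id = oi.id
    ORDER BY oi.id
""").fetchall()
```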
|
# ? May 9, 2022 22:48 |