|
Pollyanna posted:Turns out we didn’t need that index at all. The _new_ problem is that somewhere in our massive collection of crap is a string with a null character in it - and I have no idea where. All my attempts at dumping the collection and grepping for \0 have failed. What’s the easiest way to search through a collection and find any fields that have a string with a null character in it?

Depends on the tech you’re using. Sounds like you want to generate an index over whatever element/field you want to search and then search it for null values once it’s done
|
# ? Jul 10, 2019 22:27 |
|
I have no idea what field it is, I just know there’s a NUL in a string somewhere, and AWS DMS completely dies and refuses to continue processing when it comes across that. If there’s a way to search through an entire document, that’d be great.
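(Worth noting for anyone grepping a dump: JSON-based export tools escape control characters, so the NUL comes out as the six-character sequence \u0000 rather than a raw byte — which is exactly why a grep for a literal \0 finds nothing. A quick sanity check in Python, not tied to any particular dump tool:)

```python
import json

# json.dumps escapes a raw NUL byte as the six-character
# sequence \u0000, so a grep for a literal \0 won't find it.
doc = {"name": "acme\x00corp"}
dumped = json.dumps(doc)

print("\x00" in dumped)     # the raw byte is gone from the output
print("\\u0000" in dumped)  # the escape sequence is what to search for
```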
|
# ? Jul 10, 2019 22:29 |
|
so it could be any element anywhere that’s null and that’s what you want to find? don’t want to sound like an evangelical but any time you’re looking for a needle in a haystack that’s when you want marklogic. you could run a wildcard in the element name for a value of null and see what pops out
|
# ? Jul 10, 2019 22:37 |
|
Oof, we don’t have time or capacity to bring on an enterprise solution. I’ll prolly have to bug AWS for help on this or something. If only their error messages were better.
|
# ? Jul 10, 2019 22:42 |
|
yeah we’re not cheap. if that type of problem is something you frequently run into though you should consider us
|
# ? Jul 10, 2019 22:45 |
|
Our problems are way more than just technical in nature
|
# ? Jul 10, 2019 22:54 |
|
So I have a mongodump bson file and I can replicate the “insertion error” by trying to restore with it, but I have no idea how to get mongorestore to tell me what it failed on. It just says “Failed insertion error” and nothing else. Is there a way to get the logging to, for example, output the ID of what it failed to insert?
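(One angle, in case it helps: bsondump, which ships with the Mongo tools, turns a .bson dump into one JSON document per line, and from there a short script can report the _id and field path of anything containing a NUL. A sketch — the sample documents and field names here are invented, and it assumes you’ve already run something like `bsondump dump.bson > dump.json`:)

```python
import json

def find_nulls(obj, path=""):
    """Recursively yield paths of string values containing a NUL."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_nulls(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from find_nulls(value, f"{path}[{i}]")
    elif isinstance(obj, str) and "\u0000" in obj:
        yield path

# One JSON document per line, as produced by bsondump.
lines = [
    '{"_id": "a1", "name": "fine"}',
    '{"_id": "b2", "tags": ["ok", "bad\\u0000value"]}',
]
for line in lines:
    doc = json.loads(line)
    for path in find_nulls(doc):
        print(doc.get("_id"), path)  # prints: b2 tags[1]
```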
Pollyanna fucked around with this message at 21:55 on Jul 11, 2019 |
# ? Jul 11, 2019 21:48 |
|
Think I found it. Protip: NUL is specified as %00 in URLs. Honestly gently caress it, I don’t wanna try and sort through those, the stakeholder can deal with a backup.
|
# ? Jul 11, 2019 22:11 |
|
Is there any reason why a lookup on a field would fail to use an index when it exists? Reason I ask is that we have a Mongo database and an AWS DocumentDB database in parity, with the same indexes, and a query on one field in Mongo will use the index, but the same in DocumentDB will not.
|
# ? Sep 19, 2019 22:24 |
|
Pollyanna posted:Is there any reason why a lookup on a field would fail to use an index when it exists? Reason I ask is that we have a Mongo database and an AWS DocumentDB database in parity, with the same indexes, and a query on one field in Mongo will use the index, but the same in DocumentDB will not.

DocumentDB is backed by a different storage engine than MongoDB, and likely has different limitations around indices. Because DocumentDB is proprietary, you'll have to ask Amazon to know for sure.
|
# ? Sep 20, 2019 00:22 |
|
Arcsech posted:DocumentDB is backed by a different storage engine than MongoDB, and likely has different limitations around indices. Because DocumentDB is proprietary, you'll have to ask Amazon to know for sure.

gently caress. Alright, I’ll have to contact them. I’ll see if I can’t figure out a workaround in the meantime.

EDIT: As far as I can tell, it simply won't use an index if it has to do anything other than a straight-up equality comparison. So, we can't do less-than queries in a performant manner. Eugh.

Pollyanna fucked around with this message at 01:32 on Sep 20, 2019 |
# ? Sep 20, 2019 00:29 |
|
Arcsech posted:DocumentDB is backed by a different storage engine than MongoDB, and likely has different limitations around indices. Because DocumentDB is proprietary, you'll have to ask Amazon to know for sure.

“Backed by a different storage engine” is a significant understatement. It’s an entirely different thing that just implements the same API.
|
# ? Sep 20, 2019 06:10 |
|
Pollyanna posted:EDIT: As far as I can tell, it simply won't use an index if it has to do anything other than a straight-up equality comparison. So, we can't do less-than queries in a performant manner. Eugh.

Can you define index types? It sounds like it’s using a hash index instead of a range index like a B+ tree, but I’ve never used DocumentDB.
|
# ? Sep 20, 2019 12:53 |
|
Star War Sex Parrot posted:Can you define index types? It sounds like it’s using a hash index instead of a range index like a B+ tree, but I’ve never used DocumentDB.

No - DocumentDB only supports Single Field, Compound, and Multikey indexes, and the latter two are done by a workaround of some sort. https://docs.aws.amazon.com/en_pv/documentdb/latest/developerguide/mongo-apis.html in the Indexes section.

Now that I’ve slept on it, I’m less pissed, but this is still frustrating and confusing. We’re going to deprioritize this query weirdness and make sure all the other queries are still okay before tackling this one.

Pollyanna fucked around with this message at 15:39 on Sep 20, 2019 |
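(For anyone wondering why the index type matters here: a hash index is essentially a dict, so it can only answer equality, while a range index keeps keys sorted so a less-than query can seek and scan. A toy illustration in Python with bisect — nothing DocumentDB-specific, just the concept:)

```python
import bisect

keys = [3, 8, 15, 23, 42]                  # a range index keeps keys sorted
hash_index = {k: f"doc{k}" for k in keys}  # a hash index: equality only

# Equality: both structures answer directly.
print(hash_index[15])  # doc15

# Range (k < 20): the sorted structure can seek-and-scan...
cut = bisect.bisect_left(keys, 20)
print(keys[:cut])      # [3, 8, 15]
# ...while the hash index would have to examine every key.
```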
# ? Sep 20, 2019 15:25 |
|
Alright, I think I’m just missing some understanding on how this poo poo works. Lemme start over.

I have a collection of documents, about 27 million large. Each document has a time_start field and a time_end field, both dates. We want to query for the following:

1. time_start is less than a given datetime, AND
2a. time_end is greater than another given datetime, OR
2b. time_end is not present in the given document

How should I define the index on these documents given that I want to make this query? As I understand it, I would need a compound index on time_start and time_end, since I’m searching for them at the same time. Basically the following index: code:

Am I using the wrong index? Are there tweaks I need to make? Or does this query genuinely just take that long?

Edit: some more background info. The other queries we do on this collection involve matches against some other fields, also indexed, and sorting by time_start descending and time_end both ascending and descending, and in at least one case, sorting one of the other fields descending. We also do range queries on time_start being between two different dates.

Pollyanna fucked around with this message at 18:08 on Sep 23, 2019 |
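(For reference, here’s roughly what that filter looks like as a pymongo-style document, plus a tiny pure-Python restatement of the match logic for sanity-checking — the field names are from the post, the cutoff dates and sample docs are invented. One thing that may matter: MongoDB can evaluate each branch of a top-level $or with its own index, so the $exists branch sometimes wants a separate single-field index on time_end rather than only the compound one.)

```python
from datetime import datetime

start_cutoff = datetime(2019, 9, 1)
end_cutoff = datetime(2019, 9, 15)

# The filter described in the post, as a pymongo-style document.
query = {
    "time_start": {"$lt": start_cutoff},
    "$or": [
        {"time_end": {"$gt": end_cutoff}},
        {"time_end": {"$exists": False}},
    ],
}

# The proposed compound index, as a pymongo-style spec.
index_spec = [("time_start", -1), ("time_end", -1)]

def matches(doc):
    """Pure-Python restatement of the query, for sanity-checking."""
    if not ("time_start" in doc and doc["time_start"] < start_cutoff):
        return False
    return "time_end" not in doc or doc["time_end"] > end_cutoff

docs = [
    {"time_start": datetime(2019, 8, 1), "time_end": datetime(2019, 9, 20)},
    {"time_start": datetime(2019, 8, 1)},                      # no time_end
    {"time_start": datetime(2019, 8, 1), "time_end": datetime(2019, 9, 10)},
    {"time_start": datetime(2019, 9, 5)},                      # starts too late
]
print([matches(d) for d in docs])  # [True, True, False, False]
```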
# ? Sep 23, 2019 17:05 |
|
New kid on the block: https://www.fusiondb.com/
|
# ? Oct 3, 2019 20:30 |
|
Oh, that was a fun game. Needed more polishing though.
|
# ? Oct 3, 2019 20:37 |
|
There's almost nothing there, but this dude seems to be a frequent contributor to eXist-db.
|
# ? Oct 3, 2019 21:06 |
|
Rex-Goliath posted:New kid on the block: https://www.fusiondb.com/

Somebody shoot their web guy. A paragraph of meaningless marketing babble, aaaand six paragraphs about the lovely license. Nobody gives a gently caress about the dumb restrictive license, and the fact that it can query json is meaningless. *ALL* currently existing databases I'm aware of can do that. The page doesn't give me a single reason to want to use it, and a few reasons why I wouldn't.

duck monster fucked around with this message at 02:29 on Oct 8, 2019 |
# ? Oct 8, 2019 02:26 |
|
Rex-Goliath posted:New kid on the block: https://www.fusiondb.com/

what does this offer exactly?
|
# ? Oct 8, 2019 03:10 |
|
abelwingnut posted:what does this offer exactly?

I have no clue!
|
# ? Oct 8, 2019 03:15 |
|
So it's a 15th competing standard then?
|
# ? Oct 8, 2019 06:30 |
|
duck monster posted:Somebody shoot their web guy. A paragraph of meaningless marketing babble, aaaand six paragraphs about the lovely license. Nobody gives a gently caress about the dumb restrictive license, and the fact that it can query json is meaningless. *ALL* currently existing databases I'm aware of can do that. The page doesn't give me a single reason to want to use it, and a few reasons why I wouldn't.

But it's "100% eXist-db API Compatible"! Which might mean something if anyone had a clue what "eXist-db" was.

Edit: Wikipedia posted:eXist-db provides XQuery and XSLT as its query and application programming languages

Arcsech fucked around with this message at 17:00 on Oct 8, 2019 |
# ? Oct 8, 2019 16:58 |
|
XQuery is good on MarkLogic but it also uses its own proprietary compiler so ¯\_(ツ)_/¯
|
# ? Oct 8, 2019 17:03 |
|
Putting in a 👎 and a 🖕 for AWS DocumentDB. Performance has been way worse than our own Mongo solution and it can barely handle any query more complex than matching on a single field. Its performance is also wildly inconsistent; we’ve had performance tests/comparisons range from somewhat worse to 4x worse. Do not use. I’ll post again later with details on our use case, to avoid the X-Y problem.
|
# ? Oct 18, 2019 16:08 |
|
Okay, so. We have several collections of documents with up to 69 (nice) fields each. A large subset of these fields are either null or hold an array of acceptable values for that given field (e.g. age_of_car has [0, 1, 2] representing the age of the car in years).

In our program flow we have a set of data for a single instance of our data type (a potential insurance buyer) that has attributes for all these 69 fields. We want to find all documents in one of these collections that match to this data type instance, where “match” means that for each document field, the data type either has a value in the field’s array, or the document’s field is null.

We accomplish this by constructing a large query for each one of those 69 fields: code:
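(That per-field clause can be written compactly with $in, since in MongoDB {field: null} matches null or missing fields, and matching a scalar against an array field checks membership. A sketch of building the query plus a pure-Python check of the intended semantics — the field names and sample docs here are invented:)

```python
# Build the per-field "null OR contains the buyer's value" query
# described above, pymongo-style. Field names are invented.
buyer = {"age_of_car": 2, "state": "MA"}

query = {"$and": [
    # {f: {"$in": [None, v]}} matches docs where f is null/missing
    # OR f is an array containing v (MongoDB array-membership match).
    {field: {"$in": [None, value]}}
    for field, value in buyer.items()
]}

def doc_matches(doc):
    """Pure-Python restatement of the match rule, for sanity-checking."""
    return all(
        doc.get(field) is None or value in doc[field]
        for field, value in buyer.items()
    )

docs = [
    {"age_of_car": [0, 1, 2], "state": ["MA", "NH"]},  # both fields match
    {"age_of_car": None, "state": ["MA"]},             # null counts as match
    {"age_of_car": [5, 6], "state": ["MA"]},           # 2 not an accepted value
]
print([doc_matches(d) for d in docs])  # [True, True, False]
```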
|
# ? Oct 18, 2019 18:08 |
|
Pollyanna posted:Okay, so. We have several collections of documents with up to 69 (nice) fields each. A large subset of these fields are either null or hold an array of acceptable values for that given field (e.g. age_of_car has [0, 1, 2] representing the age of the car in years). In our program flow we have a set of data for a single instance of our data type (a potential insurance buyer) that has attributes for all these 69 fields.

Are there any particular issues you’re running into with Mongo that drove you guys to investigate DocumentDB?
|
# ? Oct 18, 2019 19:31 |
|
PIZZA.BAT posted:Are there any particular issues you’re running into with Mongo that drove you guys to investigate DocumentDB?

We have to manage it ourselves, which has led to (nonspecific problems in the past that I wasn’t around for but currently don’t seem to be a problem). Plus, nobody but us uses Mongo, and the only Mongo expert we had left last year, and DocumentDB has people devoted to it, right?

Other than that, nothing that DocumentDB would solve. It’s all problems with our overall architecture/solution (latency issues, lack of mongo knowledge, the two databases going out of sync - we mirror our postgres DB in Mongo because ?????????). I personally think our time is better spent removing a second database from the picture entirely, but I’m not the manager here

tl;dr: we don’t wanna manage our own boxes but Atlas is too expensive.
|
# ? Oct 18, 2019 20:29 |
|
TBH it doesn't feel like switching from one thing that nobody at the company knows about to a proprietary thing that nobody at the company knows about is going to fix the issue.

There are valid reasons to mirror a DB if the data needs to be represented or queried in a way that one of the DBs isn't capable of. That might have been the original reason for copying into MongoDB. But it's much more likely to have been done for resume padding reasons, and if they left it sounds like it worked!

From what it sounds like, I'd try to consolidate into Postgres unless there's something very specific that Postgres can't accomplish (even with a plugin or something), since I imagine that's easier to hire for if nothing else. Sorry nosql thread.
|
# ? Oct 18, 2019 21:05 |
|
Yeah this really does sound like your architect is making it more complex for the sake of it.
|
# ? Oct 18, 2019 21:41 |
|
What architect
|
# ? Oct 18, 2019 21:43 |
|
Pollyanna posted:Putting in a 👎 and a 🖕 for AWS DocumentDB. Performance has been way worse than our own Mongo solution and it can barely handle any query more complex than matching on a single field. Its performance is also wildly inconsistent, we’ve had performance tests/comparisons range from somewhat worse to 4x worse. Do not use.

I am shocked, shocked I say, that a thing Amazon built in a rush solely as a "gently caress you" in response to a company changing their license so AWS couldn't rip off their poo poo for free is bad. Just shocked.
|
# ? Oct 18, 2019 22:58 |
|
Cross-posting my response to Pollyanna's situation because lol

Star War Sex Parrot posted:so you're duplicating the data from postgres so you can do filters on a nosql system built on top of postgres? nice
|
# ? Oct 19, 2019 00:17 |
|
Pollyanna posted:i hate it and it needs to die
|
# ? Oct 19, 2019 03:29 |
|
Hello folks. I'm putting together a small app whose data are very much a graph, a "simple directed graph" to be precise. You mentioned Neo4j might be the current forerunner upthread? I don't particularly care about, well, anything, really, but I'd like to optimise for developer expediency and clarity.

The frontend is React, and I would've thought Facebook would have some good support for working with graph data, but apparently nothing obvious, even GraphQL isn't really all that graphy. Backend is Rails. Current main DB is postgresql. Hosting is Heroku.
|
# ? Oct 26, 2019 13:25 |
|
Out of curiosity, what kind of data is this? What does it represent? I was always unsure of what non-relational data is or looked like.
|
# ? Oct 26, 2019 17:45 |
|
Pollyanna posted:Out of curiosity, what kind of data is this? What does it represent? I was always unsure of what non-relational data is or looked like.

A dependency graph among a set of tasks, with tasks depending on one another arbitrarily. The only rule being no loops. So I suppose in that case it's a DAG, but not just a tree or set of trees, since there can be more than one task that depends on a single other task. After some more poking around I'm going to try neo4j with neo4jrb.
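(Since the only invariant is "no loops", a cycle check is worth having no matter which store ends up holding the graph. A sketch using Kahn's algorithm in plain Python — the task names are invented:)

```python
from collections import deque

def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph has no cycles.

    edges maps each task to the tasks it depends on.
    """
    # Count incoming dependency edges for every node mentioned anywhere.
    indegree = {node: 0 for node in edges}
    for deps in edges.values():
        for dep in deps:
            indegree.setdefault(dep, 0)
            indegree[dep] += 1
    # Repeatedly remove nodes with no remaining incoming edges.
    queue = deque(n for n, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for dep in edges.get(node, ()):
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    # If a cycle exists, its nodes never reach indegree 0.
    return seen == len(indegree)

# Two tasks may depend on the same task: still a DAG, not a tree.
tasks = {"deploy": ["build"], "test": ["build"], "build": []}
print(is_acyclic(tasks))                     # True
print(is_acyclic({"a": ["b"], "b": ["a"]}))  # False
```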
|
# ? Oct 26, 2019 18:38 |
|
What's an easy free way to host a 5 gig JSON file for one person at a time to query? It's just the Yelp Academic Dataset for a personal demo for a school project. I'm looking at some free mongodb hosts like MongoDB Atlas and Heroku but their specs just list RAM sizes (all less than a gig at the free tier) and don't seem to come with, like, a hard drive to pull a big file from.
|
# ? Feb 12, 2020 09:19 |
|
Dumb Lowtax posted:What's an easy free way to host a 5 gig JSON file for one person at a time to query? It's just the Yelp Academic Dataset for a personal demo for a school project.

does it need to be json? how big would it be if you shredded it into a more useful/queryable form?
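(One cheap way to shred it: the dataset is one JSON object per line, so it loads straight into SQLite with the Python stdlib — file-based, zero hosting cost for a one-person demo. A sketch with made-up sample rows and column choices:)

```python
import json
import sqlite3

# Made-up sample in the one-JSON-object-per-line shape of the dataset.
lines = [
    '{"business_id": "b1", "name": "Pho Corner", "stars": 4.5}',
    '{"business_id": "b2", "name": "Burger Barn", "stars": 2.0}',
]

conn = sqlite3.connect(":memory:")  # use a file path for something durable
conn.execute("CREATE TABLE business (id TEXT PRIMARY KEY, name TEXT, stars REAL)")
for line in lines:
    row = json.loads(line)
    conn.execute(
        "INSERT INTO business VALUES (?, ?, ?)",
        (row["business_id"], row["name"], row["stars"]),
    )

# Now it's queryable instead of a 5 GB blob.
names = [r[0] for r in conn.execute("SELECT name FROM business WHERE stars >= 4")]
print(names)  # ['Pho Corner']
```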
|
# ? Feb 12, 2020 09:58 |
|
Both AWS and Azure will give you $200 worth of services for free per year on their educational plans, which I'm assuming just means to every .edu email address. $200 should buy you plenty of S3/blob storage for a class to bang on. e: oh, duh, you'd probably want to make that queryable. IDK how much NoSQL that'd buy you on either platform but I'm sure It Depends™ Munkeymon fucked around with this message at 15:32 on Feb 12, 2020 |
# ? Feb 12, 2020 15:21 |