PIZZA.BAT
Nov 12, 2016


:cheers:


couchbase should be able to handle that just fine

spiritual bypass
Feb 19, 2008

Grimey Drawer
MongoDB Atlas has a pretty big free tier

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?
write a tiny bit of code to import the repetitive, implicit-schema JSON bullshit into a real database, even just SQLite, and do your queries against that

it’ll take you barely any time at all to pull the data in, then you can create some indexes and go to town

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?
LOL the Yelp dataset isn’t even implicit schema

I should write something to slurp this into SQLite just to show how terrible tossing data like this around as JSON is

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?
oh my god

code:
checkin.json
Checkins on a business.

{
    // string, 22 character business id, maps to business in business.json
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg"

    // string which is a comma-separated list of timestamps for each checkin, each with format YYYY-MM-DD HH:MM:SS
    "date": "2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 02:45:18, 2016-11-18 01:54:50, 2017-04-20 18:39:06, 2017-05-03 17:58:02"
}
like come on if you’re using loving JSON anyway just use a loving array for fucks sake

gently caress

there are legit things that are difficult to structure in a relational database but only one thing in this schema gets anywhere close to that
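
For what it's worth, undoing the comma-separated string is only a few lines. A minimal sketch, assuming each checkin record looks like the sample above; the function name and the checked_in_at column name are made up for illustration:

```javascript
// Turn one checkin record into one row per checkin, the shape a
// (business_id, checked_in_at) table in SQLite or Postgres would want.
function checkinRows(record) {
  return record.date
    .split(",") // every timestamp is packed into one string
    .map((stamp) => stamp.trim())
    .filter((stamp) => stamp.length > 0) // tolerate an empty "date" field
    .map((stamp) => ({
      business_id: record.business_id,
      checked_in_at: stamp, // hypothetical column name
    }));
}

const sample = {
  business_id: "tnhfDv5Il8EaGSXZGiuQGg",
  date: "2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 02:45:18",
};

const rows = checkinRows(sample);
console.log(rows.length); // 3
console.log(rows[0].checked_in_at); // 2016-04-26 19:49:16
```

Each row can then go into a plain two-column table with an index on business_id.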

Happy Thread
Jul 10, 2005

by Fluffdaddy
Plaster Town Cop

eschaton posted:

write a tiny bit of code to import the repetitive, implicit-schema JSON bullshit into a real database, even just SQLite, and do your queries against that

it’ll take you barely any time at all to pull the data in, then you can create some indexes and go to town

Would I be able to do those queries online (not locally)? How hard would this be compared to just hosting the JSON somewhere, if I am completely unfamiliar with the types of software used for SQL, and don't remember anything at all from my databases class?

Even with those limitations I've otherwise managed to host small websites using mlab (free mongodb host) and storing nothing but json strings in it. But a 5gig file I'm not so sure that works for. At least mlab won't go that big.

Happy Thread fucked around with this message at 09:24 on Feb 14, 2020

PIZZA.BAT
Nov 12, 2016


:cheers:


eschaton posted:

write a tiny bit of code to import the repetitive, implicit-schema JSON bullshit into a real database, even just SQLite, and do your queries against that

it’ll take you barely any time at all to pull the data in, then you can create some indexes and go to town

Please don’t be that guy itt, thanks

PIZZA.BAT
Nov 12, 2016


:cheers:


Dumb Lowtax posted:

Would I be able to do those queries online (not locally)? How hard would this be compared to just hosting the JSON somewhere, if I am completely unfamiliar with the types of software used for SQL, and don't remember anything at all from my databases class?

Even with those limitations I've otherwise managed to host small websites using mlab (free mongodb host) and storing nothing but json strings in it. But a 5gig file I'm not so sure that works for. At least mlab won't go that big.

Yeah 5 GB is pushing it if you’re looking for free hosting. The only thing I can think of is DynamoDB which has 25 GB of free hosting but I also have zero experience with that.

Honestly with the fact you’re *just* above the free tiers in demand you may just want to bite the bullet and pay for something. I doubt it will cost you more than a few bucks.

Happy Thread
Jul 10, 2005

by Fluffdaddy
Plaster Town Cop
Oh, that's good too! As long as I can cancel when needed and don't get rounded up to some plan tier that assumes and charges for significant traffic. I'll check the pricing pages

PIZZA.BAT
Nov 12, 2016


:cheers:


I know this shouldn’t be anything new to anyone itt but here’s a pretty thorough takedown of Mongo if anyone needs to talk their management out of sticking their dick in that mousetrap: http://jepsen.io/analyses/mongodb-4.2.6

Just-In-Timeberlake
Aug 18, 2003
So this is the closest thread I can find that has to do with elastic search, I have a couple of questions that I can't find a definitive answer to regarding arrays and searchability.

I'm looking to offload some invoice data to elastic for searching purposes, and I have a couple of fields that I want to represent as a simple array.

One would be a payment confirmation #, and a customer could potentially pay an invoice off in more than one payment. So my plan for the field would be for it to look like this: "confirmation_numbers": [12345, etc]

Is that a searchable field like that or do I need to mark it nested?

Related to that, I then want an array field with all the payment dates, so something like "payment_dates": ["date1", "date2", etc]

A) is that field able to be defined as a date type, and B) is that field searchable using "from" and "to" with that layout?

Arcsech
Aug 5, 2008

Just-In-Timeberlake posted:

So this is the closest thread I can find that has to do with elastic search, I have a couple of questions that I can't find a definitive answer to regarding arrays and searchability.

I'm looking to offload some invoice data to elastic for searching purposes, and I have a couple of fields that I want to represent as a simple array.

One would be a payment confirmation #, and a customer could potentially pay an invoice off in more than one payment. So my plan for the field would be for it to look like this: "confirmation_numbers": [12345, etc]

Is that a searchable field like that or do I need to mark it nested?

Related to that, I then want an array field with all the payment dates, so something like "payment_dates": ["date1", "date2", etc]

A) is that field able to be defined as a date type, and B) is that field searchable using "from" and "to" with that layout?

Elasticsearch doesn't make a distinction between single-value and array fields, so "conf_num": 1234 and "conf_num": [1234, 5678] both have the same mapping/schema and you can have some docs with a single value and some docs with an array in the same index.

The main thing to be aware of is that with array-valued fields, order is not considered when searching, so you could query for "conf_num:5678 AND conf_num:1234", but you can't specify in the search that 1234 has to come before 5678 in the array. You'll want to use a range query for from/to queries, but keeping in mind the above, a range query will return all docs where any of the values in the "payment_dates" array fall into that range.
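
A rough sketch of what that could look like as request bodies, written as plain JavaScript objects. The field names come from the question; the "long" type for confirmation numbers and the helper names are assumptions, and matchesRange is only a plain-JS illustration of the "any value matches" behaviour, not anything Elasticsearch exposes:

```javascript
// Index mapping: arrays need no special declaration, the type describes
// each element. "long" for confirmation numbers is an assumption.
const mapping = {
  mappings: {
    properties: {
      confirmation_numbers: { type: "long" },
      payment_dates: { type: "date" },
    },
  },
};

// Body of a search request for invoices with a payment between two dates.
function paymentDateRangeQuery(from, to) {
  return { query: { range: { payment_dates: { gte: from, lte: to } } } };
}

// Illustration only: a doc matches when ANY of its dates is in range,
// whether the field holds a single value or an array.
function matchesRange(doc, from, to) {
  const values = [].concat(doc.payment_dates);
  // yyyy-MM-dd strings sort the same way the dates do
  return values.some((d) => d >= from && d <= to);
}

const invoice = { payment_dates: ["2020-01-05", "2020-03-01"] };
console.log(matchesRange(invoice, "2020-02-01", "2020-03-31")); // true
console.log(JSON.stringify(paymentDateRangeQuery("2020-02-01", "2020-03-31")));
```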

Just-In-Timeberlake
Aug 18, 2003

Arcsech posted:

Elasticsearch doesn't make a distinction between single-value and array fields, so "conf_num": 1234 and "conf_num": [1234, 5678] both have the same mapping/schema and you can have some docs with a single value and some docs with an array in the same index.

The main thing to be aware of is that with array-valued fields, order is not considered when searching, so you could query for "conf_num:5678 AND conf_num:1234", but you can't specify in the search that 1234 has to come before 5678 in the array. You'll want to use a range query for from/to queries, but keeping in mind the above, a range query will return all docs where any of the values in the "payment_dates" array fall into that range.

thanks, just what I was looking for.

duck monster
Dec 15, 2004

Happy Thread posted:

What's an easy free way to host a 5 gig JSON file for one person at a time to query? It's just the Yelp Academic Dataset for a personal demo for a school project.

I'm looking at some free mongodb hosts like MongoDB and Heroku but their specs just list RAM sizes (all less than a gig at the free tier) and don't seem to come with, like, a hard drive to pull a big file from.

Normalize the gently caress out of that thing. A 5 gig searchable data blob is going to play havoc on any architecture. Redesign that thing, ruthlessly.

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Hey all hoping you can help me. I'm working through setting up my first project with mongodb using mongoose and node. My project is a very simple website that lists triple a baseball players. I'm having trouble figuring out what I need to do to get the following to work. I realize I could just be dumb and did not set up the schemas right. Anyways

Player has a schema of:
Name
Team (objectid ref to TripleATeam or DoubleATeam)

DoubleATeam has a schema of:
Team name
MajorLeagueClub (objectref to MajorLeagueTeam)

TripleATeam has a schema of:
Team name
MajorLeagueClub (objectref to MajorLeagueTeam)

MajorLeagueTeam has a schema of:
Team Name


My problem is I can't for the life of me figure out how to query MajorLeagueTeam and get every player for all three teams listed. Is this possible? Did I mess up by not allowing each player to also select the major league team? I feel like there should be a way to programmatically reference the top level team

Thanks!

ulmont
Sep 15, 2010

IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

Empress Brosephine posted:

I can't for the life of me figure out how to query MajorLeagueTeam and get every player for all three teams listed. Is this possible?

I was halfway towards writing out a SELECT statement when I realized what thread I was reading.

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
I mean if you have that I'll take it also, I have some knowledge of postgres :D

ulmont
Sep 15, 2010

IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

Empress Brosephine posted:

I mean if you have that I'll take it also, I have some knowledge of postgres :D

Cloudflare said "gently caress you, that looks like a SQL injection attempt".

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Rip

ulmont
Sep 15, 2010

IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

Assuming I could just query on Team and ignoring the object ref part of the NOSQL setup, how I’d do it in SQL is here: https://pastebin.com/xrnGBhw1

...which is straightforward, two sub-selects leading to Team IN (all matching AA) or Team IN (all matching AAA).

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
That seems so easy. I asked this question on reddit and even they are like "not really possible with nosql". Guess I should convert to postgres. Thanks.

PIZZA.BAT
Nov 12, 2016


:cheers:


This would be pretty straightforward with nosql DBs that have triples. You should be able to get what you're looking for with the $lookup feature though. Don't know how it works under the hood / its performance implications but it'll let you write the query as a left-outer join.
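
A minimal sketch of such a pipeline, not tested against a real database. It assumes the field names from the schema posted further down the thread (majorleague_team, team, team_name) and that the players live in a collection called "players"; the function name is made up:

```javascript
// Builds an aggregation pipeline to run against the Triple-A team collection.
// majorLeagueTeamId should already be an ObjectId: as far as I know,
// aggregation stages are not cast the way find() filters are.
function playersForMajorLeagueTeam(majorLeagueTeamId) {
  return [
    // 1. keep only the farm teams that belong to this major league club
    { $match: { majorleague_team: majorLeagueTeamId } },
    // 2. left-outer join: attach every player whose "team" is this team's _id
    {
      $lookup: {
        from: "players", // assumed collection name
        localField: "_id",
        foreignField: "team",
        as: "players",
      },
    },
    // 3. trim the output to the team name and its players
    { $project: { team_name: 1, players: 1 } },
  ];
}

const pipeline = playersForMajorLeagueTeam("someMajorLeagueTeamId");
console.log(JSON.stringify(pipeline, null, 2));
```

With Mongoose this would be passed to something like TripleATeam.aggregate(pipeline). Each result is one Triple-A team with a players array, empty if the team has no players.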

PIZZA.BAT fucked around with this message at 17:42 on Dec 30, 2020

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
I had no idea about $lookup, i'll have to look into it. Here's the code I typed so far also:

code:
Here is the Schema I used, I was actually misremembering; I didn't go as far as Double A team, i just did the major league and the Triple A team:
var MajorLeagueTeam = new Schema({
    team_name: {type:String, required:true, maxlength: 100},
})
var TripleATeam = new Schema({
    team_name: {type:String, required:true, maxlength: 100},
    majorleague_team: {type:Schema.Types.ObjectId, ref:"MajorLeagueTeam", required:true}
})

var Player = new Schema({
    player_name: {type:String, required:true, maxlength: 100},
    team: {type:Schema.Types.ObjectId, ref:"TripleATeam", required:true}
})

So on the individual page for MajorLeagueTeam i'm trying to get it to list all the Players in TripleATeam and eventually DoubleATeam, although while i'm typing this I think i set the database up wrong; tripleAteam and doubleateam shouldn't be individual objects they should be a type under MajorLeagueTeam that would include the actual MajorLeagueTeam? But then I wouldn't be able to put together a team for just the TripleATeam? 

Anyways here's where i'm getting stuck on the query:

exports.team_detail = function(req,res,next) {
    async.parallel({
        bigleagues: function(callback) {
            MajorLeagueTeam.findById(req.params.id).exec(callback);
        },
        triplea: function(callback) {
            TripleATeam.find({ majorleague_team: req.params.id }).exec(callback);
        },
        players: function(callback){
            // THIS IS WHERE IM STUCK
        }
    })
}

Thanks for all help!

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
I kinda figured it out but now i'm running onto scope issues! :) Which I guess is a step forward. Here's how I "solved" it:

code:
 players: function (callback) {
        var pljson;
        function plteams() {
          TripleATeam.find(
            { majorleague_team: req.params.id },
            "_id",
            function (err, tripleateams) {
              if (err) return err;
              pljson = JSON.stringify(tripleateams);
              pljson = JSON.parse(pljson);
              console.log("pljson is starting %s", pljson[0]._id);
              return pljson;
            }
          );
          console.log(pljson);
        }

        Players.find({ team: plteams() }).exec(callback); 
my problem now is that plteams() is returning back as undefined...hmmm
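
The undefined comes from plteams() having no return statement of its own: the "return pljson" sits inside the find callback, which only runs later, after Players.find has already been called. One way around it is to wait for the team query before building the player query. A sketch under assumptions: the function name is made up, the models are passed in as parameters, and the in-memory stand-ins below exist only so the example runs without a database:

```javascript
// With Mongoose, TripleATeam and Player would be the real models;
// find() returns something awaitable either way.
async function playersForClub(TripleATeam, Player, majorLeagueTeamId) {
  // Wait for the team lookup to finish before using its result.
  const teams = await TripleATeam.find({ majorleague_team: majorLeagueTeamId }, "_id");
  const teamIds = teams.map((t) => t._id);
  // $in matches players whose team is any of the ids found above.
  return Player.find({ team: { $in: teamIds } });
}

// Hypothetical in-memory stand-ins for the two models.
const fakeTripleATeam = {
  find: async (filter) =>
    [
      { _id: "t1", majorleague_team: "m1" },
      { _id: "t2", majorleague_team: "m2" },
    ].filter((t) => t.majorleague_team === filter.majorleague_team),
};
const fakePlayer = {
  find: async (filter) =>
    [
      { player_name: "A", team: "t1" },
      { player_name: "B", team: "t2" },
    ].filter((p) => filter.team.$in.includes(p.team)),
};

playersForClub(fakeTripleATeam, fakePlayer, "m1").then((players) => {
  console.log(players.map((p) => p.player_name)); // [ 'A' ]
});
```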

Empress Brosephine fucked around with this message at 19:51 on Dec 30, 2020

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
I FIGURED IT OUT YAY!!! :D

Thanks for the help all.

beuges
Jul 4, 2005
fluffy bunny butterfly broomstick
I hope I'm not reviving this thread for nothing...

I'm building something where I have some devices at different sites running node, which need to keep in sync with each other at the same site as well as a cloud backend. The solution we've come up with is to use PouchDB on each device, syncing with each other as well as a CouchDB instance in the cloud. We've got the devices syncing with each other, and using a selector filter on the CouchDB sync, the devices only replicate documents with a matching site id. I will then have a service on the backend keeping a continuous _changes feed open on the CouchDB, so that records from the devices can get processed into the actual backend.
The reason for the devices syncing via a selector filter instead of each site just having its own db is because we expect to have tens of thousands of sites, and I'm expecting that keeping that many connections open to a _changes feed on each db is going to be not great.

So:
- am I right in choosing to have a single db instead of a db per site, given that I need to be watching for changes in real time?
- is there a better way to do this instead of using the _changes feed? can I configure some sort of callback function in a design document that can post the document data onto a queue or something, so I don't need to watch _changes?
- if the single db is the way to go, then is it possible to configure CouchDB to block attempts to replicate without a selector filter specified? I'm assigning unique credentials to each device, but I want to prevent a rogue device from syncing the entire DB and seeing data from other sites that it should not be able to see

Ideally I'd rather have a unique database per site so credentials prevent a rogue device from being able to pull down data from another site's db, and have Couch drop the incoming documents onto a queue that my service can watch instead of the _changes feed, but a) I'm not sure if that's possible or how to do it, and b) I'm not sure if that would be better than just monitoring a single _changes feed, and c) I'm not sure if monitoring a single _changes feed instead of tens of thousands of _changes feeds will have a performance impact in the first place.
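
To make the single-feed option concrete, here is a rough sketch of the backend side: one long-polling read of _changes, with the results split per site before they are handed to whatever queue the service uses. The site_id field name, the database name and both function names are assumptions; the HTTP call itself is left out:

```javascript
// Builds the URL for a long-polling read of one database's _changes feed.
function changesUrl(baseUrl, db, since) {
  const params = new URLSearchParams({
    feed: "longpoll",
    include_docs: "true",
    since: String(since),
  });
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?${params}`;
}

// Splits one _changes response ({ results: [...], last_seq }) into
// per-site batches of documents.
function groupBySite(changes) {
  const bySite = new Map();
  for (const change of changes.results) {
    if (change.deleted || !change.doc) continue; // skip deletions
    const site = change.doc.site_id; // hypothetical field name
    if (!bySite.has(site)) bySite.set(site, []);
    bySite.get(site).push(change.doc);
  }
  return bySite;
}

const url = changesUrl("http://localhost:5984", "devices", 0);
console.log(url);

const grouped = groupBySite({
  results: [
    { id: "a", doc: { _id: "a", site_id: "site-1" } },
    { id: "b", doc: { _id: "b", site_id: "site-2" } },
    { id: "c", doc: { _id: "c", site_id: "site-1" } },
    { id: "d", deleted: true },
  ],
  last_seq: "4-xyz",
});
console.log(grouped.get("site-1").length); // 2
```

The service would remember last_seq from each response and pass it as since on the next request.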

Please guide me, dear goons
