Video: Technology Stacks – Computer Science for Business Professionals – by CS50 at Harvard
SPEAKER: So much as you might understand how the internet works, whether it's HTTP that transports data or HTML that is that data, and as much as you might understand some of the fundamentals of programming like loops and conditions and Boolean expressions, variables, and more, it turns out there are so many different ways in which you can implement those ideas. And so, indeed, when it comes time to actually build a website, a web application, a mobile application, it turns out that it's rather non obvious where to begin some times if only because you have so many options ahead of you. And much like the world of clothing and the world of fashion, more generally, is constantly evolving such that what is cool and appropriate to use now or to wear now might not necessarily be appropriate some months or years hence, the same can be said for better or for worse of the technology world in so far as humans are constantly innovating.
Humans are constantly finding fault or opportunities for improvement in languages that we've used for years, in server software that we've used for years, and intuitively improving on it. And so the reality is that staying current with this whole world takes some effort even as the fundamentals largely remain constant. And so what we'll try to do here is give you a sense of some of the languages, some of the frameworks, some of the libraries, some of the overarching design decisions that are both in vogue and both here to stay right now, as well as take the lid off of some of these technologies and give you a better understanding of how some of the fundamental types of technologies from which you can choose actually work. So let's consider for the moment the so-called front end of an application. Front end generally refers to that which is facing the user.
Then if you could build it a little bit better, quote unquote, in quotes and so far as reasonable people can disagree as to what's inferior or superior in this world, then if you were to design it using a completely new framework for which there's just a non-trivial learning curve for you. And so there's, as in the case of data structures, as in the case of algorithms, as in the case of computer science more generally, there's these tradeoffs, and human time, developer time, learning time is certainly one of the resources that you have to decide how much of which you want to spend upfront. Meanwhile, in the world of CSS, there are also libraries there, collections of CSS files and frameworks really, methodologies for which you lay– via which you lay out your site, like Bootstrap, Foundation, Semantic UI, and more, and these focus more so on the aesthetics of a user's experience, more so on the presentation of information and the types of user interface mechanisms, the buttons, the menus, the windows, and the like that a user might see on the screen. But here, too, there are so many different wheels that have been invented in the past. So many different people have decided, you know what? That default link on a web page could look much prettier.
Looking on websites like Hacker News from Y Combinator, where there's an active community of folks from the startup community, especially talking about these kinds of technical decisions and design decisions more generally. Websites like Quora or other Q&A websites. Looking at GitHub.com, a popular web site where people store their code and can actually follow or star other people's repositories of code from which you can infer a sense of popularity based on how many people are following a framework x or y or z. But this is always a moving target, and so it's simply part of the conversation to have from the get go. And you're not necessarily going to regret a decision if you don't necessarily pick the most trendy or the one that's poised to take over all others because this is a fast changing world. And, in fact, one of the most frustrating if not expensive aspects of this world is just how quickly it changes. And so what you design today might not be what you design tomorrow, but that's also part of the excitement of this space. So with that said, that's just a glance at what the front end design process or decision process might be like.
They don't necessarily make building a mobile application as easy as it could be, or even if it is relatively easy, humans have found over time that, gosh, every time I build a mobile application, I'm like copying and pasting dozens or hundreds of lines of code because they all share a common framework or maybe a common meaning system or a common set of functionality. And so in this world to have libraries of reusable code built up and frameworks, libraries of code and methodologies via which you're building your applications, arisen. Among them Django, Flask, Laravel, .NET, Node.js, Rails, and the like, .NET being up there, too, because it generally refers to a set of languages you might use as well as the framework that oversees those various languages. And there's even more options ahead of you here. So how do you begin to pick among these options as well? Well, here, too, you're often guided by what your engineering team knows, perhaps what your own system administrators or your operational people know, so the folks who were actually maintaining the servers, whether they're locally on site, maybe they are the ones running things in the cloud, whether that's Amazon's or Google's or Microsoft's cloud or some other company still, depending on what that cloud infrastructure supports, might influence your decision making as to what language you might use.
And some of these languages are newer than others, and so they might come with more features so you don't have to rely as much on third-party libraries. Some of them less vetted or maybe less robust or even less secure than what comes with the language itself. And recall, too, that these languages are often constantly evolving, some more quickly than others, but there are new versions of these languages coming out. And so even within the confines of a given language like Java might there be new and improved features every year, every couple of years. And so sometimes actually picking a version of these languages or frameworks is one of the design decisions to bear. And I would say from a non-technical perspective, most compelling is just to be aware of these kinds of technologies, these kinds of buzzwords du jour, but also aware of these kinds of tradeoffs. It's not necessary to get into the weeds I think of understanding each and every language and what it's good for.
Although once you have a general understanding of this world, of programming, of server side architecture, of HTTP, and web pages and the like, can you via Google and other websites I think start to wrap your mind around some of the tradeoffs and perhaps even start to tease apart which are technically compelling arguments that you might see online versus just religious objections to this language or that because that's often the case when folks get into heated discussions of language choices for instance. But I think ultimately understanding the tradeoffs, the onboarding time or the learning curve for various languages, the appropriateness of language for certain specific use cases like the real-time chat applications or whatever your own product happens to be, what your engineers already know what they're good at, what they prefer to use what language and framework is easiest for new hires, maybe six months hence or two years and to actually come on board and understand so that you're not expecting the most experienced of new hires to constantly be in your pipeline. So appreciating these kinds of tradeoffs and asking these kinds of questions even among the engineers that are perhaps making the decision ultimately is a valuable way to contribute to the conversation and take some comfort in the fact that your product need not be a complete black box.
You might not necessarily be able to implement it from scratch yourself, but you can at least ask the right questions and be a sounding board for some of the answers that come back. Now, there are some fundamental differences and some of these architectural decisions among which are around choice of database. Indeed, most any web application today has a back end database inside of which is stored data from users, whether its purchase orders or user registrations and passwords and any amount of data that's collected from users at the end of the day is stored somewhere and that somewhere is called a database. But there's different types of databases. Two of the biggest categories these days or perhaps SQL and the opposite NoSQL, as it's playfully called, SQL being structured query language and NoSQL referring to a class of databases that doesn't support SQL and indeed is not generally relational. And we'll soon see what that means. But even with this world, do you have a veritable menu of options MariaDB mySQL, Oracle, PostgreSQL Server. And then within those– within that relational world do you also have the cont– in addition to that relational world, do you have the contrast of the object-oriented or document store world, things like Bigtable, Cassandra, HBase, MongoDB, and others.
And already it can be sort of overwhelming to feel like just as you're getting up to speed on what the web is and how web pages work, oh my god! We're just beginning to make our decisions. But generally these decisions, too, can be guided by what your team knows, what you're comfortable with, what the price might be for some of these. And some of these are free and open source. Some of them have commercial licenses associated with them. Some of them are supported easily for you with your cloud provider, wherever you're hosting your servers or you might have to host them yourself. So you can begin to narrow the field of options. And indeed, especially when building multiple products, might you build on past experience of yourself. So, for instance, for the course and all of our web-based applications, we tend to use a lot of the same technologies and only recently have we begun to transition from one main language to another but doing it pretty much for all of our applications across the board so that we don't have to worry about some of the team members knowing x and y and z.
It's just there's an economy to scale to focusing on relatively fewer technologies internally. But let's dive in a little deeper into SQL and NoSQL if only because they're so cleanly bucketized into SQL and not-SQL really. What do we mean by this, and what does a database really do? So a database typically supports these– at least these four operations or categories of operations playfully called CRUD, which stands for create, read, update, and delete, though you might see some variations on what the actual words are. But CRUD refers to those four fundamental operations. Now, in the world of SQL, or S-Q-L, structured query language, which itself is a programming language, and it's a programming language you use to query a database, to add data to a database, remove it, edit it, and the like. Within the world of SQL, there are– is a direct mapping of these four operations, the four keywords, for features of the SQL language, namely create, select, update, and delete.
So it's almost CRUD, but it doesn't quite line up perfectly. So create, read, update, delete is the general notion of the four operations database might support, and in the world of SQL, which we're about to dive into, might you see these four commands specifically create, select, update, and delete. So what does it mean to be a SQL database, or more generally, what does it mean to be a relational database? Because a relational database is often historically what people think of when they think of databases and only in recent years has this NoSQL trend been catching up that changes the paradigm. And we'll look at the flip side in just a moment. So if you've ever seen this, whether it's this version of Excel or some equivalent version of Numbers or Google Spreadsheets or the like, this is kind of a relational database. It is a piece of software that allows you to lay out your data in rows and columns. And among those rows and columns are there typically relationships. Consider after all the last time you used a spreadsheet, if ever, odds are there was some kind of meaning if you put data in column A versus B versus C versus D.
In other words, when adding data to a spreadsheet, typically if you're using it correctly, you don't just start plopping your data in any random box that doesn't yet have a number or a word in it. You generally organize the data such that in column A might be one type of data, column B might be another type of data, and so forth. And so it's relational in the sense that the numbers and the data within relate to one another. And it's also relational in the sense that you can have multiple sheets even within a file. So by default with Excel and Numbers and Google Sheets do you get just one sheet or worksheet by default. But if you ever want to have multiple types of data but all in the same file just because it's kind of nice and orderly to keep it all together, you can click the plus and create a new sheet and have a completely different tabular structure, a different number of rows and columns and different meanings for those columns but somehow the data is all related, ergo this notion of a relational database. So a relational database stores data in tables, and a table is in turn a set of rows and columns. So why does this actually matter? Well, Excel is not all that powerful when it comes large datasets.
In, fact it wasn't all that long ago that Excel I believe only supported as many as 65,536 or 35 rows probably, and that's because, long story short, they used the 16-bit integer, the biggest number for which is 65,535, and so Excel physically couldn't count as high as 65,536 or 7 or 8 because they just didn't use enough storage underneath the hood. But even if you had that much data, and that's quite a lot of rows, the software just tended to be super slow at least in my own experience back in the day trying to manipulate very large datasets. And Excel just wasn't designed for tens of thousands of rows. By the time you're at that much data, you should really be graduating, so to speak, to an actual relational database management system, a server-driven database that actually leverages not just files but memory more effectively. In fact, what– among the features you get from products like MariaDB and MySQL and Oracle and Postgres and the like is you get really smart people who have implemented the software in such a way that it makes your creates and your reads and your updates and your deletes faster than they might be if you were just storing all of your data in a big file.
For instance, in a big file like Excel, if you want to search for some information, you can hit Command F or Control F or whatever, type in a keyword, and then Excel or Numbers or Sheets will search for it. But generally these spreadsheet programs are going to search for your data pretty much by a brute force, search top to bottom, left to right, looking in every darn cell for that data. That's fine if you've got a pretty small spreadsheet. We slow humans aren't going to notice the difference. But in the context of really big datasets, tens of thousands of rows, let alone millions or billions, it does not suffice to look at every piece of data when looking for a certain phrase or a certain number or some such value. You want the database itself to do some anticipatory optimization, sort of working its magic underneath the hood using various algorithms and data structures, so as to optimize those queries and to give me answers in logarithmic time, not linear time, or time that's faster than searching the whole darn thing. So at some point, spreadsheet software does not cut it, and you transition to a more proper relational database.
But moreover at that point, you have to start deciding how you want the database to store your data. Because at the end of the day, we humans generally know a little more about the programs we write about the data we're going to be storing. And by this I mean, if I am storing a bunch of data in a database, I probably know better than the computer might know which of these values is always going to be like an integer or which of these is always going to be a dollar amount or which phrase is always going to look like a time of day or a date or day of the week or some other such value. And so we humans can actually help databases help us be more highly performing by providing them with hints, otherwise known as data types, that tell the database what type of data to store and therefore how to store it most efficiently. Some of those popular data types in the world of SQL then are these, and let's just take a look at a few of these. So char and varchar. So char being shorthand for character, and it's not a single character like a or b or c.
Character, or char, generally refers to a column in a database that is going to store one or more characters a little confusingly, a string, so to speak, where a string is a sequence of 0 or more characters. So when designing a database column that you know is going to contain a word or a sentence or even a paragraph, you can tell the database, hey, database, make this column this many characters wide, i.e. allocate that much data upfront. But if you're not sure, as might often be the case– maybe someone has a short name. Maybe someone has a long name. Maybe someone has a long address or a short address. If you don't really know what the right length is for a column for the values a user is going to provide, you can instead use varchar for variable length character strings, which is to say you specify only an upper bound.
So I don't know what the longest name is in the whole world. But my name is D-a-v-i-d, five feels like it's kind of short. Probably some people with longer names in the world. 20, is that enough? I don't know? 50? I don't know, 100? Probably. I should probably Google to find out with a bit more reassurance, but this is a decision that the web designer or the database designer is going to have to make. You can't just tell, and you don't want to just tell, the database accept any length string because the more flexible you expect the database to be, the more generous you expect the database to be, the less optimization it can do for you. By contrast, the more precise you can be, the more conservative you can be, the more optimization algorithmically the database can do so that when you ask for data back, it can give you those answers faster. When you insert data, it can insert it faster.
So the more helpful we humans can be with our databases, the more help the database can be in turn, and that's probably a good thing when we have lots and lots of data and users because we want the common case to be highly performing. It might cost me a minute, five minutes upfront to really noodle on the problem and figure out what the best design is. But that cost is going to be amortized over thousands of users, millions of users, who are then benefiting thereafter from a better database design. So where is the line to be drawn? We'll explore this in the context of an example, but it kind of depends. There is no right answer necessarily. It really depends on your use case and the data you're trying to store. With numbers, too, do you have some discretion.
Integer means what it is, generally a 32-bit value, which means you can have a number from negative two billion to positive two billion, give or take. But that might be overkill. If you know you're dealing with really small integers, maybe you don't need 32 bits. Maybe you want fewer and so you might just say, small int. Doesn't need to be that many bits. I know my values aren't going to get that large. I might as well save the database space. Or by contrast, wait a minute. Going to have more than two billion users, success permitting. I'm going to, therefore, want to use a big int like 64 bits, so I can have many, many, many, many users or rows in my database. And indeed some of the most popular websites out there have run into this issue. The Facebooks, YouTubes, and the others of the world, well, they just have so much darn data, they had better not cap the number of rows in their database table at only two billion because they might well have that many and more.
Then why not just choose varchar with a really big number? Why not choose big int with a really big number of bits? Well, it's wasteful. You shouldn't over-allocate because then you're just spending more space, and space costs money and might even cost time to search if there's more bits to be looked at. And so you don't necessarily want to just cop out and say, use as much space as you want or as is necessary because, again, we can't be as helpful therefore to the database. Now, Numbers are an interesting one, and this is true in programming languages whether it's SQL or C or C++ or yet others. It turns out that choosing how big your data is, or anticipating it, has some real impact in some cases of numbers. So it turns out that a integer of course is just a number like negative 101 and on up in both directions. But a floating point value or float is a real number, a number that's not necessarily an integer, but a real number that has a decimal point and some number of digits after that decimal point that may or may not be representable precisely as a fraction.
So that's a real number or a float. If you want more bits or precision than that, you can actually specify double precision, which gives you more bits and therefore you can have even more digits after the decimal point. But the key takeaway here is that at the end of the day, it's going to be finite if you're representing a number. And so if you do use something like a float, even a double precision float, which gives you more bits of precision. At the end of the day, last I recall from grade school, there is an infinite number of numbers in the world, both integers and real numbers for that matter. So in both the case of these integer base numbers and these floating point values, you can only count so high, or you can only specify a number so precisely. And at the end of the day, you might have some overflow where you just can't represent bigger numbers, whether positive or negative, or you just can't represent enough decimal points– enough numbers after the decimal point to represent a number perfectly accurately.
And so there's this tradeoff. You might want more and more space, but at some point, you can have an infinite amount of space. Computers are physical devices. They only have a physical amount of memory inside of them. You might have to draw a line. And so if you've ever seen some older movies like Superman 3, which has a great incarnation of this or somewhat more recently, Office Space, where there's money making scam whereby the companies in question, long story short, were constantly manipulating monetary amounts in their database systems, but they were always rounding off fractions of pennies. And so the masterminds in both movies started pocketing all of those fractions of pennies, but hilariousness ensues when they don't quite realize how much those fractions of pennies add up. But that too is an issue of imprecision.
We in the human world generally, when going to stores and such, use only two decimal points of precision. But investment banks and banks more generally might actually use more decimal point– more numbers after the decimal point than that. And so having the ability of expressing numbers more precisely is compelling. Thankfully, there does exist Decimal, which allows you to specify how many numbers maximally you essentially want before and after the decimal point or the total number in question. And so that would be an alternative to these others. But it might end up then costing you more space just to get that more precision. So here, too, as with the decisions around frameworks and libraries and languages, here, too, there's a tradeoff. Even at this lower level, when you really get into it, deciding how to store your data in a database. Lastly, and a little more easily, there are data types like to Date and Time and Timestamp, which do as they say. They look like dates.
They look like times. They look like timestamps, just some counter from some preordained moment in time. And these data types are commonly used as you might guess to store these types of data in a database. When did the user last log in? When did he or she register for the website? When did he or she buy something from our catalog or the like? You can represent those and more data types in a standard relational database that supports SQL. But you have some other options, too. It turns out that in a relational database, you can be even more helpful to the database by telling it in advance if any of your columns should be considered a primary key or a foreign key or a unique constraint. Now, what does this mean? Well, typically with data, it is useful to be able to uniquely identify a row in your table in your spreadsheets without having to look at the whole row. For instance, when using Excel or Numbers or Google Sheets, you'll notice that by default, all of the rows are just numbered 1 through whatever.
That's useful because if you are collaborating with someone or you yourself are just trying to find some value, you could just jump ahead to like row 50 to identify the 50th row of your data. You don't have to look for a specific name or address or purchase order or whatever it is that you're storing in this table. You can just jump to the row number in question. A relational database very often takes the same approach, using some piece of data, usually just an integer 1, 2, 3, 4, just like the spreadsheet programs, to uniquely identify the rows so that you can access them very quickly via that number or that index. A foreign key, we'll see, is a notion of a piece of data that exists in two separate tables– two sheets where there's an interrelationship but more on that kind of example in a moment. And a unique key, a unique column, is one where you should not see any duplicates. So, for instance, maybe when building a website that has users register for your website, if you want to ensure that no user can have the same email address as another, you can specify to your database, hey, database, make sure that Mayland@Harvard.
edu, or whatever the user's email address is, only appears once in a column in my database. Don't let David or not-David register with that same email address. And so this is a useful way to ensure that you have correct behavior of your website and you have integrity of your data so that you don't accidentally have duplicate values, which would lead potentially to ambiguity. And there's even more features you might get from a typical database. So let's indeed now try an example whereby we decide how best to store data in my database. But to simulate my database I'm going to quite simply just use Excel here. I could use Apple Numbers or Google Sheets or the like or any spreadsheet program, but at the end of the day I'm really just using this because it's a program with rows and columns. In reality, if I am a business owner and I have a web-based store and I sell widgets and sprockets on my store, the reality is I want to keep track of who has bought what so I know what my revenue is, so I know to whom I need to ship things, and so forth.
So here is some information therefore that I might store. This is good because I know to whom to ship it. I know how many widgets I have sold. And maybe the price should be in there as well. So she paid maybe $9.99 for this widget. All right, now, let's fast forward in time, and let's assume that someone else has visited my website, and they too have decided to buy a widget but multiple widgets. We upsold them. So a widget was bought, quantity 2. This is, say, Rob Bouden also in 33 Oxford Street, Cambridge, Mass, 02138, USA, and this one was a total of $19.98 since he bought two of them. So I've just been storing this data in sort of freeform format but each of these columns clearly has meaning. So maybe this first column should really be called Product.
This one should be called Quantity. This one could be called Name. This is maybe Street. This is maybe City. This is maybe State. This is Zip. This is maybe Country. And this is maybe total. But even here there are some opportunities for disagreement. This is a little US centric, the fact that we have cities and states, as well as zip codes. Indeed it might be the case that zip codes don't all follow the same format indeed even in the US sometimes people write them with five digits, sometimes with nine digits and a hyphen, so there's a design opportunity there. But let's drill in deeper as to what data type these various fields should be at least right now. Let me make room at the top here so we can just make notes as to the data types but in reality these would be stored not in the table itself but somewhere else in the database.
What data type should product be? And remember that among our options are data types like these product. It's not a number. So we can knock off most of these options. It's definitely not a date, time, or timestamp, so then it boils down to is it a char or a varchar? So char is a fixed-length field, so we have to decide in advance is it going to be eight characters, 16 characters, 100 characters or something else. Varchar would mean we just know the upper bound. So I don't know. W-i-d-g-e-t is 6, so minimally it's got to be six characters but then there's sprocket, s-p-r-o-c-k-e-t. Not sure if I've ever had to spell that word. That's eight characters so six isn't going to cut it. So maybe we do something like char8. But there's a tradeoff here. If I specify now that this field is maximally going to be eight characters, then I can't sell anything with a longer name than sprocket.
I could change the database size– or the column size later on but it's ideal to get these things right from the get go and not have to go back in and change your infrastructure or hire someone to come in and make modifications. So maybe that's a little shortsighted. Maybe it shouldn't be eight. Maybe it should be twice that, like 16. I don't know, and I don't– I'm not necessarily going to offer an answer here because it entirely depends on what data you're trying to store, what items you're trying to sell. This, in fact, now is even more wasteful because even though I'm now anticipating product names that are up to 16 characters, the char data type is going to use for every product name 16 characters, even if a whole bunch of those are blank because the word isn't long enough to need 16 characters. So maybe I would go for a variable length field not char but varchar whereby the maximum length of my column should be 16 characters.
But here as in CS more generally tradeoff like it might seem like a win like OK wolf the problem is I'm using too much space all the time, Let me just put an upper bound. There's gotta be some price you pay there's got to be a tradeoff and indeed there is. It turns out that a database can generally search your data more quickly if it knows the entire column is the same width. Long story short, if it knows that this column has eight characters, eight characters, eight characters, eight characters, it can use very simple arithmetic to jump mathematically from one row in that column to another because they're all the same distance apart essentially. But if you have a varchar column and variable length, you can think of the column not as being perfectly smooth on both sides but kind of jagged on one side. Some rows are short. Some rows are long. And so you can't just blindly use simple arithmetic and jump eight characters at a time if the length were eight or 16 characters at a time if the length were 16.
So it's a tradeoff. If we want to be able to search through our product names quickly, might not want to use a varchar. So here, too, no right answer. It's a tradeoff. And it might not matter for small datasets. Indeed probably doesn't. If you don't have many customers, you don't have many products but certainly and scale these kinds of things matter. And even when building something that's not going to have that many users, just getting it right doesn't cost that much upfront time. The most important thing, I dare say, is to actually give it some thought and not just leave it to chance or take the easiest way out because invariably over time, you will build up so-called technical debt, where you make poor decision, poor decision, poor decision, and now you have a very expensive decision later on if you have to go back and change a lot of those things.
What about quantity? Well, quantity, nicely enough, would seem to fall more cleanly into one of these fields. Now, is that an integer? Is that a big integer? I think we're doing pretty well if we need more than 2 billion products sold. Maybe we're sort of a smaller shop, and we can get away with an integer or even a small int, but this too would be a tradeoff. How many bits do you want to spend? I would say that the default typically would be an integer unless you really are expecting a huge amount of data, billions of rows. So we might say something like integer here, name. Oh, gosh, this is that can of worms again. How long is a maximum name? Maybe I do some googling and some due diligence as the maximum length names. Maybe I just want to cut it off.
You've probably been to a website before where you're happily entering your information, and then you keep typing and nothing is happening. And that's because the programmer, or the database designer, has decided your name doesn't need to be that long. Or your address doesn't need to be that long. And it's infuriating sometimes because there are assumptions, naive, insensitive assumptions sometimes, but that boiled down to perhaps either calculated design decisions or maybe just poor design decisions. So I don't know what is right here. 16 feels a little too conservative, so maybe I would say something like varchar 128. But even that I'd probably want to take a look at my customer base and see if that's well beyond the limit of what I might actually need. Same thing for street. Same thing for city. I don't really know what the right length is, but let's assume it's going to be varchars for those.
So we'll just use a dot, dot, dot to suggest that it's an open question. State is an interesting one. If we expect to have only US customers, we can do a little optimization here. If every US state has a two character abbreviation, we could do char2 so that we get that performance benefit of knowing that every row is the same width, two characters, so long as we're comfortable not selling products to anyone else in the world beyond the United States zip code 2. Design opportunity there. I think it's fair to say that integer, while seemingly correct, might get you into some trouble, at least here in Massachusetts, where we have a whole bunch of zip codes that start with zero. Like in the world of numbers and integers, leading zeros are meaningless. You can have as many zeros to the left of your number and they don't change the actual value of your number.
But in a zip code, it does have meaning. It is the first of five digits here, and so calling this an integer probably isn't very wise because if the database is like most humans. The database might ignore that first digit, and so my zip code is going to appear to be 2138, which really isn't right. Now, we could fix that in code. We can make sure that, well, if we ever see a zip code that's only four or fewer digits, this let's pre-pin some zeros, that feels messy. If we're going to put that data in there from the get go, let's make sure it comes back to us correctly. And so I might actually say something here. Even though it looks like a number, maybe I would actually say it's a char5 field, or maybe it's nine or 10 if I want to have a hyphen in there for US zip codes. Country, too. Here maybe it's– going to be a three-character abbreviation of two-character abbreviation, not sure what's best there.
Really depends, too, on what countries want to sell to if not just the US perhaps, so there's a design opportunity there. And then perhaps the last to consider is this total. I think it's fair to say that integer would not be correct because we would either be rounding down or rounding up all of the money we're supposed to be collecting from our customers. So we probably want one of these. And some databases differ, but generally a data type like Decimal is ideal. You don't want to even get into the business of worrying about these rounding errors or errors of imprecision as in Superman 3 and Office Space. Much better to just say that you want a fixed number of digits to the left and a fixed number of digits to the right of the decimal place so that you are not losing even fractions of pennies or mischarging anyone or losing out in any way. So we might use Decimal in that way.
Some databases, though, have an actual currency data type, which operates similarly. So there remains to be seen some opportunities for improvement. If I continue to sell widgets, let alone sprockets, I'm going to have more and more and more rows in this table. And if Rob and Zamyla end up being repeat customers, I might have more and more Robs and more and more Zamylas in the same table. And as that happens, there begins to be quite a bit of redundancy. Indeed, what can you factor out over time? Well, certainly if Zamyla and Rob keep ordering more and more items from my database, I could just keep updating the quantity, but that feels a little messy. It'd be nice to have a veritable history of all of my sales. I don't want to just aggregate everything. So adding more and more rows for every sale seems pretty compelling, but then I'm going to see Zamyla Chan and Rob Bouden again and again and again and again and again in this table.
And I'm also going to see their address again and again and again and again. And herein lies now the capabilities and of the feature of a relational database. You know what I'm going to do rather than just treat this as my one and only table, let me go ahead and just rename this sheet or worksheet to be Orders. I could call it anything I want. And you know what? Let me create another table or sheet, and let me call this Customers. So even though, again, I'm using Excel here, this is just like I might be doing in Oracle or a SQL Server or in Postgres or mySQL or the like, I've just created a second table. But as per the name relational database, there's going to be a relation across these two tables now. And what's that relation going to be? Well, you know what? I'm going to go ahead and copy all of this customer data and actually cut it and paste it over into this new table called Customers. And now this isn't quite sufficient.
I'm going to go ahead and notice that Excel has already numbered these things for me. But I'm going to go ahead just for clarity and add my own column, and I'm going to call this ID. And it's going to be, say, an integer. And I'm going to cause Zamyla my first customer, Rob my second customer, and in this case, notice now these unique identifiers are part of my data. It's not just part of Excel's arbitrary numbering on the left and arbitrary lettering on the top. Rather these are now actual pieces of data in my database that will be stored and backed up and so forth. But notice now that Zamyla is customer number 1 and Rob is customer number 2, each of whom lives at these addresses, I don't have to worry now about redundantly storing that data because in my orders table now, any time Rob or Zamyla or some other customer purchase from my website– notice I can shrink this. And I can say, you know what? This is the customer who bought this. It's going to be an integer. And you know who bought that first widget? Well, it was Zamyla.
And you know who bought that second widget and the third widget, too, since quantity was 2 was Rob. And if some new customer comes into my database– so, for instance, suppose that someone new orders from my website, they are going to become customer number 3. That will be, for instance, Doug Lloyd, and suppose he, too, is at that same address at that same zip code in the USA. But now in my orders table, suppose that Doug has bought 10 widgets. He really went all in. Well, he, too, is going to have widget there, quantity 10, his customer ID is 3, and he, of course, is going to have spent $99.90 with us in total. So notice how we've factored out the common information to eliminate a redundancy. Notice now that if Doug or Rob or Zamyla move addresses or change their address, or if we were storing more information, like their phone number and email address and other personal data, too, we could change it in just one place. And not in our orders table because, indeed, there's now distinct semantics. Our orders table stores orders. Our customers table stores customers.
And if we wanted yet another table, as we probably should have, it could store, say, products. In fact, there's still this redundancy. Let's go ahead and create another table called Products inside of which is an ID field as well as a product field, and then, just as before, let's start numbering our IDs from 1. So our first product is a widget and while we've not sold any yet, our second product, ID 2, is a sprocket. Now, in this way in my orders table, can I store not a product per se, but a product ID. And so now even though my table is frankly becoming more and more cryptic and a little harder for me to wrap my mind around– what am I looking at, it's all just numbers, it is now what we would call normalized in the context of a database. And a database typically is not meant to be looked at by human eyes just like this. Rather it's meant to be queried and data created and updated and deleted. And so there are certain commands in this language called SQL that actually facilitate programmatically, using a programming language, what I've been doing with my keyboard and fingers alone.
Indeed, the commands with which you can manipulate the data in the database itself is going to be SQL's commands, create, select, update, delete, and others. Indeed, much like Scratch has the various puzzle pieces via which you can implement logic in a Scratch-based program, so does SQL will have these puzzle pieces, if you will, via which you can create and select and update and delete data from your database just like I've been simulating by using Excel here and my keyboard. And indeed, some of these more sophisticated concepts, like primary key and foreign key and unique key, now rather start to jump out at us because if we consider what my orders table now looks like, notice that it's indeed mostly numbers, but those numbers are essentially keys into another table. In fact, if you look at products, my products table has an ID column, which has unique numbers 1 2, and so forth my customers table has its own ID column. And these are same numbers, but different meaning. These are customer numbers 1, 2, 3, and so forth.
So in each of these tables customers and in products is that ID column a primary key for the customers and products table respectively. Within each of those tables, it is that ID column that uniquely identifies rows. Zamyla is and shall always be customer number one. Rob is and shall always be customer number 2. Doug is and shall always be customer number 3. So those IDs those primary keys must not change. They must be invariant, and as such they can be reliably used to uniquely identify customers or, in the context of products, uniquely identify a product or, in the case of orders– we forgot something. It would seem valuable if we continue this train of thought to also have here in my orders table an order ID that should probably represent each of these orders, which is just going to similarly be an integer that just keeps track really of how many total orders have been placed, 1, 2, 3, 4, 5, 6 on up, all the way up to 2 billion or best yet even higher than that. But notice these other numbers now.
The product column is no longer the name widget or sprocket. The quantity column, still just an integer. That's not anything to do with a key even though it's also an integer but customer is an ID. But it's not a primary key, nor is product a primary key here. In this context of my orders table is product and is customer a foreign key because those two columns are primary keys in two other tables. So within one table if you have an ID, it should be generally considered your primary key if that is the role it's playing, uniquely identifying your rows. But if that same number appears in some other table for the purpose of cross-referencing really, is it a foreign key? And suffice it to say that in SQL, this database language, even though this looks cryptic to us humans, realize that with SQL can you stitch these distinct tables or sheets back together.
You can quote unquote join SQL tables in such a way that you can take your customers table and your orders table and reassemble them so that you see next to each order, say, on your administrative web page that allows you to see all of your recent orders, not the customer IDs of who has bought what but actually the customer names and their addresses and maybe their phone numbers and e-mails and more. You can join this information back together again. And what databases are good at is doing exactly that kind of joining, not to mention searching or more. But sheesh, this was a lot of work just to get to this point right? It was pretty easy to make one worksheet just put all of my orders in there. But then we went down this slope of oh, well, maybe we should factor this out. Oh, wait. We can factor this out. Oh, maybe we should add some IDs here. We just created a whole lot of work for ourselves.
The quotes are just used around words and numbers, and the colon separate keys from values where keys and values is a very common paradigm where on the left is metadata and on the right is data typically, key and value respectively. Square brackets just mean an array, which means that this is an array or a list of two values, something comma something, which happens to be GPS coordinates, latitude and longitude here. So what is this? What are we looking at? This appears to be an object, shall we say, that represents the city of Austin where Harvard's business school is, where Harvard's engineering school will soon be. And so this object contains a bit of hierarchical information, not a huge amount, but notice it has an ID, which happens to be its zip code 02134.
It has a city name, Austin. Has a location which by convention is a comma-separated list of two values, latitude and longitude, and so that's kind of some hierarchy. It's not just a simple value. And then there's a population of 23,775 at last count though surely to rise. And then in the state of Massachusetts. So this is actually a snippet from a database called MongoDB, which is a very popular noSQL database that stores data essentially like this. So rather than flatten all of your data as is the case in a relational database using SQL an object-oriented database or a document store like this noSQL database called MongoDB, really stores things as key value pairs. And those key value pairs might actually have some hierarchical structure. So if you, for instance, stored an order like we just did, instead of storing it in rows and columns, you would just store it is one big chunk of information like this. And inside of that object, an order object might actually be the entire customer. Inside of that customer might be his or her city and state and so forth. So there might actually be retained in some hierarchy like you see here.
And so this is just a different way of viewing the world. It has typically been a more efficient way of viewing and modeling your world because you don't have to give frankly as much thought to the design and the division of some of your data and the normalization thereof, but you do sometimes pay a performance penalty. You do sometimes pay a penalty and redundancy of data, though there are ways to avoid that by reusing something like that ID field. So it really is ultimately a different philosophy right now. And its a competing alternative to something like a relational database, and here, too, will there be an opportunity to read up on and to debate exactly what is best for your actual problem at hand. And now mobile. Up until now we focused on the front end, on the back end of really web-based applications that you might access on a laptop or desktop or even a mobile device. But what we haven't given thought to is the design opportunities for mobile devices specifically.
Indeed, most any of you who have a smartphone these days, iPhone, Android, or the like, have probably downloaded some application that did not come with your phone. And you downloaded that from the Google Play Store or the Apple App Store and that software is quite likely written in a very specific language. Indeed, the language for Android is typically Java in which programs are written. The languages in which iPhone and iPad applications are written is Objective-C or more recently Swift, and so there, too, at least in the world of iPhones and iPads, do you have design discretion over what language you use with Swift being the more modern and the one that Apple's really been pushing. But even then, do you have the option to not implement a native application per se, one that is implemented in Java or in Objective-C or Swift, all of which are programming languages, you can actually implement a web-based application but package it up in a way that makes it seem like it's a native application, allows you to distribute it via the App Store, via the Google Play Store, so that it ends up putting an icon on your customers phones.
So that requires some learning curve or some time or some talent or money again. And so there too were there are some tradeoffs. And so it really depends ultimately on what is your application and who you have working with you and what is most important and how much time do you have and what do you view the technological horizon looking like some months ahead? So at the end of the day, these technology stacks, as they're called, are really just menus of options. And those menus are constantly evolving, and they focus on the front and on the back and on the server, on the client, on mobile devices, laptops, and desktops. There are solutions to any number of problems. Indeed, the process of software engineering and developing a product and developing a web app or a native application itself is first doing some due diligence and bringing yourself up to speed on what the design possibilities are, having a discussion, having a debate even, with the engineers with whom you'll be working.
And ultimately making the most informed decision that you can with an eye toward what is trending now, what has been trending, and where the industry might be going but ultimately focusing on solving optimally your own problems and choosing among these various and ever-changing technology stacks. .