I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databases).

With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.

  • valtia@lemmy.world
    7 days ago

    There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
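A minimal sketch of the history-keeping pattern described above (a Type 2 Slowly Changing Dimension), using Python's built-in `sqlite3`. The `person_history` table, its columns, and the sample values are all invented for illustration; nothing here reflects the SSA's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical history table: one row per version of a person's record.
conn.execute("""
    CREATE TABLE person_history (
        row_id     INTEGER PRIMARY KEY,   -- surrogate key, unique per row
        ssn        TEXT NOT NULL,         -- repeats across versions
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT                   -- NULL marks the current version
    )
""")
conn.executemany(
    "INSERT INTO person_history (ssn, full_name, valid_from, valid_to) "
    "VALUES (?, ?, ?, ?)",
    [
        ("000-00-0001", "Jane Doe", "1990-01-01", "2015-06-30"),
        ("000-00-0001", "Jane Smith", "2015-07-01", None),
    ],
)

# The SSN appears twice, but only one row is the current version.
total_rows = conn.execute(
    "SELECT COUNT(*) FROM person_history WHERE ssn = ?", ("000-00-0001",)
).fetchone()[0]
current_name = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchone()[0]
print(total_rows, current_name)  # 2 Jane Smith
```

In this sketch the "duplicate" SSN is what links the old and new versions of the same person, so removing one of the rows would erase history rather than clean anything up.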

    Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases, with their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life the SSN on your card has dashes, and those dashes make the number a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal db is using a number field instead of text), there can be some data loss, resulting in a NULL.
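To make the string-to-number scenario above concrete, here is a small Python sketch. Both functions are hypothetical loaders written for this example; whether any real state-to-federal transfer behaves this way is the commenter's guess, not something the thread establishes.

```python
from typing import Optional


def to_numeric_ssn(raw: str) -> Optional[int]:
    """Hypothetical strict loader: accept exactly nine digits, else None (NULL)."""
    return int(raw) if raw.isdecimal() and len(raw) == 9 else None


def to_numeric_ssn_cleaned(raw: str) -> Optional[int]:
    """Same loader, but strips dashes first so formatted values survive."""
    return to_numeric_ssn(raw.replace("-", ""))


print(to_numeric_ssn("123-45-6789"))          # None: the dashes make it non-numeric
print(to_numeric_ssn_cleaned("123-45-6789"))  # 123456789
```

A numeric column also drops leading zeros, which is one reason identifiers like this are often kept as text even when they look like numbers.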

    • DarthKaren@lemmy.world
      7 days ago

      JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.

      • GoodEye8@lemm.ee
        6 days ago

        Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about because we know that due to human error SSNs are not unique and you can’t enforce uniqueness on SSNs without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
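The "previous names" design mentioned above could look roughly like this in Python with `sqlite3`. The table names, column names, and sample data are made up for the sketch. It also shows what a `UNIQUE` constraint does when a clerical error tries to reuse an SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id    INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL UNIQUE,  -- uniqueness enforced by the database
        current_name TEXT NOT NULL
    );
    CREATE TABLE previous_name (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        old_name  TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO person (ssn, current_name) VALUES ('000-00-0001', 'Jane Smith')"
)
conn.execute("INSERT INTO previous_name (person_id, old_name) VALUES (1, 'Jane Doe')")

# A second person arriving with the same SSN (e.g. through a clerical error)
# is rejected outright instead of being recorded.
try:
    conn.execute(
        "INSERT INTO person (ssn, current_name) VALUES ('000-00-0001', 'John Roe')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The rejection is the trade-off the comment points at: with a hard constraint, a real person whose number collides with an existing record cannot be stored at all until someone resolves the conflict.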

    • DreamlandLividity@lemmy.world
      7 days ago

      Another accusation Elon made was that payments are going to people missing SSNs.

      A much simpler answer is that not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.

      • lovely_reader@lemmy.world
        7 days ago

        It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.

  • KillingTimeItself@lemmy.dbzer0.com
    7 days ago

    TL;DR: de-duplication in that form refers to a technique where two different references in the file system point to one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.

    You can imagine this would be very useful when taking backups, for instance. We call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size and both copies of the file (it’s more complicated than this obviously, but you get the idea).

    now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
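As a rough illustration of storage-level de-duplication as described above, here is a toy content-addressed store in Python. The `DedupStore` class and its method names are invented for this sketch and do not correspond to any real file system or database API.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical data is kept only once."""

    def __init__(self):
        self.blocks = {}  # digest -> bytes, each distinct payload stored once
        self.files = {}   # file name -> digest (a reference, not a copy)

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # reuse the block if already present
        self.files[name] = digest

    def read(self, name):
        return self.blocks[self.files[name]]


store = DedupStore()
store.write("a.txt", b"same bytes")
store.write("b.txt", b"same bytes")
print(len(store.files), len(store.blocks))  # 2 1
```

Both names still exist and both reads return the full data; only the physical storage is shared. No logical record disappears, which matches the point that this kind of de-duplication changes nothing about what the database contains.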

    The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

    Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything.

    • valtia@lemmy.world
      6 days ago

      i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another

      Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

      what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

      Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

      • KillingTimeItself@lemmy.dbzer0.com
        6 days ago

        Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

        in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.

        Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

        and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

        • valtia@lemmy.world
          6 days ago

          in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.

          … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

          Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

          … I don’t think you understand how modern databases are designed

          • KillingTimeItself@lemmy.dbzer0.com
            6 days ago

            … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

            u were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries, i was just clarifying that.

            … I don’t think you understand how modern databases are designed

            it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.

            Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.

      • DacoTaco@lemmy.world
        6 days ago

        SSNs being unique isn’t a dumb idea, it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence, to implement the idea, you would first need to change the SSN format so that it is unique.

        Also, Elon’s remark is stupid as is. I’m sure the row has a unique id, even if it’s just a rowid column.

        • KillingTimeItself@lemmy.dbzer0.com
          6 days ago

          Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.

          even then, i wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of “global id”
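One way the "row hash" idea above could be sketched in Python, using only the standard library. `row_hash` is a hypothetical helper written for this example, not a feature of any particular database.

```python
import hashlib
import json


def row_hash(row):
    """Hash every column of a row into one hex digest (hypothetical helper)."""
    # Sorting keys makes the digest independent of column order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"ssn": "000-00-0001", "name": "Jane Doe"}
b = {"name": "Jane Doe", "ssn": "000-00-0001"}  # same data, different column order
c = {"ssn": "000-00-0001", "name": "Jane Smith"}

print(row_hash(a) == row_hash(b))  # True
print(row_hash(a) == row_hash(c))  # False
```

A content hash identifies the data rather than the row, so two genuinely identical rows get the same digest by design. That makes it handy for spotting exact duplicates, but it does not replace a row id.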

  • h4x0r@lemmy.dbzer0.com
    8 days ago

    I think a lot of comments here miss the mark, it’s not really just about stating the gov does not use SQL or speculation regarding keys.

    Deduplication is generally part of a compression strategy and has nothing to do with SQL. If we’re being generous he may have been talking about normalization, but no one I have ever met has confused the two terms (they are distinctly different from an engineering perspective).

    There are degrees of normalization too, so it may make total sense to normalize to 3NF (third normal form) rather than, say, 6NF depending on the data.

    • whotookkarl@lemmy.world
      8 days ago

      This is it. Relational databases are normalized to normal forms; deduplication is usually a term used when talking about a concrete data set pulled from a data source like a database, not the relational data model in the database itself.

  • owenfromcanada@lemmy.world
    8 days ago

    TIL Elon doesn’t know SQL or have any basic human decency.

    J/K, I already knew he doesn’t have basic human decency.

    If he knew anything about SQL, he could have run a quick search to see whether any SSNs are actually duplicated. (spoiler alert: they’re not, he’s just stupid).
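The "quick search" mentioned above is typically a `GROUP BY` with a `HAVING` filter. Here is a sketch in Python with `sqlite3`; the `records` table and its rows are invented sample data, so the result says nothing about what the real database contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ssn TEXT, full_name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [
        ("000-00-0001", "Jane Doe"),
        ("000-00-0001", "Jane Smith"),
        ("000-00-0002", "John Roe"),
    ],
)

# List every SSN that appears on more than one row.
duplicates = conn.execute("""
    SELECT ssn, COUNT(*) AS n
    FROM records
    GROUP BY ssn
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # [('000-00-0001', 2)]
```

Whether such a query returns rows depends entirely on what the table is for; other comments in this thread give several reasons why repeated values can be expected.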

  • CodeHead@lemmy.world
    7 days ago

    The US government pays lots of money to Oracle to use their database. And it’s not for Berkeley DB either. (Poor sleepy cat). Oracle provides them support for their relational databases… and those databases use… SQL.

    Now if Musk tries to end the Oracle contracts, then Oracle’s lawyers will go after his lawyers and I’m a gonna get me some popcorn. (But we all know that won’t happen in any timeline… Elon gotta keep Larry happy.)

    • SloppyPuppy@lemmy.world
      7 days ago

      He gonna write everything in Pandas. Who the fuck needs to pay hundreds of millions a year to Oracle. (And I bet that’s really how much they pay Oracle)

      Also, ohh boy Oracle’s lawyers… those you don’t wanna mess with.

    • DahGangalang@infosec.pub (OP)
      7 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • KillingTimeItself@lemmy.dbzer0.com
        7 days ago

        Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database

        formally, changing the identity of someone would have a very explicit reason to keep a “duplicate” ssn entry, if purely for historical reasons for example. I’m sure there are a myriad of technical reasons to be doing this.

  • Kalkarino@lemmy.world
    7 days ago

    It doesn’t matter anymore to the trumpers. They are eating this shit up like it’s thanksgiving

    • r00ty@kbin.life
      8 days ago

      It’s a terminology thing really yes. I mean a database (SQL or not) shouldn’t need de-duplication by nature of how the record index/keys work.

      If they’re not using a form of SQL though, I’d be very interested in what they are using. Back in the 90s I was messing around with things like Btrieve and other even more antiquated database engines. But all the software I used that utilised such things was converted to use a form of SQL (even if in some cases there were internal wrappers to allow access in the older way too via legacy code) over 20 years ago.

      If I were an American though my biggest concern would be that Musk is able to know the structure AND content of the social security database. His post (if we believe it) demonstrates he must have access to both pieces of information.

      • snooggums@lemmy.world
        8 days ago

        His post (if we believe it) demonstrates he must have access to both pieces of information.

        At best he is referring to an older mainframe he is aware of not being sql while being completely oblivious of all the government systems that are in sql.

        Which isn’t giving him any credit, because in that case he is still running his mouth based on being ignorant about other government systems.

        I submitted data to a government database yesterday that I know for a fact is sql because we have had an ongoing years long relationship that involves improving that system and aligning our state level sql database. The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.

        • r00ty@kbin.life
          8 days ago

          The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.

          This makes more sense. But even then they would surely transfer data from the old system over.

          I mean I’m liking the idea that they went down into the basement, started up an old mini computer, with “superman 3” magnetic tapes with data from the 1980s to force them to try to integrate with that and only after transferring the data at 1000cps, find out it’s entirely out of date.

          I mean, it won’t be the case, but I’d really like it to be. 😛

          • snooggums@lemmy.world
            8 days ago

            This makes more sense. But even then they would surely transfer data from the old system over.

            All you gotta do is snap your fingers!

            Moving data from system to system is a massive undertaking. It probably needs to be restructured, and decisions made during the process will be found to be imperfect and adjustments will need to be made along the way.

            Then you have to change all the connections to other systems and recreate the existing reports, and by the way the changed structure impacts all of that, and you need to revisit why you have all this stuff and why don’t we just leave it alone after all.

            There is a reason that legacy systems stick around. I’m sure they have legacy mainframes with financial data. At my state office we have a financial mainframe we have been wanting to get rid of for over a decade and while we have peeled off what processes we can there is still a ton left to do. Nothing about it is easy compared to creating something new from scratch, in fact transitioning to a new system to replace an old system is probably ten times as much work. Not to mention you still have to use and maintain the old system the entire time!

  • GaMEChld@lemmy.world
    8 days ago

    Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.

    SQL is damn near everywhere where data sets are found.

    • DahGangalang@infosec.pub (OP)
      8 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • GaMEChld@lemmy.world
        8 days ago

        Oh, well another user pointed out that SSNs are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as a unique identifier for our populace, and if it were it would be more comprehensive than an unsecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSNs for identity purposes. Poop.

        • 【J】【u】【s】【t】【Z】@lemmy.world
          8 days ago

          I’m pretty sure there is a federal statute that says ONLY the SSA may collect or use SSNs, as to federal agencies. I argued it once when a federal agency court tried to tell me that it couldn’t process part of my client’s case without it. I didn’t care but my client was crotchety and would only even give me the last four.

          Edit. It’s a regulation:

          https://www.law.cornell.edu/cfr/text/28/802.23

          An agency cannot require disclosure of an SSN for any right or benefit unless a specific federal statute requires it or the agency required the disclosure prior to 1975.

          In my case the agency got back to me with some federal statute that didn’t say what they said it said, and eventually they had to admit they were wrong.

      • aesthelete@lemmy.world
        8 days ago

        SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.

        I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.

        A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
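A small sketch of the transaction-table case above, in Python with `sqlite3`. The `payments` table and its amounts are hypothetical. The SSN repeats once per payment row, and a `GROUP BY` clause produces the per-SSN totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (ssn TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("000-00-0001", 10000),
        ("000-00-0001", 25000),  # same SSN again: a second payment, not an error
        ("000-00-0002", 5000),
    ],
)

totals = conn.execute("""
    SELECT ssn, COUNT(*) AS n_payments, SUM(amount_cents) AS total_cents
    FROM payments
    GROUP BY ssn
    ORDER BY ssn
""").fetchall()
print(totals)  # [('000-00-0001', 2, 35000), ('000-00-0002', 1, 5000)]
```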

        There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.

        • DahGangalang@infosec.pub (OP)
          7 days ago

          had to join to it

          I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?

          I feel like this is showing my own ignorance on the back end of databasing. Can you point me to references that explain this better?

          • aesthelete@lemmy.world
            7 days ago

            I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.

            https://www.w3schools.com/sql/sql_join.asp

            Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.
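To show what "having to join" means in practice, here is a minimal normalized layout in Python with `sqlite3`. The `person` and `payment` tables are invented for this sketch. Identity data is stored once, payments hold only a reference id, and any query that needs the name or SSN has to join back to `person`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        ssn       TEXT,
        full_name TEXT
    );
    CREATE TABLE payment (
        payment_id   INTEGER PRIMARY KEY,
        person_id    INTEGER REFERENCES person(person_id),  -- reference only
        amount_cents INTEGER
    );
    INSERT INTO person VALUES
        (1, '000-00-0001', 'Jane Smith'),
        (2, '000-00-0002', 'John Roe');
    INSERT INTO payment (person_id, amount_cents) VALUES
        (1, 10000), (1, 25000), (2, 5000);
""")

# Combine the two tables into one result set.
rows = conn.execute("""
    SELECT p.full_name, p.ssn, pay.amount_cents
    FROM payment AS pay
    JOIN person AS p ON p.person_id = pay.person_id
    ORDER BY pay.payment_id
""").fetchall()
for row in rows:
    print(row)
```

The joined result repeats Jane Smith's SSN on two rows even though the `person` table stores it once. A reporting schema might store that repeated form directly to avoid paying for the join on every query.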

            DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.

            https://en.m.wikipedia.org/wiki/Database_normalization

            In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.

            All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)
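For the "removing duplicates" part, one common pattern is to keep the first copy of each exact duplicate and delete the rest. The sketch below uses Python's `sqlite3` and relies on SQLite's implicit `rowid` column, so it is SQLite-specific; the `records` table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ssn TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [
        ("000-00-0001", "Jane Doe"),
        ("000-00-0001", "Jane Doe"),    # exact duplicate row
        ("000-00-0001", "Jane Smith"),  # same SSN, different data: kept
    ],
)

# Keep the lowest rowid for each (ssn, full_name) pair and delete the others.
conn.execute("""
    DELETE FROM records
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM records GROUP BY ssn, full_name
    )
""")
remaining = conn.execute(
    "SELECT ssn, full_name FROM records ORDER BY rowid"
).fetchall()
print(remaining)  # [('000-00-0001', 'Jane Doe'), ('000-00-0001', 'Jane Smith')]
```

Grouping on every column removes only rows that are identical in full. Grouping on `ssn` alone would also delete the name-change row, which is the kind of history loss other commenters warn about.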

  • missingno@fedia.io
    8 days ago

    Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.

    • credo@lemmy.world
      8 days ago

      This explanation makes no sense in the context of OP’s question, given the order of comments…

      • finitebanjo@lemmy.world
        8 days ago

        Yeah, a better explanation is that Deduplicating Databases are an absolutely terrible idea for every use case, as it means deleting history from the database.

  • jj4211@lemmy.world
    8 days ago

    Frankly the whole exchange sounds like Hollywood tech jargon: vaguely relevant words used in a not quite sensible way…

  • snooggums@lemmy.world
    8 days ago

    If he doesn’t think the government uses sql after having his goons break into multiple government servers he is an idiot.

    If he is lying to cover his ass for fucking up so many things (the more likely explanation) then saying “he never used sql” is basically a dig at how technically inept he really is despite bragging about being a tech bro.

  • Honytawk@lemmy.zip
    7 days ago

    He is saying the US government doesn’t use structured databases.

    At least 90% of all databases have a structure.

    • DahGangalang@infosec.pub (OP)
      7 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • Sparking@lemm.ee
        7 days ago

        As someone explained in another comment, you often duplicate information due to rules around cardinality to gain improvements in retrieval and structure. I would be pretty worried if SSNs were being used as a widespread primary key in any set of tables - those should generally be UUIDs that can be optimized for hashing while avoiding collisions.
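A sketch of the surrogate-key approach suggested above, in Python with `sqlite3` and the standard `uuid` module. The `person` table and the `add_person` helper are made up for illustration. The primary key is a random UUID and the SSN is just another column.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        person_id TEXT PRIMARY KEY,  -- random UUID, carries no personal information
        ssn       TEXT,              -- ordinary attribute, not the key
        full_name TEXT NOT NULL
    )
""")


def add_person(ssn, full_name):
    """Insert a row under a freshly generated UUID and return that id."""
    person_id = str(uuid.uuid4())
    conn.execute("INSERT INTO person VALUES (?, ?, ?)", (person_id, ssn, full_name))
    return person_id


first = add_person("000-00-0001", "Jane Doe")
second = add_person("000-00-0001", "Jane Smith")  # same SSN, still a distinct row
print(first != second)  # True
```

Because other tables would reference `person_id` rather than the SSN, a mistyped or corrected SSN can be fixed in one place without touching any keys.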

        Even if we are being generous to Elon, we could assume that social security payments are processed on mainframes given how many have to go out and the legacy nature of the program. Most mainframe shops I know have adopted an SQL interface for records in some capacity, but who knows what he is looking at.

        Federal government IT is done on a per-agency basis. I would say Oracle Database is pretty much the most licensed piece of software the government uses outside of Red Hat Linux and Windows desktop.

  • jacksilver@lemmy.world
    8 days ago

    If SSNs are used as a primary key (a unique identifier for a row of data), then they’d have to be repeated in other tables, as foreign keys, to be able to merge data together.

    However, even if they aren’t using the SSN as an identifier (since it’s sensitive information), it’s not uncommon to repeat data: for speed/performance’s sake, for simplicity in table design, because it’s in a lookup table, or because you have disconnected tables.

    Having a value repeated doesn’t tell you anything about fraud risk, efficiency, or really anything. Using it as the primary piece of evidence for a claim isn’t a strong argument.

    • credo@lemmy.world
      8 days ago

      This is the answer… it seems few on lemmy have ever normalized a database. But they do know how to give answers!

      • jacksilver@lemmy.world
        8 days ago

        Thanks, OP seemed more curious about the technical aspects than just the absurdity of the comment (since pretty much every business uses SQL) so hoped a more technical explanation might be appreciated.

    • DahGangalang@infosec.pub (OP)
      8 days ago

      This sounds like a reasonable argument.

      Can you pass any resources with examples on when having duplicate values would be useful/best practices?

  • SolidShake@lemmy.world
    8 days ago

    How come Republicans keep saying that doggy is going to expose all the fraud in the government but yet the biggest fraud with 37 felonies is president? What the actual fuck do these people think?