- cross-posted to:
- [email protected]
Scarily… They don’t need to listen to be this creepy, but even I’m a tad baffled by this.
Yesterday a few friends and I were at a pub quiz; of course no phones were allowed, so none were used.
It came down to a tie-break question between my team and another: “What is the run time of The Lord of the Rings: The Fellowship of the Ring, according to IMDb?”
We answered and went about our day. Today my friend from my team messaged me: the top post on his “today feed” is an article published 23 hours ago…
Forgive the pointless red circle… I didn’t take the screenshot.
My friend isn’t a privacy-conscious person by any means, but he didn’t open IMDb or google anything to do with the franchise, and hasn’t for many months prior. I’m aware it’s most likely an incredible coincidence, but when stuff like this happens I can easily understand why many people are convinced everyone’s doom brick is listening to them…
No no, they listen. How do you think the “Hey Google” feature works? It has to listen for the key phrase. Might as well just listen to everything else.
I spent some time with a friend and his mother and spoke in Spanish for about two hours while YouTube was playing music. I had Spanish ads for 2 weeks after that.
Prove your extraordinary claim.
Your phone listens for the phrase “Hey Google” and uses very little processing power to do so. If it were listening to everything and processing that information, your battery would die incredibly fast. We’re talking charging your phone multiple times a day even if you weren’t using it for anything else.
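To put rough numbers on that (both figures below are assumptions for illustration; real draw varies a lot by chip and model):

```python
# Back-of-the-envelope: continuous full speech recognition vs. battery capacity.
# Both numbers are assumed ballparks, not measured figures.
full_asr_draw_w = 2.0   # assumed: always-on on-device speech-to-text
battery_wh = 15.0       # ~4,000 mAh at 3.85 V, a typical phone battery

hours_to_empty = battery_wh / full_asr_draw_w
print(f"ASR alone empties the battery in ~{hours_to_empty:.1f} h")  # ~7.5 h
# Add the screen, radios, and normal apps on top of that and you're
# charging more than once a day.
```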
As someone else mentioned in another comment, being near Spanish speakers’ phones plus Bluetooth/Wi-Fi tracking is how Google tracks you. They search Google in Spanish, Google can tell you spend time with them, so Google thinks you speak Spanish.
Well shit. That makes a lot of sense.
Exactly. Phones have dedicated hardware that stores the trigger word and wakes up the OS when it detects it.
I need some metrics on this. It must be recording at least some things above a certain volume threshold in order to process them.
I mean the microphone is active, so it’s listening, but it’s not recording/saving/processing anything until it hears the trigger phrase.
The truth is they really don’t need to. They track you in so many other ways that actually recording you would be pointless AND risky. While most people don’t quite grasp digital privacy, and Google can get away with a lot because of it, they do understand actual eavesdropping and probably wouldn’t stand for all their private moments being recorded.
I think this is the part I take issue with. How can you catch the right fish unless you’re routinely casting your fishing net?
I agree that the processing/battery cost of this process is small, but I do think they’re not just throwing the other fish away, but putting them into specific baskets.
I take no issue with the rest of your comment.
It’s a technique called Keyword Spotting (KWS). https://en.wikipedia.org/wiki/Keyword_spotting
This uses a tiny speech-recognition model that’s trained on very specific words or phrases which are (usually) distinct from general conversation. The model being so small makes it extremely efficient even before optimization steps like quantization, so it needs very little computation to scan the audio stream for the keyword. Here’s a 2021 paper where a team of researchers optimized a KWS model down to just 251 µJ (0.00007 milliwatt-hours) per inference: https://arxiv.org/pdf/2111.04988
The small size of the KWS model, required for the low power consumption, means it alone can’t be used to listen in on conversations; it outright doesn’t understand anything other than what it’s been trained to identify. This is also why you usually can’t customize the keyword to just anything, but only pick from a limited set of words or phrases.
This all means that if you’re ever given the option of a completely custom wake phrase, you can be reasonably sure that device is running full speech recognition on everything it hears. This is where a smart TV or an Amazon Alexa, which are plugged in, have a lot more freedom to listen as much as they want with as complex a model as they want. High-quality speech-to-text apps like FUTO Voice Input run locally on just about any modern smartphone, so something like a Roku TV can definitely do it.
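For the curious, here’s a minimal toy sketch of the shape of a KWS loop: frame the mic stream, extract a tiny feature vector, and score it with a very small model. The feature extraction is crude and the weights are untrained placeholders; a real KWS model is trained offline on the actual wake phrase.

```python
import numpy as np

# Toy keyword-spotting loop: the structure is real, the numbers are placeholders.
SAMPLE_RATE = 16_000
FRAME = 512            # ~32 ms of audio per frame
WINDOW_FRAMES = 32     # ~1 s sliding window fed to the model

rng = np.random.default_rng(0)
# Placeholder "model": one tiny dense layer. A real KWS model is trained
# offline on the wake phrase; this one is random and will fire on nothing.
weights = rng.normal(size=(WINDOW_FRAMES * 8, 1)) * 0.01

def features(frame: np.ndarray) -> np.ndarray:
    """Crude 8-bin log-energy spectrum of one frame (stand-in for log-mels)."""
    spectrum = np.abs(np.fft.rfft(frame))
    bins = np.array_split(spectrum, 8)
    return np.log1p(np.array([b.sum() for b in bins]))

def keyword_probability(window: list) -> float:
    x = np.concatenate([features(f) for f in window])
    return 1.0 / (1.0 + np.exp(-(x @ weights)[0]))  # sigmoid score

# Simulate a mic stream with silence; on a real device this loop runs on a
# low-power DSP and only wakes the main OS when the score crosses a threshold.
window = []
for _ in range(100):
    frame = rng.normal(scale=1e-3, size=FRAME)  # stand-in for mic samples
    window.append(frame)
    if len(window) > WINDOW_FRAMES:
        window.pop(0)
        if keyword_probability(window) > 0.95:
            print("wake word detected -> wake the OS")
```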
I appreciate the links, but these are all about how to efficiently process an audio sample for a signal of choice.
My question is: how often is audio sampled from the vicinity to allow such processing to happen?
Given the near-immediate response of “Hey Google”, I would guess once or twice a second.
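Even taking that guess and the 251 µJ/inference figure from the paper above, the daily energy cost pencils out to almost nothing (battery size below is an assumed ~15 Wh):

```python
# Energy budget for always-on KWS at ~2 inferences per second, using the
# 251 uJ/inference figure from the paper linked earlier in the thread.
energy_per_inference_j = 251e-6
inferences_per_second = 2
seconds_per_day = 86_400

joules_per_day = energy_per_inference_j * inferences_per_second * seconds_per_day
battery_j = 15.0 * 3_600  # assumed ~15 Wh phone battery, in joules

print(f"{joules_per_day:.1f} J/day "
      f"({100 * joules_per_day / battery_j:.3f}% of the battery)")
# -> ~43.4 J/day, under 0.1% of the battery: why the wake word is cheap
#    and full transcription is not.
```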
The amount of processing power needed to listen to the output of billions of devices 24/7 just to push ads wouldn’t make economic sense.
AI acceleration ASICs are already in a lot of hardware these days. It doesn’t take a whole lot anymore for it to be both cheap and feasible.
Well, neither does the cost of LLMs, but that’s not stopping them.
This stuff isn’t magic. It’s tech. These things can be proved by analyzing network traffic.
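For instance, here’s a rough sketch of logging a device’s DNS lookups with scapy. It assumes you control the network path (e.g. a Wi-Fi AP you run, or the phone’s traffic mirrored to this machine), and the interface name is a placeholder:

```python
# Sketch: log the DNS lookups a device makes, to see where its traffic goes.
# Requires scapy (pip install scapy) and root privileges; "wlan0" is a placeholder.
from scapy.all import sniff
from scapy.layers.dns import DNS, DNSQR

def log_dns(pkt):
    if pkt.haslayer(DNSQR) and pkt[DNS].qr == 0:  # queries only
        print(pkt[DNSQR].qname.decode())

sniff(iface="wlan0", filter="udp port 53", prn=log_dns, store=False)
```

Encrypted payloads won’t reveal content, but destinations and traffic volume are enough to spot a device constantly uploading audio.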
It would be pretty easy to test, too.
1. Get a pre-paid phone.
2. Set up a brand-new Google or Apple account.
3. Activate the phone using the new account.
4. Put it through its paces for a few hours and note the ads you get.
5. Shoot the shit with your friends and family with the phone on the table for a few hours.
6. Put the phone through its paces again and note the ads you get.