While the gathering and use of this data by companies is something that we really should be thinking through and being appropriately concerned about, that's not the biggest problem. I've had these conversations with customers in the past - law enforcement, government agencies, and run-of-the-mill corporations - and the mantra is always that it's best to collect as much data as you can now and store it. You may not know how to properly classify it right now, you may not know exactly what to do with it, you may not have the compute resources to utilize it, and you may not even really know what you've collected. But if you have it, you can always come back to it. If you don't collect and store a telemetry datum when it happens, the moment is over, the datum is gone, and there's no way to recreate it. Once data is collected and stored, the collector will find some way to utilize that data. I had the person in charge of building out the ALPR system for a major metro police department tell me that he wanted to collect and keep as much geotagged data as they could possibly find grant money for so that if a crime happened, they could question every car that drove past; if a child went missing - because it's always about the children and isn't it worth it if it saves just one life!? - they could find everyone in the vicinity; if they knew a drug shipment was coming in from a city to the west every Thursday, they could run down the owner of every car that was driving eastbound into the city on that highway. In other words, it was about setting up a giant fishing net and bringing all the fish up onto the deck of the boat for a close inspection. And we all know that the police and prosecutors absolutely never get fixated on a suspect to the point where they stop looking at other suspects and just try to go for the "win".
Once this data is collected, there will be someone who comes along and offers to buy those records. It might be in an "anonymized" fashion, it might be a processed subset of data, or it might be just the raw collection. Even when the data has been "anonymized", if you have a large enough dataset and enough compute power - which is cheaply available to anyone, really - enough correlations can be found and tracked through that data to be able to confidently identify an individual. So imagine you're a company that has bought a bunch of data like that - vehicle data recorder information, phone tracking information, fitness tracker data, social media tracking, website visit tracking information, and so on. You could stitch all that together to build an incredibly invasive profile of the life of pretty much any random individual. That profile might include where they go, how they drive, what medical conditions they have, what kind of food they eat, what kind of entertainment they like, what kind of people they associate with, what their writing style is like, and more. Don't forget that the whole theory behind big data is that with a large enough dataset, we can make computational correlations between different, seemingly unrelated characteristics or datapoints that allow us to very accurately predict outcomes. If Target can accurately identify a pregnancy based just on what one person is buying in their store, imagine what we could compute out of a dataset that includes every telemetry point that you generate over a year.
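To make the "anonymized" point concrete, here is a toy sketch of a linkage attack. Everything in it is made up for illustration: the pseudonyms, the grid cells, the names, and the helper functions `home_work` and `reidentify` are all hypothetical, and real datasets are far messier. The idea is just that neither a home area nor a work area identifies anyone on its own, but the pair often does once you can join it against a second dataset that has names attached.

```python
from collections import Counter

# Hypothetical "anonymized" location pings: (pseudonym, hour_of_day, grid_cell).
pings = [
    ("u1", 2, "A7"), ("u1", 3, "A7"), ("u1", 10, "C2"), ("u1", 14, "C2"),
    ("u2", 1, "A7"), ("u2", 4, "A7"), ("u2", 11, "D9"), ("u2", 15, "D9"),
    ("u3", 2, "B1"), ("u3", 23, "B1"), ("u3", 9, "C2"), ("u3", 13, "C2"),
]

# Hypothetical second dataset with real names attached (say, a loyalty
# program): name -> (home grid cell, work grid cell).
known = {
    "Alice": ("A7", "C2"),
    "Bob": ("A7", "D9"),
    "Carol": ("B1", "C2"),
}


def home_work(pings):
    """Guess each pseudonym's (home, work) cell from its most common
    night-time and day-time locations. Assumes a 08:00-18:00 workday."""
    night, day = {}, {}
    for pid, hour, cell in pings:
        bucket = day if 8 <= hour < 18 else night
        bucket.setdefault(pid, Counter())[cell] += 1
    return {
        pid: (night[pid].most_common(1)[0][0], day[pid].most_common(1)[0][0])
        for pid in night
        if pid in day
    }


def reidentify(pings, known):
    """Map pseudonyms to names wherever the (home, work) pair is unique."""
    by_pair = {}
    for name, pair in known.items():
        by_pair.setdefault(pair, []).append(name)
    matches = {}
    for pid, pair in home_work(pings).items():
        names = by_pair.get(pair, [])
        if len(names) == 1:  # only claim a match when it is unambiguous
            matches[pid] = names[0]
    return matches


print(reidentify(pings, known))
# {'u1': 'Alice', 'u2': 'Bob', 'u3': 'Carol'}
```

Note that two pseudonyms share home cell A7 and two share work cell C2, so neither field alone gives anyone away. It's the combination that does it, and every extra dataset you stitch in adds more combinations to match on.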
And now, assume you're the company that has built that profile of millions of individuals; what do you do with that? You've got to make money, right? So time to sell that information off. Say I get a new customer at my hot dog stand and enroll them in my loyalty program. I could then pay you to give me the names of that person's friends who also happen to walk by my hot dog stand every day, and at what time, so I can market to them. That's not too invasive, right? But the police and government three-letter agencies have taken the position that no warrant is required to go off and purchase commercially available information. So why not go and buy the life story of every person who might be connected in some way to a crime? Everybody who was at the scene. Maybe even the people who spent time around the people at the scene. Why restrict yourself to a crime? Maybe you're interested in stopping the crime before it happens. Why not use that to start investigating anyone who happens to be in the vicinity of a protest of some sort? And the people that they associate with. Maybe you'll find a criminal element that is at the heart of this so-called protest.
Okay, I get it. "I don't have anything to hide". "My life isn't that interesting". "Nobody cares about what I'm doing". Sure, that's probably true. But we've spent a lot of time, and a lot of court cases, on this because we, as a society, decided that it wasn't a good idea to allow police to get a record of what books you've read at the library without a warrant. Or to make religious organizations provide lists of members. Or to open up someone's mail. Or to record someone's private conversations. Or even to fly helicopters equipped with FLIR over houses to look for grow operations. Being able to access and process this type of data makes those methods of data collection look like tinkertoys.