Authentic8 whitepaper: Why a virtual browser is important for your enterprise

The web browser has become the de facto universal application interface. It is the mechanism of choice for accessing modern software and services. But that very ubiquity puts a burden on browsers to handle security more carefully.

Because more malware enters via the browser than through any other point on the typical network, enterprises are looking for alternatives to the standard browsers. In this white paper that I wrote for Authentic8, makers of the Silo browser, I talk about some of the issues involved and the benefits of using virtual browsers. These tools offer a form of sandboxing to keep malware and infections from spreading across the endpoint computer. Web content can’t easily reach the actual device being used to surf the web, so even if a browsing session is infected, the damage can be more readily contained.

Why the original Soviet Internet failed

I am reminded of the Cold War today, with next week marking the 30th anniversary of the Chernobyl disaster. Leading up to that unfortunate event was a series of efforts during the 1950s and 1960s in which we raced the Soviets to produce nuclear weapons, launch manned space vehicles, and create other new technologies. We were also in competition to develop the beginnings of the underlying technology for what would become today’s Internet.

One effort succeeded thanks to well-managed state subsidies and collaborative research that worked closely with a central planning authority. The other failed largely because of unregulated competition that was stymied by a variety of self-interests. Ironically, we acted like the socialists and the Soviets acted like the capitalists.

While the origins of the American Internet are well documented, until now there has been little published research into those early Soviet efforts. A new book from Benjamin Peters, a professor at the University of Tulsa, called How Not to Network a Nation seeks to rectify this. While a fairly dry read, it nevertheless is fascinating to see how the historical context unfolded and how the Soviets missed out on being at the forefront of Internet developments, despite early leads in rocketry and computer science.

It wasn’t for lack of effort. From the 1950s onward, a small group of Soviet scientists tried to develop a national computer network. They came close on three separate occasions, but failed each time. Meanwhile, the progenitor of the Internet, ARPANET, was established in the US in late 1969, and it became the basis for the technology we use every day.

Ultimately the Soviet-style “command economy” proved inflexible and eventually imploded. Instead of delivering a utopian vision for the common man, it gave us quirky cars like the Lada and a space station, Mir, that looked like something built out of spare parts.

The Soviets had trouble mainly because of a disconnect between their civilian and military economies. The military didn’t understand how to marshal and manage resources for civilian projects, and when its superstar scientists proposed civilian projects, it faltered in deciding what to do with them.

Interestingly, those Soviet efforts at constructing the Internet could have become groundbreaking, had they moved forward. One was the precursor to cloud computing, another was an early digital computer. Both of these efforts were ultimately squashed by their bureaucracies, and you know how the story goes from there. What is more remarkable is that this early computer was Europe’s first, built in an old monastery that didn’t even have indoor plumbing.

Almost a year after ARPANET was created, the Soviets held a meeting to approve their own national computing network. Certainly, having the US ahead of them increased their interest. But they tabled the idea from Viktor Glushkov, and it died in committee. It was bad timing: two of the Politburo’s leaders were absent the day the proposal was considered.

Another leading light was Anatoly Kitov, who proposed in 1959 that civilian economists use computers to solve economic problems. For his efforts, he was dismissed from the army and put through a show trial. Yet during the 1950s the Soviet military had long-distance computer networks and in the 1960s they had local area networks. What they didn’t have were standard protocols and interoperable computers. It wasn’t the technology, but the people, that stopped their development of these projects.

What Peters shows is that the lessons from the failed Soviet Internet (he adoringly calls it the ‘InterNyet’) have more to do with the underlying actors and the intended social consequences than with any lack of combined technical skill. Every step along the route he charts in his book shows some failure of one or more organizations that held the Soviet Internet back from flourishing the way it did here in the States. Memos got lost in the mail, decisions were deferred, committees fought over jurisdiction, and so forth. These mundane reasons prevented the Soviet Internet from going anywhere.

You can pre-order the book from Amazon here.

The Evolution of today’s enterprise applications

Enterprises are changing the way they deliver their services, build their enterprise IT architectures and select and deploy their computing systems. These changes are needed, not just to stay current with technology, but also to enable businesses to innovate and grow and surpass their competitors.

In the old days, corporate IT departments built networks and data centers that supported computing monocultures of servers, desktops and routers, all of which were owned, specified, and maintained by the company. Those days are over, and now how you deploy your technologies is critical, what one writer calls “the post-cloud future.” Now we have companies that deliver their IT infrastructure completely from the cloud and don’t own much of anything. IT has moved to being more of a renter than a real estate baron. The raised-floor data center has given way to just a pipe connecting a corporation to the Internet. At the same time, the typical endpoint computing device has gone from a desktop or laptop computer to a tablet or smartphone, often purchased by the end user, who expects his or her IT department to support this choice. The device itself has become almost irrelevant, whatever its operating system and form factor.

At the same time, the typical enterprise application has evolved from something that was tested and assembled by an IT department to something that can readily be downloaded and installed at will. This frees IT departments from having to invest time in a “nanny state” approach of tracking which users are running which applications on which endpoints. Instead, those staffers can spend their time improving the apps and benefiting the business directly. The days when users had to wait on their IT departments to finish a requirements analysis study or go through a lengthy approvals process are firmly in the past. Today, users want their apps here and now. Forget about months: minutes count!

There are big implications for today’s IT departments. To make this new era of on-demand IT work, businesses have to change the way they deliver IT services. They need to make use of some if not all of the following elements:

  • Applications now have Web front ends and can be accessed anywhere with a smartphone and a browser. This also means acknowledging that the workday is now 24×7, and users will work with whatever device, whenever and wherever they feel most productive.
  • Applications have intuitive interfaces: no manuals or training should be necessary. Users don’t want to wait on their IT department for their apps to be activated, on-boarded, installed, or supported.
  • Network latency matters a lot. Users need the fastest possible response times and are going to be running their apps across the globe. IT has to design their Internet access accordingly.
  • Security is built into each app, rather than by defining and protecting a network perimeter.
  • IT staffs will have to evolve away from installing servers and towards managing integrations, provisioning services and negotiating vendor relationships. They will have to examine business processes from a wider lens and understand how their collection of apps will play in this new arena.


Network World: Five cloud costing tools reviewed

Certainly, using a cloud provider can be cheaper than purchasing your own hardware, or instrumental in moving a capital expense into an operating one. And there are impressive multi-core hyperscale servers that are now available to anyone for a reasonable monthly fee. But while it is great that cloud providers base their fees on what resources you actually consume, the various elements of your bill are daunting and complex, to say the least.

Separating pricing fact from fiction isn’t easy. For this article, we looked at five shopping comparison services: Cloudorado, CloudHarmony’s CloudSquare, CloudSpectator, Datapipe and RightScale’s PlanForCloud.com. Some of them cover a lot of providers; some focus on only a few.

You can read the full review in Network World today here.

Why you need to review your stats regularly

I admit it; I have fallen out of the habit of reviewing my various stats on my websites and other content-oriented places. For many years I dutifully kept track of how my posts were doing, who was commenting, where backlinks were coming from, and so forth.

For some reason, I stopped doing this in the past year. Maybe it was just being lazy, maybe because I had gotten very busy with a lot of very interesting assignments. Maybe it was just old age: I have been writing stuff for more than 25 years, after all.

Well, all of those (and others) aren’t valid excuses. You need to check your stats, and check them regularly. There are lots of interesting things hidden in them that you might not realize, and some of these things can help you deliver better content, target new audiences, or figure out what you are doing right (and do it more often) or wrong (and avoid or improve it).

WordPress’ Jetpack delivers an annual email summary of your blog and its posts: this is a very useful reminder that you need to dive in deeper and see what is going on with your blog (or blogs, in my case). Slideshare.net, which is a wonderful place to post the PowerPoints of my presentations, also has some great analytics. Had I been reviewing these analytics regularly, I would have found out:

  • Influence can be found in odd places. A post that I wrote for SoftwareAdvice.com about real-time retail store tracking was picked up by a blogger for the point-of-sale vendor Vend.com, which brought a bunch of visitors to my site back in the spring when I was quoted. That could have been an opportunity to talk more about the subject.
  • Don’t knock the long tail. I am still the leading expert on a very obscure Windows error message: if you Google “Windows Media Player error c00d11b1” you will see my post in the first ten or so results. The post has received more than 380,000 views in the more than eight years since I wrote it, and it is still getting comments on my blog and links in the Microsoft forums. Why is this important? All this traffic on a very specific subject can help raise your Google ranking, and it also provides an entry point into your content ecosystem if you manage it properly.
  • My influence beyond North American borders is somewhat quirky. The second most-visited place of origin for my Slideshare.net account is Ukraine, which generated about half as many views as the US over the past year. Again attesting to the very long tail, a good chunk of these views came from a presentation that I posted five years ago on how to set up your first blog and business email. (That kind of makes sense.) For my blog, other popular countries of origin for my visitors were India and Brazil. Don’t forget the rest of the world when you are posting your content, and widen your perspective to engage more of these readers.
  • Twitter and Facebook were both important traffic drivers for my blog over the past year. This emphasizes how critical your own social media accounts are and how you need to cross-link posts among them. Combined, the two were equal to the traffic brought in from Google organic searches, which is another important source of referrals. Don’t just post entries on your own blog: I have begun cross-posting my content on LinkedIn Pulse and Medium, and it is getting a fair share of views there too. The analytics for those sites could be better, though: for example, Medium only lets you look at month-long intervals at a time, and Pulse will only send you static results in regular email summaries.

There are lots of Twitter analytics tools, and some are quite pricey. One that I like, and that has a free version, is TwitterCounter, which tracks who followed and unfollowed you over time. For example, I got excited this week to see that the actor Taye Diggs followed me (he has been following my wife’s tweets for some time), but our local mayor dropped me (oh well). The graphs it produces indicate to me a fairly steady stream of new followers replacing the drop-offs, with slow overall growth.


Happy new year and may your stats encourage you to deliver better content in 2015!

Everyone is in the software business

You may not know it, but you are in the software business, no matter what your actual business may appear to be. It doesn’t matter what you produce; whether you are a “bricks and mortar” retailer or a “guys in trucks” distributor, software is where you are going to end up.

Why is everyone in the software business? Simply because software has become the lifeblood of so many decisions about what a business makes, how it is sold, and how customers are kept happy. All the interesting business operations are happening inside your company’s software. Software is where you can find out if your customers are going elsewhere, if your profits are coming from new markets, and if your employees are helping or hurting your overall reputation.

As an example, the airline Virgin America has billions of dollars invested in planes and the people that fly them, but their brand lives and thrives based on their mobile and web experience. That experience is all because of the airline’s booking software, and understanding what is happening with that software will make the difference between success and failure for the airline.

Take as another example a national food service distributor. Their business is getting food into trucks, and then getting those trucks to restaurants, institutional caterers and retail kitchens. The company has had an ecommerce business for the past decade, and a pretty significant one at that. But lately their customers have been shifting their emphasis, moving from calling their sales representatives with the weekly orders to doing more online. The distributor needed to scale up their online business and also become more data-driven. Rather than letting their truck drivers or regional offices make decisions about distribution, they wanted a single view of their business, and to use the changes in their orders and other data to fine-tune their deliveries. This food distributor is now firmly in the software business.

What is driving everyone to software? Several things.

  • The cloud. The days when you had to build your own servers and data centers are over. Post Holdings is probably the largest cloud-only company, and their revenues are in the billions.
  • Everyone wants an online storefront. Just like the food distributor, even the most basic industries are finding out there is value in selling their stuff online.
  • Big Data is getting more familiar. You can now find Hadoop clusters in many traditional Fortune 500 companies. The IT staffers at the giant retailer Sears eventually spun off a side business in helping others get started with Big Data.
  • Customer experience is king. One way that businesses can differentiate themselves is by paying attention to their customers. This isn’t anything new: Nordstrom’s department stores have been doing this for decades. What is new is the range of software tools that can help figure this stuff out.

I will have more to say about this topic; right now I am working on a white paper for a client that will dive into it more deeply. Check back here in the fall, when I can post a link to it.

New Relic blog: jQuery Foundation’s Dave Methvin Shares his Rules of the Road for Speeding Up Your Website

Haven’t you noticed that just about every website is too slow when you are browsing it? It seems like a universal truth. Nevertheless, if you spend some time with Dave Methvin, you can learn a lot about how to speed things up. Methvin is an old friend: we worked together back in the 1980s at PC Week (which is now called eWeek). Since then he has built a few technical businesses and is now the president of the jQuery Foundation, the central repository and authoritative source of code for that particular JavaScript library, which is used on many websites. His recent talk, “The Web Doesn’t Have to Be Slow,” given at PraireCon, can be found here on Slideshare.

Part of understanding how to speed up your site is understanding how the browser and the various Web servers work together to present a page to you. “A lot of times when people are doing performance testing, they start at the wrong end of the problem,” he told me in an interview. “They are measuring how long it takes to do lots of loops, but they aren’t looking for bottlenecks in their code.” This is a common theme in performance testing. “We were doing this 25 years ago at PC Week when we were measuring network latencies, and people still fall into the same traps today,” he said. Think of it as asking whether you want to run or walk to get to your car before you go on a long road trip. “It really doesn’t matter in the long run, because ultimately you are going to be in your car and driving for a lot longer than the amount of time you might save getting to it in front of your house.”

This means it is important to scrutinize everything on your pages and understand how each line can impact your performance. “You have to look carefully at when you are time-bound or causing something that is going to have a big effect like this,” he said. However, “with all the other things going on on your page, it is unlikely that the running time of your Javascript is the worst of your problems. But there are things you can do to make your Javascript code perform better,” he said. He calls these his “rules of the road” for making browsing requests across the network more efficient and eliminating any obvious bottlenecks. These include:

Avoid 3xx requests. These penalize your pages because each redirect adds another round trip before the destination page can load. While sometimes this can’t be avoided, you should do what you can to keep these redirects to a minimum. “The round trip times across the Internet can kill you and can degrade your performance,” he says.
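
If you want to see whether redirects are costing you anything, here is a minimal sketch (my own, not from Methvin’s talk) that uses the browser’s Navigation Timing API to report how long the current page spent following 3xx hops:

```javascript
// Rough check for redirect overhead; both timestamps are 0 when no redirect occurred.
var t = window.performance && window.performance.timing;
if (t) {
  var redirectMs = t.redirectEnd - t.redirectStart;
  console.log('Time spent in redirects: ' + redirectMs + ' ms');
}
```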

Start requests early on your pages. Put requests for external resources such as images near the top of your page whenever possible, so they will be among the first bits delivered to your browser. This also helps with prefetching images for subsequent loads, which should also be part of your HTML coding.
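
As a simple illustration of prefetching (a sketch of mine with placeholder image paths, not code from the article), you can warm the browser cache for images you know the next page will need once the current page has finished loading:

```javascript
// Prefetch images for subsequent pages after the current page finishes loading.
window.addEventListener('load', function () {
  ['/images/next-page-hero.jpg', '/images/footer-logo.png'].forEach(function (url) {
    var img = new Image();
    img.src = url; // the request starts immediately and the result sits in the browser cache
  });
});
```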

Minimize the number of overall requests and the resulting bytes. Combine similar resources such as images or Javascript files and download them together. Here you have to weigh the tradeoff between combining these elements to take advantage of browser caching and using something like a content delivery network. You should also use tools such as CSSMin and UglifyJS to squeeze any unneeded bits out of your files and keep them small. Most of you probably also know to compress images or reduce their resolution, and to enable gzip compression on your servers.
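
Before combining or compressing anything, it helps to know what the page is actually requesting. Here is a rough audit sketch (mine, not Methvin’s) that uses the Resource Timing API to count requests and transferred bytes:

```javascript
// Count the requests the current page made and roughly how many bytes came over the wire.
var resources = window.performance ? performance.getEntriesByType('resource') : [];
var totalBytes = resources.reduce(function (sum, entry) {
  // transferSize is 0 for cross-origin resources without a Timing-Allow-Origin header
  return sum + (entry.transferSize || 0);
}, 0);
console.log(resources.length + ' requests, about ' + Math.round(totalBytes / 1024) + ' KB transferred');
```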

Maximize browser caching. Make sure your caching defaults are set up properly so that stable content (like your corporate logos and other frequently used but infrequently changed items) is specified accurately in your pages. This helps take maximum advantage of browser caching: the less the browser has to reload, the faster your pages will appear.
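
How you set those cache lifetimes depends on your server. As one hedged example, assuming a Node.js site built on Express (not something the article specifies), you could serve rarely changing assets with a long Cache-Control lifetime:

```javascript
// Serve static assets with a long max-age so returning visitors pull them from cache.
var express = require('express');
var app = express();

app.use('/static', express.static('public', {
  maxAge: '30d' // sets the Cache-Control max-age for everything served from ./public
}));

app.listen(3000);
```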

Use domain sharding. If you find yourself loading a bunch of resources from a single domain, you can take advantage of this technique. The idea is to spread requests across several domains at once to maximize bandwidth, since you are downloading elements in parallel rather than in series. Many of the most heavily trafficked websites use this technique to boost their performance. Another option is to use a content delivery network. One example is to reference external jQuery libraries that are already cached by your visitors’ browsers. Here is a great discussion of why you should do this, along with code snippets showing how to use Google’s content delivery network for free. Because of the different versions of jQuery, you can spend a lot of unnecessary browser time loading these versions from your own site. This is just one or two lines of code, and it can have a big impact on your page load times. Along those lines, here is a sample plug-in to help speed up your WordPress site too.
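
To make the CDN idea concrete, here is a minimal sketch that loads jQuery from Google’s public CDN and falls back to a locally hosted copy if the CDN request fails; the version number and local path are placeholders of mine, not values from the article:

```javascript
// Load jQuery from Google's CDN, falling back to a local copy on error.
(function () {
  var cdn = document.createElement('script');
  cdn.src = 'https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js';
  cdn.onerror = function () {
    var local = document.createElement('script');
    local.src = '/js/jquery.min.js'; // hypothetical locally hosted fallback
    document.head.appendChild(local);
  };
  document.head.appendChild(cdn);
})();
```

In practice most sites simply use a plain script tag pointing at the CDN; the dynamic loader above just makes the fallback behavior explicit.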

Load non-critical stuff later. Code your pages to wait until near the end to load things that aren’t critical to viewing the page, such as content that isn’t initially visible, social media scripting tools, ads, and page analytics.
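
One common way to do this, sketched here with a placeholder URL rather than any specific vendor’s tag, is to hold off injecting those scripts until the page’s load event has fired:

```javascript
// Defer non-critical scripts (social widgets, ads, analytics) until after page load.
window.addEventListener('load', function () {
  var s = document.createElement('script');
  s.async = true;
  s.src = 'https://example.com/social-share-widget.js'; // hypothetical non-critical script
  document.body.appendChild(s);
});
```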

Another common coding mistake is to build browser-version or client detection into your pages so you can push out particular features that depend on more modern browsers. This is decidedly old school and there are better ways, such as Modernizr, a JavaScript library built for exactly this purpose. “Don’t try to detect which browser your visitors are using; that can really backfire when the browser changes in the future,” he said. Here is a great place to start learning about this tool.
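
The idea, shown in this hedged sketch that assumes Modernizr is already loaded on the page and uses two hypothetical functions of mine, is to branch on the capability you need rather than on the browser’s name:

```javascript
// Feature detection: test for the capability (here, canvas support) instead of the browser.
if (window.Modernizr && Modernizr.canvas) {
  drawInteractiveChart();    // hypothetical function that renders with <canvas>
} else {
  showStaticChartImage();    // hypothetical fallback for browsers without canvas
}
```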

“This is a big topic, just like fixing cars – the more you do it, the more you understand the depth and dimensions of your problems,” he said.  “But at least following some of my suggestions you can be sure that the resources you put into your browsers are in the best order and optimized for the best possible viewing.”

Why New Jersey needs to slow down its traffic

It is ironic that the same thing that got Chris Christie in trouble – delaying traffic into Manhattan – is being used by others to build a multi-million-dollar business. I am talking about network traffic for stock traders. Perhaps you have seen the stories about them based on Michael Lewis’ latest book Flash Boys.

In the celebrated George Washington Bridge “bridgegate” fracas, the delays came from closing various on-ramps to the bridge. In the case of the stock market, it is a delay of 350 microseconds that makes high-frequency trades more equitable across exchanges.

Wait a minute, come again? Adding latency is a good thing? Yes, that is another irony of the situation. A company called IEX developed a technique to slow down trades so that all the exchanges receive the trading requests at the same time. This means that exchanges closer (in terms of connection time) to the Big Apple trading desks can’t trade a few microseconds ahead of the others. The technique IEX developed is basically a big spool of fiber optic cable, 35 miles long, the length a beam of light takes about 350 microseconds to traverse. The problem is that the biggest trading exchanges are located at different peering points in New Jersey, with some up to 35 miles apart from others, at least in terms of how their packets are routed. Another irony: these locations were chosen so that the exchanges would have as little latency as possible in accessing the trading data streams.

It is probably the first time I have heard of anyone deliberately introducing more network latency to improve their business. Many of us have spent a good chunk of our careers trying to cut down on latency: I can remember when I rolled out the first PC local area network application at MegaLith Insurance back in the mid-1980s. The file transfer app that normally took a few seconds to get from one point to another across our mainframe network suddenly took tens of minutes. That wasn’t a good thing.

Programming professor Donald Knuth wrote in one of his seminal books that “programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs.” (Thanks to Dave Methvin of the jQuery Foundation for uncovering this quote.) And in the world of high-frequency trading, a company called Spread Networks invested hundreds of millions of dollars building a fiber connection from Chicago to NYC a few years ago, only to find that a microwave network could shave even more time off their latency figures.

So we have the Internet to thank for introducing all kinds of unpredictable latencies into our apps and driving us nuts trying to track down the culprit. Now latency itself has been productized, thanks to the smart guys at IEX. Maybe the next step will be for the traders to colocate their offices on top of the major peering points themselves: that could actually cut down on the commutes of some New Jersey drivers, too.

St. Louis Trains Hundreds of Coders

“Have you ever had the opportunity to work with someone who is the best in the world?” That question got at the heart of a presentation from Jim McKelvey last night at a rather unusual event that I attended at our newly renovated central library downtown. I’ll get to Jim in a moment, but first I want to tell you the context of the event.

Here in St. Louis, like many areas of the world, we have a coding shortage. There are dozens of companies, some big and some just getting started, that can’t hire good programmers. It isn’t from lack of trying, or lack of resources: they have the money, the open positions, and the need. The explanation in the past has been that they either can’t find the coders or don’t know where to look. But there is a third possibility: the coders exist, they just need some training to get started. That is where an effort called LaunchCode comes into play.

For the past several weeks, hundreds of folks have been taking CS50, the beginning computer science programming class that Harvard offers over the edX online platform. The class started with more than a thousand participants and is now down to about 300 or so hardy souls who spend 20 hours a week or more learning how to code. Each week they gather in our library to listen to the lectures and work together on the various programming problem sets.

David Malan, who went to Harvard himself and is a rockstar teacher, teaches the course. I watched a couple of his lectures and found them interesting and engaging, even when he covers some basic concepts that I have long known. If I had him teaching me programming back in the day, I might have stuck with it and become a coder myself.

The CS50.tv collection online is pretty amazingly complete: there are scans of the handouts, quizzes, problem sets, additional readings, supplemental lectures and so forth. The courseware is very solidly organized and designed and very impressive, from my short time spent looking around.

But here is the problem: while the online class is fantastic, only one percent of the people who take the class complete it satisfactorily. That is almost a mirror image of the completion rate for those attending in-person at the Harvard campus, where 99% of the students finish. I was surprised at those numbers, because Malan goes quickly through his lectures. You have to stop and rewind them frequently to catch what he is doing.

This is where LaunchCode comes into play. The operation, which is an all-volunteer effort, is trying to short-circuit the coder hiring process by pairing students who complete the course with experienced programmers at one of more than 100 target tech companies that are looking for talent. They think of what they are doing as going around the traditional HR process and building a solid local talent pool. It is a great idea. I spoke to a few students, many of whom come from technical backgrounds but don’t have current coding experience. They are finding the class challenging but doable.

LaunchCode is also supplementing the CS50 lectures and online courseware with meatspace assistance. They have space reserved downtown for the students to get together and help each other. Some students have actually moved to St. Louis so they could take the class here: that was pretty amazing! LaunchCode has created mailing lists and Reddit forums where students can share ideas. But that isn’t enough, and last night we learned that Malan is coming to town in a few weeks, bringing a dozen of his teaching assistants with him for a special evening hackathon for the class participants. Wow. Will that help get more students to finish the class? I hope so, because I want Malan & Co. to make a regular trip here to see the next class, and the next.

The problem with teaching programming is that you have to just do it to become good at it. No amount of academic study is going to help you understand how to parse algorithms, debug your code, figure out what pieces of the puzzle you need and how to organize them in such a way as to make more efficient code. You just have to go “build something,” as McKelvey told us all last night.

Back to the question he posed at the top of my post. Obviously, he thinks Malan is the best programming teacher in the world. He challenged everyone in the auditorium to think about what questions they would ask Malan when he comes to town, and how they can leverage their time with the master. He used the analogy of when he built his glassblowing studio and was able to spend time with Lino Tagliapietra, a master Venetian glassblower. Last night he once again told the story of how humbling an experience that was and how he was allowed to ask only a single question of the “maestro.” Wow.

McKelvey was very gracious with his time and answered lots of questions from the LaunchCode students. Many of the questions last night were about how the students could position themselves to get a coding job once the class is over in a few weeks. McKelvey kept emphasizing that they need to just “rock the class” and not worry about whether they were going to be programming in PHP or Ruby. “That isn’t important,” he kept saying: just demonstrate to Malan that they can write the best possible code when he comes here in a few weeks.

I have heard McKelvey speak before, and last night he was in fine form. Will LaunchCode succeed at seeding lots of beginning coders? Only time will tell. But my hat is off to them for trying a very unconventional approach, and I hope it works.


Stop Web Scraping With ScrapeDefender

Copying content from the Web can be both a good and a bad thing. There are services such as ScraperWiki.org that make it easy to scrape public data archives, and they are used by data scientists and journalists to track trends and uncover government abuses. And Google and other search engines use various kinds of scraping algorithms to index and categorize your site, and to ensure that your content is ranked appropriately.

But for the most part scraping is bad news. Chances are good that someone has copied your Web content and is hosting it as their own elsewhere online. This happened with LinkedIn not too long ago, where someone picked up thousands of personal profiles to use for their own recruiting purposes. That is a scary thought, indeed.

And lest you think this is difficult to do, there are numerous automated scraping tools, including Mozenda and ScrapeBox, that make it easy for anyone to collect content from anywhere. I won’t get into whether it is ethical to use these on a site whose content you don’t own. Some of these attack sites are very clever in how they go about their scraping, using massive numbers of ever-changing blocks of IP addresses to obtain their content.

So what can you do to prevent the bad kind of scraping? There are several companies that try to protect your site from being scraped by a bad actor, including Distil Networks and CloudFlare’s ScrapeShield.

But today’s post is about another tool that goes even further than those two: ScrapeDefender. You can watch a screencast video that I just produced here that shows its features.

ScrapeDefender is easy to get started with: you just plug in your site’s URL and it takes about a day to analyze your site and see where you are vulnerable. When I tried it with my own domain, strom.com, I was surprised to see it list 150 different exploits. Some of them have pretty oddball names, such as “dripping water” or “shotgun,” that show where anyone can come in and grab your content. The service provides a piece of Javascript tracking code that you add to each of your site’s page headers. Once this is in place you can monitor what is going on in near-real time and protect your site against these abusers.
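
The actual ScrapeDefender snippet isn’t reproduced here, but tracking tags of this sort generally follow the same asynchronous loader pattern, sketched below with a placeholder URL and site ID of my own:

```javascript
// Generic async tracking-tag pattern; the URL and site ID are hypothetical placeholders.
(function () {
  var s = document.createElement('script');
  s.async = true;
  s.src = 'https://cdn.example-scrapedefender.com/monitor.js?site=YOUR_SITE_ID';
  var first = document.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
})();
```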

For example, you can view how many pages a potentially abusive IP address has visited, any geolocation information, which risk metrics were tripped, what alarms were generated because of this activity, and other IP addresses owned by the same organization. All that information can help you figure out whether your site was suddenly very popular or was being targeted by one of your competitors or someone who wants to steal your content. The service is Web-based: you bring up your browser to view these metrics and reports, along with suggestions on security best practices to defend your content.

The hard part about defending and hardening your site against potential scrapers is that it is difficult to distinguish between a legitimate visitor and an automated bot that is collecting your content. That is the secret sauce of ScrapeDefender: they have looked at thousands of websites to figure out when a bad actor is present, and have coded these various behaviors into their system.

You can try ScrapeDefender for free; the paid service starts at $79 per month to keep track of a single domain, with more expensive and extensive plans available. It is well worth a look.