Hi, this is Wayne again with a topic “How we test so much hardware – MarkBench Automated Benchmarking Tool”.
This has happened more often than I care to think about, and this time of year is particularly bad. I mean, obviously, we want to cover all the new phones and CPUs and GPUs, and we’ve got this incredible video coming where we bought 20 used mining GPUs so we could find out once and for all if they’re safe to buy. But we’ve also got a corporate mandate to maintain healthy work-life balance for our team, so who’s going to actually do all of this testing? Meet the subject of today’s video: MarkBench. Or, well, rather, Mark Bench. Mark here is an automated benchmarking tool that our Labs team has been cooking up for the last six months. It’s still early days, but even now Mark is able to improve our test efficiency by some percent, and in time we intend to make it freely available for personal use to our community. So let’s take a deeper dive together, and maybe you guys can tell us what you think is good and also give us your feedback about what you’d like to see us work on as we continue Mark’s development, and continue to tell you about our sponsor, SimpleMDM. They provide ridiculously simple Apple device management for IT. Enrolling your company’s Apple devices and keeping them up to date doesn’t have to be frustrating. Try it for free for 30 days on unlimited devices at simplemdm.com/linus.
Let’s begin with some napkin math to explain why we decided that it was finally time to bite the bullet and build MarkBench. We’ll use that upcoming mining GPU video as our example. Each of those cards will be subjected to 12 different benchmarks to ensure that it is free of strange performance anomalies.
Let’s say, optimistically, that the 12 benchmarks take three minutes each. That’s 36 minutes. Factor in that each test runs five times, and now we’re up to three hours. Plus half an hour of thermal stress testing, that lands you at about three and a half hours per card. Multiplied by 24 cards, 20 from eBay and four lightly used control cards, that is 84 hours of testing, and that doesn’t even account for reinstalling drivers, swapping cards, or taking bio breaks. So it’s pretty clear that, even if we could just pound back Red Bulls and power through it, that is not the kind of thing that we’d want to do regularly, and “do it regularly” is basically in our job description.
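The napkin math above can be checked with a few lines of Python. This is only the arithmetic from the video; the variable names are ours, and the three-minute benchmark length is the optimistic estimate quoted above.

```python
# Napkin math for the mining GPU project, using the figures quoted above.
BENCHMARKS_PER_CARD = 12
MINUTES_PER_BENCHMARK = 3       # optimistic estimate
RUNS_PER_BENCHMARK = 5
THERMAL_STRESS_MINUTES = 30
CARDS = 20 + 4                  # 20 eBay cards plus 4 control cards

# Total benchmark time per card, then add the thermal stress test.
benchmark_minutes = BENCHMARKS_PER_CARD * MINUTES_PER_BENCHMARK * RUNS_PER_BENCHMARK
hours_per_card = (benchmark_minutes + THERMAL_STRESS_MINUTES) / 60
total_hours = hours_per_card * CARDS

print(f"{hours_per_card} hours per card, {total_hours} hours in total")
```

Driver reinstalls, card swaps, and breaks are not included, so the real figure would be higher.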
Oh yeah, not like that. I mean to say that it is Techtober, and our corporate overlords seem to get their jollies out of scheduling product releases to cause us as much inconvenience as possible. I’m pretty sure the target is actually each other, but retailers, media, and consumers definitely end up getting caught in the crossfire to varying degrees. So if we want to keep up, the answer is automation, and while Mark doesn’t look like much at the moment, everything has to start somewhere.
So for now, MarkBench is a Golang GUI with a Python framework that collects all the sensor and frame data that is output from our system during each test. He collects this data using PresentMon and LibreHardwareMonitor. PresentMon is a tool for collecting frame data and is actually the basis of NVIDIA’s FrameView software. As for LibreHardwareMonitor, it’s an open-source fork of Open Hardware Monitor, which gives us access to all of the sensors in our system: you know, fan RPMs, CPU and GPU temperature, power consumption, stuff like that. After a test is finished, our Python framework outputs the data in the form of CSVs, then converts those into protobufs, a smaller binary format.
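To give a feel for what frame data in a CSV looks like, here is a minimal sketch that turns per-frame times into an average FPS figure. It is not MarkBench code: the `frame_time_ms` column name and the sample values are made up for illustration, and real capture tools use their own column names.

```python
import csv
import io

# Hypothetical capture: one row per frame, frame time in milliseconds.
# The column name "frame_time_ms" is invented for this example.
SAMPLE_CSV = """frame_time_ms
10.0
20.0
10.0
"""

def average_fps(csv_text: str, column: str = "frame_time_ms") -> float:
    """Return average FPS: frames rendered divided by total seconds elapsed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    frame_times_ms = [float(row[column]) for row in reader]
    if not frame_times_ms:
        raise ValueError("no frames in capture")
    # frames / (total milliseconds / 1000) == 1000 * frames / total milliseconds
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

print(average_fps(SAMPLE_CSV))
```

Averaging over total elapsed time, rather than averaging each frame’s instantaneous FPS, keeps a few very fast frames from inflating the result.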
The data gets uploaded to a local ingest server before being sent to our cloud-hosted Postgres database. In layman’s terms, the Labs team builds what’s called a harness for every game we want to test. Then, using scripts, Mark adjusts the settings of the game, launches the game, loads up a benchmark, and then records all of the relevant data while the benchmark is running and stores it in a database. Rinse and repeat until all the benchmarks are done and we can swap off the card and put on the next one. Now, obviously, we could get a similar level of automation using commercial software like 3DMark that conveniently already exists, but scripting automation into real games has a few major benefits for you, the consumer.
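The harness loop described above (adjust settings, launch, run the benchmark, record, store, repeat) could be sketched roughly like this. Everything here is hypothetical: the `Harness` class, `run_suite`, and the fake benchmark are stand-ins we made up, and a plain list takes the place of the real database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Harness:
    """Hypothetical per-game harness: a name, settings, and a benchmark callable."""
    game: str
    settings: Dict[str, str]
    run_benchmark: Callable[[Dict[str, str]], float]  # returns average FPS for one run

def run_suite(harnesses: List[Harness], runs: int = 5) -> List[dict]:
    """Run every harness `runs` times and collect one record per run."""
    database: List[dict] = []  # stand-in for the real database
    for harness in harnesses:
        for run_index in range(runs):
            # A real harness would apply the settings, launch the game, and
            # record sensor and frame data while the benchmark runs.
            fps = harness.run_benchmark(harness.settings)
            database.append({"game": harness.game, "run": run_index, "avg_fps": fps})
    return database

def fake_benchmark(settings: Dict[str, str]) -> float:
    """Fake benchmark so the sketch runs without any game installed."""
    return 120.0 if settings.get("preset") == "low" else 60.0

records = run_suite([
    Harness("Game A", {"preset": "low"}, fake_benchmark),
    Harness("Game B", {"preset": "ultra"}, fake_benchmark),
])
print(len(records), "runs recorded")
```

Keeping each game behind the same small interface is what lets the outer loop stay identical no matter how many titles are added to the suite.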
First up, while a single bigger-is-better number is convenient, it really doesn’t tell the full story. Take Intel’s Arc A750, for example. It might actually perform well on average compared to, say, NVIDIA’s RTX 3060, but if your main game is CS:GO, you are not going to be happy with that purchase. Which leads perfectly to reason number two: a collection of individual game benchmarks allows you to focus on what matters most to you.
Geekbench, for example, contains cryptography tests that heavily influence the final score, yet have very little bearing on how most people will actually use the products being tested. It’s so bad that it’s often dismissed as kind of irrelevant in media circles, even though it does also contain tests that are perfectly valid. Finally, MarkBench is a great way to keep manufacturers honest. Everyone from Samsung to Volkswagen has been caught cheating on standardized synthetic tests to make their products look better than they are. So by giving ourselves the option to run any number of different real games, all of which will automatically be updated with new patches that would be hard to optimize for, we are making it extremely impractical to try to game the system and artificially elevate test scores. I mean, unless they just want to optimize their product for real games, in which case, well, that’s not really cheating, and we all win then, right?
Once we’ve got our juicy data, we use Grafana to transform it into nice, pretty graphs for your viewing pleasure. Or, well, at least that’s the plan. We still have a lot of work to do, as some of you have helpfully pointed out, on automating our data visualization, because depending on what we’re trying to convey, it can be really challenging to quickly and effectively present this much data. It’s a big time saver already, though. Let’s compare it to our current process.
First, we choose from our suite of benchmarks, like, say, these ones, which is going to come down to what we’re trying to learn about the product. Does it perform well in lighter titles? What about the latest AAA games? What about older DirectX 9 games? That sort of thing. Then we get everything installed and patched and adjust the in-game settings to our liking. Oh, and don’t forget to reboot the game if you happen to adjust that setting. Then we fire up FrameView, set it for the length of the benchmark, go into the game, run the benchmark, wait for the game to load, then press the record button at the exact right time as we load in. And then we play the waiting game. It’s a very manual and tedious process that requires just enough of our attention that it’s pretty hard to get any other real work done at the same time. But because MarkBench has all of that built in, once it’s up and running, you’re free to do whatever you want until each card has completed the entire test suite. Want to test 20 games? Easy. Want to repeat every test five times so that we can throw out the early cold runs and then average the last three results? No problem.
The other big difference maker is that MarkBench all but eliminates human error. And trust me, once you’ve been at it for two, four, eight hours, it is really easy to forget a small step like opening up your background data logging software, or to accidentally leave DLSS enabled, or something like that. If you don’t notice until you’ve already moved on to another card, then those kinds of mistakes can cost a lot of time, given that, in order to do things properly, you need to not only swap the cards but also remove and reinstall your GPU drivers in order to redo the run. And you might think this kind of thing affects me and not you guys, but here’s the thing: whenever we post a review, we invariably see questions like “Why didn’t you test this?” or “How come nobody ever talks about that?” And we feel the same way. We want to know these answers, but in many cases our hands are tied. Companies like AMD and Intel send out review samples for their products with only seven to ten days until the embargo lifts. Or four, Intel. That means our testing needs to be done extremely quickly so that we can analyze the data, write a script, film the video, edit the video, and finally upload and release. All of that takes a lot of time that we don’t really have, meaning that we can either narrow our scope, which sucks; ask our employees to give up their precious time off, which really sucks; or miss the embargo, which, as a business where views and clicks drive income, is frankly unsustainable.
Let’s look at some numbers to demonstrate that. Apple is a great example, since they don’t send us stuff ahead of release at all [Laughter]. This means that unless we can get an early hookup from somewhere, we can’t perform any meaningful tests on their products if we want to get our videos out in a timely manner. And to give you some idea why that is so important, look at this. We rushed out a video on the M1 MacBook Pro on our ShortCircuit sister channel right near the release day. It was super shallow because we had no time to prepare anything, but it got triple the usual views that we see on that channel.
Then, when we covered that same product on our main channel, this one, a few weeks later in way more depth, we ended up with, whoops, below-average viewership for our trouble. And again, this isn’t just our problem. It’s a problem for consumers, because favoring friendly media is one of the best ways for companies to control the narrative around their products. That initial boost from being one of the first to cover a new device often creates a positive feedback loop that continues to drive increased viewership over the entire sales cycle.
So if you do a search for, say, “M1 MacBook Pro review,” you are much more likely to end up with an Apple-approved media outlet. And the most insidious part of this is that the companies that play this game well are smart enough to keep the rules of engagement so vague and nebulous that they create this environment where every media outlet, even ones they’ve never spoken to, will carefully control their criticism to avoid stepping over some invisible line. And this, of course, is why we pushed back so hard when NVIDIA threatened to stop sending pre-launch GPUs to Tim and Steve from Hardware Unboxed. Of course, as NVIDIA pointed out, they are well within their rights to send GPUs or not send GPUs to whomever they please, and besides, Hardware Unboxed is more than welcome to cover their GPUs later. Except that, for the reasons I just outlined, this was a clear attempt to suppress Hardware Unboxed’s influence and their growth by killing their launch-day viewership. To NVIDIA’s credit, unlike Apple, they actually cared about the outrage from the gaming community, who, to their credit, recognized this for what it was, and to my knowledge, Hardware Unboxed is reinstated in the reviewer program. But there are many other companies who, like Apple, maintain much more strict control over who is allowed to review their products, which is why it’s time to break that cycle, and MarkBench is the key.
By automating this testing, we’re going to be able to piss off whoever we want and still deliver near-launch-day data to our viewers. Over time, we plan to publish not only videos but also written articles, which you can expect to find on the Labs website, along with the mother of all testing databases. Obviously, none of that is ready, but in the meantime, we’re going to have much more in-depth testing in our regular videos, and we’re hoping to publish some extra data or content on our forums or on floatplane.com.
We’re not 100% sure what this is going to look like yet, but you should sign up for both. Our forum is free, and linked below will be a thread where you can submit your suggestions for MarkBench features. As for Floatplane, it’s got great extras and exclusives right now, like Dennis’s epic martial arts training sessions leading up to our fight. The only thing I need to do now is, uh, use it to get all that testing finished and compiled for the 20 mining GPU video. And, oh, I guess I also need to find a way to segue to our sponsor, Squarespace. If you’re building your brand online in 2022, you should absolutely have a website, and if you need a tool to help build that brand, look no further than Squarespace. Squarespace is the all-in-one platform to help expand your brand online. Make a beautiful website, engage with your audience, and sell anything and everything from products to content.
We love Squarespace so much, we use it here at LMG. Its custom templates make it easy to stand out with a beautiful website that fits your needs. You can maximize your visibility thanks to a suite of integrated SEO features, and their analytic insights help you optimize for performance, so you can see what’s going well and what needs a little work. So get started today and head to squarespace.com/LTT to get 10% off your first purchase. If you guys enjoyed this video, why don’t you check out our Labs video about our headphone testing device that, believe it or not, we are still waiting to get delivered?
That was a rental unit. We paid for it months ago. Cool.