

How DALL-E 2 could solve major computer vision challenges





OpenAI has recently released DALL-E 2, a more advanced version of DALL-E, an ingenious multimodal AI capable of generating images purely from text descriptions. DALL-E 2 does this by employing advanced deep learning techniques that improve the quality and resolution of the generated images and add further capabilities, such as editing an existing image or creating new versions of it.

Many AI enthusiasts and researchers have tweeted about how amazing DALL-E 2 is at generating art and images from a few words. In this article, I'd like to explore a different application for this powerful text-to-image model: generating datasets to solve computer vision's biggest challenges.

Caption: A DALL-E 2 generated image. “A rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting.” Source: Twitter

Computer vision’s shortcomings

Computer vision AI applications can vary from detecting benign tumors in CT scans to enabling self-driving cars. Yet what is common to all is the need for abundant data. One of the most prominent performance predictors of a deep learning algorithm is the size of the underlying dataset it was trained on. For example, the JFT dataset, which is an internal Google dataset used for the training of image classification models, consists of 300 million images and more than 375 million labels.

Consider how an image classification model works: A neural network transforms pixel colors into a set of numbers that represent its features, also known as the “embedding” of an input. Those features are then mapped to the output layer, which contains a probability score for each class of images the model is supposed to detect. During training, the neural network tries to learn the best feature representations that discriminate between the classes, e.g. a pointy ear feature for a Dobermann vs. a Poodle.
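The pipeline described above can be sketched in a few lines. This is a toy illustration only, not a trained network: the "embedding" is a stand-in feature vector, and the three class names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into a probability per class."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
embedding = rng.normal(size=128)       # features extracted from pixel colors
W = rng.normal(size=(3, 128)) * 0.1    # output layer weights for 3 classes
b = np.zeros(3)

# The output layer maps the embedding to a probability score per class.
probs = softmax(W @ embedding + b)
classes = ["dobermann", "poodle", "frisbee"]
prediction = classes[int(np.argmax(probs))]
print(probs.sum())  # probabilities sum to 1
```

During training, the network adjusts both the feature extractor and `W` so that the probability mass lands on the correct class.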

Ideally, the machine learning model would learn to generalize across different lighting conditions, angles, and background environments. Yet more often than not, deep learning models learn the wrong representations. For example, a neural network might deduce that blue pixels are a feature of the “frisbee” class because all the images of a frisbee it has seen during training were on the beach.

One promising way of solving such shortcomings is to increase the size of the training set, e.g. by adding more pictures of frisbees with different backgrounds. Yet this exercise can prove to be a costly and lengthy endeavor. 

First, you would need to collect all the required samples, e.g. by searching online or by capturing new images. Then, you would need to ensure each class has enough labels to prevent the model from overfitting or underfitting to some. Lastly, you would need to label each image, stating which image corresponds to which class. In a world where more data translates into a better-performing model, these three steps act as a bottleneck for achieving state-of-the-art performance.

But even then, computer vision models are easily fooled, especially if they are being attacked with adversarial examples. Guess what is another way to mitigate adversarial attacks? You guessed right — more labeled, well-curated, and diverse data.

Caption: OpenAI’s CLIP wrongly classified an apple as an iPod due to a textual label. Source: OpenAI

Enter DALL-E 2

Let’s take an example of a dog breed classifier and a class for which it is a bit harder to find images — Dalmatian dogs. Can we use DALL-E to solve our lack-of-data problem?

Consider applying the following techniques, all powered by DALL-E 2:

  • Vanilla use. Feed the class name as part of a textual prompt to DALL-E and add the generated images to that class’s set of labeled samples. For example, “A Dalmatian dog in the park chasing a bird.”
  • Different environments and styles. To improve the model’s ability to generalize, use prompts with different environments while maintaining the same class. For example, “A Dalmatian dog on the beach chasing a bird.” The same applies to the style of the generated image, e.g. “A Dalmatian dog in the park chasing a bird in the style of a cartoon.”
  • Adversarial samples. Use the class name to create a dataset of adversarial examples. For instance, “A Dalmatian-like car.”
  • Variations. One of DALL-E’s new features is the ability to generate multiple variations of an input image. It can also take a second image and fuse the two by combining the most prominent aspects of each. One can then write a script that feeds all of the dataset’s existing images to generate dozens of variations per class.
  • Inpainting. DALL-E 2 can also make realistic edits to existing images, adding and removing elements while taking shadows, reflections, and textures into account. This can be a strong data augmentation technique to further train and enhance the underlying model.
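The first two techniques above amount to building prompt lists per class. A minimal sketch of that step follows; the class names, environments, and styles are illustrative placeholders.

```python
# Combine each class with varied environments and styles so the model
# learns the class itself, not the background it happened to appear on.
classes = ["Dalmatian dog", "Poodle"]
environments = ["in the park", "on the beach", "in the snow"]
styles = ["", " in the style of a cartoon"]

def build_prompts(class_name):
    prompts = []
    for env in environments:
        for style in styles:
            prompts.append(f"A {class_name} {env} chasing a bird{style}")
    return prompts

# Every image generated from a prompt inherits the label of the class
# the prompt was built for, so no manual labeling step is needed.
dataset = {c: build_prompts(c) for c in classes}
print(len(dataset["Poodle"]))  # 3 environments x 2 styles = 6 prompts
```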

Beyond generating more training data, the huge benefit of all of the above techniques is that the newly generated images are already labeled, removing the need for a human labeling workforce.

While image-generating techniques such as generative adversarial networks (GANs) have been around for quite some time, DALL-E 2 differentiates itself with its high-resolution 1024×1024 generations, its multimodal nature of turning text into images, and its strong semantic consistency, i.e., understanding the relationship between different objects in a given image.

Automating dataset creation using GPT-3 + DALL-E

DALL-E’s input is a textual prompt describing the image we wish to generate. We can leverage GPT-3, a text-generating model, to generate dozens of textual prompts per class; these are then fed into DALL-E, which in turn creates dozens of images that are stored per class.

For example, we could generate prompts that include different environments for which we would like DALL-E to create images of dogs.

Caption: A GPT-3 generated prompt to be used as input to DALL-E. Source: author

Using this example, and a template-like sentence such as “A [class_name] [gpt3_generated_actions],” we could feed DALL-E with the following prompt: “A Dalmatian laying down on the floor.” This can be further optimized by fine-tuning GPT-3 to produce dataset captions such as the one in the OpenAI Playground example above.
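The template step can be sketched as below. In practice `gpt3_generated_actions` would come from a GPT-3 completion call; here it is a hardcoded stand-in so the example is self-contained.

```python
# Fill the template "A [class_name] [gpt3_generated_actions]" to produce
# DALL-E prompts. The action strings stand in for real GPT-3 output.
template = "A {class_name} {action}"
class_name = "Dalmatian"
gpt3_generated_actions = [
    "laying down on the floor",      # stand-in for a GPT-3 generation
    "running through a field",       # stand-in for a GPT-3 generation
]

prompts = [template.format(class_name=class_name, action=a)
           for a in gpt3_generated_actions]
print(prompts[0])  # "A Dalmatian laying down on the floor"
```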

To further increase confidence in the newly added samples, one can set a certainty threshold and select only the generations that pass a specific ranking, since every generated image is ranked by an image-to-text model called CLIP.
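The filtering step might look like the sketch below. The scores are made-up placeholders; in practice each would be CLIP's image-text similarity for the generated image and its prompt, and the threshold would be tuned per dataset.

```python
# Keep only generations whose (placeholder) CLIP score clears a threshold.
generations = [
    {"image": "gen_001.png", "clip_score": 0.31},
    {"image": "gen_002.png", "clip_score": 0.24},
    {"image": "gen_003.png", "clip_score": 0.35},
]
THRESHOLD = 0.30  # assumed certainty threshold

accepted = [g for g in generations if g["clip_score"] >= THRESHOLD]
print(len(accepted))  # 2
```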

Limitations and mitigations

If not used carefully, DALL-E can generate inaccurate images or ones of a narrow scope, excluding specific ethnic groups or disregarding traits that might lead to bias. A simple example would be a face detector that was only trained on images of men. Moreover, using images generated by DALL-E might hold a significant risk in specific domains such as pathology or self-driving cars, where the cost of a false negative is extreme.

DALL-E 2 still has some limitations, with compositionality being one of them. Relying on prompts that, for example, assume the correct positioning of objects might be risky.

Caption: DALL-E still struggles with some prompts. Source: Twitter

Ways to mitigate this include human sampling, where a human expert will randomly select samples to check for their validity. To optimize such a process, one can follow an active-learning approach where images that got the lowest CLIP ranking for a given caption are prioritized for a review.
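The active-learning review queue described above can be sketched as follows: images with the lowest CLIP ranking for their caption are surfaced to the human reviewer first. Scores are again placeholders.

```python
# Prioritize human review by ascending CLIP score: the weakest matches
# between caption and generated image are checked first.
samples = [
    {"image": "gen_001.png", "clip_score": 0.31},
    {"image": "gen_002.png", "clip_score": 0.12},
    {"image": "gen_003.png", "clip_score": 0.27},
]
review_queue = sorted(samples, key=lambda s: s["clip_score"])
print(review_queue[0]["image"])  # gen_002.png is reviewed first
```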

Final words

DALL-E 2 is yet another exciting research result from OpenAI that opens the door to new kinds of applications. Generating huge datasets to address data scarcity, one of computer vision's biggest bottlenecks, is just one example.

OpenAI has signaled it will release DALL-E 2 sometime this upcoming summer, most likely in a phased release with pre-screening for interested users. Those who can't wait, or who are unable to pay for the service, can tinker with open-source alternatives such as DALL-E Mini (Interface, Playground repository).

While the business case for many DALL-E-based applications will depend on the pricing and policy OpenAI sets for its API users, they are all certain to take image generation one big leap forward.

Sahar Mor has 13 years of engineering and product management experience focused on AI products. He is currently a Product Manager at Stripe, leading strategic data initiatives. Previously, he founded AirPaper, a document intelligence API powered by GPT-3 and was a founding Product Manager at Zeitgold (Acq. By Deel), a B2B AI accounting software company where he built and scaled its human-in-the-loop product, and Levity.ai, a no-code AutoML platform. He also worked as an engineering manager in early-stage startups and at the elite Israeli intelligence unit, 8200.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.

If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.


AMD CEO says 5-nm Zen 4 processors coming this fall




Advanced Micro Devices revealed its 5-nanometer Zen 4 processor architecture today at the Computex 2022 event in Taiwan.

The new AMD Ryzen 7000 Series desktop processors with Zen 4 cores will be coming this fall, said Lisa Su, CEO of AMD, in a keynote speech.

Su said the new processors with Zen 4 architecture will deliver a significant increase in performance upon their launch in the fall of 2022. Additionally, Su highlighted the strong growth and momentum for AMD in the mobile market as 70 of the more than 200 expected ultrathin, gaming and commercial notebook designs powered by Ryzen 6000 Series processors have been launched or announced to-date.

In addition, other AMD executives announced the newest addition to the Ryzen Mobile lineup, “Mendocino;” the newest AMD smart technology, SmartAccess Storage; and more details of the new AM5 platform, including support from leading motherboard manufacturers.

“At Computex 2022 we highlighted growing adoption of AMD in ultrathin, gaming, and commercial notebooks from the leading PC providers based on the leadership performance and battery life of our Ryzen 6000 series mobile processors,” said Su. “With our upcoming AMD Ryzen 7000 Series desktop processors, we will bring even more leadership to the desktop market with our next-generation 5-nm Zen 4 architecture and provide an unparalleled, high-performance computing experience for gamers and creators.”

AMD Ryzen 7000 Series desktop processors

The new Ryzen 7000 Series desktop processors will double the amount of L2 cache per core, feature higher clock speeds, and are projected to provide greater than 15% uplift in single-thread performance versus the prior generation, for a better desktop PC experience.

During the keynote, a pre-production Ryzen 7000 Series desktop processor was demonstrated running at 5.5 GHz clock speed throughout AAA game play. The same processor was also demonstrated performing more than 30% faster than an Intel Core i9 12900K in a Blender multi-threaded rendering workload.

In addition to new “Zen 4” compute dies, the Ryzen 7000 series features an all-new 6nm I/O die. The new I/O die includes AMD RDNA 2-based graphics engine, a new low-power architecture adopted from AMD Ryzen mobile processors, support for the latest memory and connectivity technologies like DDR5 and PCI Express 5.0, and support for up to four displays.

AMD Socket AM5 Platform

The new AMD Socket AM5 platform provides advanced connectivity for our most demanding enthusiasts. This new socket features a 1718-pin LGA design with support for up to 170W TDP processors, dual-channel DDR5 memory, and new SVI3 power infrastructure for leading all-core performance with our Ryzen 7000 Series processors. AMD Socket AM5 features the most PCIe 5.0 lanes in the industry with up to 24 lanes, making it our fastest, largest, and most expansive desktop platform with support for the next-generation and beyond class of storage and graphics cards.

And AMD said the “Mendocino” processors will offer great everyday performance and are expected to be priced from $400 to $700.

Featuring “Zen 2” cores and RDNA 2 architecture-based graphics, the processors are designed to deliver the best battery life and performance in the price band so users can get the most out of their laptop at an attractive price.

The first systems featuring the new “Mendocino” processors will be available from computer partners in Q4 2022.




AMD’s Ryzen 7000 desktop chips are coming this fall with 5nm Zen 4 cores




AMD’s upcoming Ryzen 7000 chips will mark another major milestone for the company: they’ll be the first desktop processors running 5-nanometer cores. During her Computex keynote presentation today, AMD CEO Lisa Su confirmed that Ryzen 7000 chips will launch this fall. Under the hood, they’ll feature 5nm Zen 4 core chiplets, as well as a redesigned 6nm I/O die (which includes RDNA 2 graphics, DDR5 and PCIe 5.0 controllers, and a low-power architecture). Earlier this month, the company teased its plans for high-end “Dragon Range” Ryzen 7000 laptop chips, which are expected to launch in 2023.

Since this is just a Computex glimpse, AMD isn’t giving us many other details about the Ryzen 7000 yet. The company says it will offer a 15 percent performance jump in Cinebench’s single-threaded benchmark compared to the Ryzen 5950X. Still, it’d be more interesting to hear about multi-threaded performance, especially given the progress Intel has made with its 12th-gen CPUs. You can expect 1MB of L2 cache per core, as well as maximum boost speeds beyond 5GHz and better hardware acceleration for AI tasks.

AMD is also debuting Socket AM5 motherboards alongside its new flagship processor. The company is moving to a 1718-pin LGA socket, but it will still support AM4 coolers. That’s a big deal if you’ve already invested a ton into your cooling setup. The new motherboards will offer up to 24 lanes of PCIe 5.0 split across storage and graphics, up to 14 USB SuperSpeed ports running at 20 Gbps, and up to 4 HDMI 2.1 and DisplayPort 2 ports. You’ll find them in three different flavors: B650 for mainstream systems, X670 for enthusiasts who want PCIe 5.0 for storage and graphics, and X670 Extreme for the most demanding folks.

Given that Intel still won’t have a 7nm desktop chip until next year (barring any additional delays), AMD seems poised to once again take the performance lead for another generation. But given just how well Intel’s hybrid process for its 12th-gen chips has worked out, it’ll be interesting to see how it plans to respond. If anything, it sure is nice to see genuine competition in the CPU space again.

While Ryzen 7000 will be AMD’s main focus for the rest of the year, the company is also throwing a bone to mainstream laptops in the fourth quarter with its upcoming 6nm “Mendocino” CPUs. They’ll sport four 6nm Zen 2 cores, as well as RDNA 2 graphics, making them ideal for systems priced between $399 and $699. Sure, that’s not much to get excited about, but even basic machines like Lenovo’s Ideapad 1 deserve decent performance. And for many office drones, it could mean having work-issued machines that finally don’t stink.




Disney’s Disney+ ad pitch reflects how streaming ad prices set to rise in this year’s upfront




With Disney+, Disney is looking to set a new high-water mark for ad prices among the major ad-supported streamers. The pricey pitch is representative of a broader rising tide in streaming ad pricing in this year’s TV advertising upfront market, as Disney-owned Hulu, Amazon and even Fox’s Tubi are looking to press upfront advertisers to pay up.

In its initial pitch to advertisers and their agencies, Disney is seeking CPMs for Disney+ around $50, according to agency executives. That price point applies to broad-based targeting dubbed “P2+,” which refers to an audience of any viewer who is two years old or older (though Disney has told agency executives that programming aimed at viewers seven years old and younger will be excluded from carrying ads). In other words, more narrowly targeted ads are expected to cost more based on the level of targeting. A Disney spokesperson declined to comment.
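CPM is the cost per thousand impressions, so campaign cost scales linearly with reach. A quick illustration with made-up impression counts and the reported ~$50 ask:

```python
def campaign_cost(impressions, cpm_dollars):
    """Cost of a campaign: impressions are billed per thousand (CPM)."""
    return impressions / 1000 * cpm_dollars

print(campaign_cost(2_000_000, 50))  # 100000.0 - 2M impressions at a $50 CPM
print(campaign_cost(2_000_000, 25))  # 50000.0 - same reach at a $25 CPM
```

Doubling the CPM doubles the bill for the same audience, which is why buyers push back hard on headline rates.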

At a $50 CPM, Disney+ is surpassing the prices that NBCUniversal’s Peacock and Warner Bros. Discovery’s HBO Max sought in last year’s upfront market, prices that gave ad buyers sticker shock. The former sought CPMs in the $30 to $40 range, while the latter sought $40+ CPMs. By comparison, other major ad-supported streamers like Hulu, Discovery+ and Paramount+ were charging low-to-mid $20 CPMs. As a result, Peacock’s and HBO Max’s asks ended up being price prohibitive, with some advertisers limiting the amount of money they spent with those streamers because of the higher rates.

Unsurprisingly, agency executives are balking at Disney+’s price point. “They’re citing pricing that no longer exists, meaning Peacock and HBO Max recognized they came out too high and they’re reducing it. Disney+ is using earmuffs to pretend that second part didn’t happen,” said one agency executive.

However, Disney+ isn’t the only streamer seeking to raise the rates that ad buyers are accustomed to paying. Hulu is also seeking to increase its prices in this year’s upfront, with P2+ pricing going from a $20-$25 CPM average to the $25-$30 CPM range, according to agency executives. And during a call with reporters on May 16, Fox advertising sales president Marianne Gambelli said the company will seek higher prices for its free, ad-supported streaming TV service Tubi in this year’s upfront market. It’s unclear what Tubi’s current rates are, but FAST services’ CPMs are typically in the low to mid teens, the agency executives said.

“We have to get the value for Tubi. Tubi has grown to a point — it’s doubled, tripled in size over the past couple of years. So we are going to obviously make that a priority and look for not only more volume but price,” Gambelli said.

Meanwhile, in pitching its Thursday Night Football package that will be streamed on Amazon Prime Video and Twitch, Amazon has been pressing for a premium on what Fox charged advertisers last year, according to agency executives. The e-commerce giant will be handling the games’ ad placements like traditional TV, meaning that it will run the same ad in each ad slot for every viewer as opposed to dynamically inserting targeted ads. “It’s streaming broadcast,” said a second agency executive.

An Amazon spokesperson declined to comment on pricing but did provide a general statement. “Thursday Night Football on Prime Video and Twitch is a purely digital broadcast, and we’re excited to bring fans a new viewing experience. There are 80MM active Prime Video households in the U.S. and, in a survey of our 2021 TNF audience, 38% reported they don’t have a pay-TV service – meaning TNF on Prime Video and Twitch enables brands to connect with cord-cutters and cord-nevers. Brands can also reach these viewers beyond TNF. Our first-party insights enable them to reengage TNF audiences across Amazon, such as in Freevee content.”

One of the agency executives that Digiday spoke to said the latest ask is for a plus-10% increase on Fox’s rates, though what Fox’s rates were is unclear, and other agency executives said the premium that Amazon is asking for varies. Ad Age reported in February that Amazon was seeking up to 20% higher prices than Fox’s rates. “I don’t know if it is consistently plus-10, but it is definitely more. Which is crazy because Fox couldn’t make money on it, which is why they gave it up for this fall,” said a second agency executive.

“Someone was eating way too many gummies before they put the pricing together,” said a second agency executive of Amazon’s Thursday Night Football pitch.

Ad-supported streaming service owners also see an opportunity to push for higher prices as advertisers adopt more advanced targeting with their streaming campaigns, such as by using the media companies’ and/or advertisers’ first-party data to aim their ads on the streamers.

Said one TV network executive, “You’ll see premiums, especially as it relates to advertisers that really want to hook into [their company’s streaming service] and buy those targeted audiences across the platform and either use [the TV network’s] first-party data or bring their own data to the table. That’s the biggest business we’re in, and that’s where we see great growth from a pricing standpoint.”


Go to Source

Continue Reading