Parse HTML in Drupal 8 to Get Attributes

Created on July 21, 2017 by Jeremiah John

Tags: Drupal 8, HTML Scraping, DOM, Planet Drupal

Often one finds oneself needing to parse HTML. Back in the day, we used regexes, and smoked inside. We didn't even know about caveman coders back then. Later, we'd use SimpleHtmlDom and mostly just swear when things didn't quite work as expected. But now we can use PHP's DOMDocument, and in Drupal 8 we create one using Drupal's Html utility class.

At the top of your file place:
use Drupal\Component\Utility\Html;

Now, I'm looking for an iframe embed in my HTML, and I want the src, width, and height. I've handed it a text string that looks like this:
<iframe width="900" height="800" frameborder="0" scrolling="no" src="http://myurl.com"></iframe>

Then, in my hook_preprocess_field() implementation, I do this:

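      // $my_html holds the iframe markup shown above.
      foreach (Html::load($my_html)->getElementsByTagName('iframe') as $iframe) {
        // Each $iframe is a DOMElement; read its attributes by name.
        $variables['src'] = $iframe->getAttribute('src');
        $width = $iframe->getAttribute('width');
        $height = $iframe->getAttribute('height');
      }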

Let's break this down. Html::load($my_html) returns a DOMDocument. Calling getElementsByTagName('iframe') on it returns a DOMNodeList, which is iterable. Each item in the loop is a DOMElement, and its getAttribute() method gives you the value of whichever attribute you need.
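If you want to poke at this outside of Drupal, here's a minimal standalone sketch of the same chain. It assumes plain PHP with the DOM extension and swaps in DOMDocument::loadHTML() for Html::load(), so it is not exactly what Drupal does under the hood. The $attributes array is just a name I made up to collect the results.

```php
<?php

// Standalone sketch: plain DOMDocument stands in for Drupal's Html::load().
$my_html = '<iframe width="900" height="800" frameborder="0" scrolling="no" src="http://myurl.com"></iframe>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($my_html);
libxml_clear_errors();

// $attributes is a hypothetical holder for what we pull out.
$attributes = [];
foreach ($document->getElementsByTagName('iframe') as $iframe) {
  // getAttribute() returns an empty string when the attribute is missing.
  $attributes[] = [
    'src' => $iframe->getAttribute('src'),
    'width' => $iframe->getAttribute('width'),
    'height' => $iframe->getAttribute('height'),
  ];
}

print_r($attributes);
```

Because getAttribute() hands back an empty string for a missing attribute, you may want to check hasAttribute() first if an empty src would cause trouble downstream.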

There. Beats regexes and patriarchy, doesn't it?

About the Author

Hi. My name is Jeremiah John. I'm a sf/f writer and activist.

I just completed a dystopian science fiction novel. I run a website which I created that connects farms with churches, mosques, and synagogues to buy fresh vegetables directly and distribute them on a sliding scale to those in need.

In 2003, I spent six months in prison for civil disobedience while working to close the School of the Americas, converting to Christianity, as one does, while I was in the clink.
