Often one finds oneself needing to parse HTML. Back in the day, we used regexes, and smoked inside. We didn't even know about caveman coders back then. Later, we'd use SimpleHtmlDom and mostly just swore when things didn't quite work as expected. But now we can use PHP's DOMDocument, and in Drupal we create one using the Html utility class.
At the top of your file place:
use Drupal\Component\Utility\Html;
Now, I'm looking for an iframe embed in my HTML, and I want the src, width, and height. The text string I'm working with, $my_html, looks like this:
<iframe width="900" height="800" frameborder="0" scrolling="no" src="http://myurl.com"></iframe>
Then, in my hook_preprocess_field(), I do this:
foreach (Html::load($my_html)->getElementsByTagName('iframe') as $iframe) {
  $variables['src'] = $iframe->getAttribute('src');
  $width = $iframe->getAttribute('width');
  $height = $iframe->getAttribute('height');
}
Let's break this down. Html::load($my_html) returns a DOMDocument. getElementsByTagName('iframe') returns a DOMNodeList, which is iterable, so each pass through the loop hands you a DOMElement. From there, getAttribute() gets you the properties you need.
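If you want to poke at that chain outside of Drupal, here is a minimal sketch using plain DOMDocument. It is not Drupal's implementation of Html::load(). The function name extract_iframe_attributes(), the wrapper markup around the fragment, and the integer casts are all assumptions made for illustration.

```php
<?php
// Sketch only: mimics the DOMDocument -> DOMNodeList -> DOMElement chain
// with PHP's built-in DOM extension. extract_iframe_attributes() is a
// hypothetical helper, not part of Drupal or PHP.

function extract_iframe_attributes(string $html): array {
  $document = new DOMDocument();

  // Collect libxml parse warnings internally instead of emitting them.
  $previous = libxml_use_internal_errors(true);
  // Assumption: wrap the fragment in a bare document so it parses as body content.
  $document->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $results = [];
  // getElementsByTagName() returns a DOMNodeList; each item here is a DOMElement.
  foreach ($document->getElementsByTagName('iframe') as $iframe) {
    $results[] = [
      'src' => $iframe->getAttribute('src'),
      // getAttribute() returns a string ('' when the attribute is missing),
      // so the dimensions are cast to integers.
      'width' => (int) $iframe->getAttribute('width'),
      'height' => (int) $iframe->getAttribute('height'),
    ];
  }
  return $results;
}

$my_html = '<iframe width="900" height="800" frameborder="0" scrolling="no" src="http://myurl.com"></iframe>';
$iframes = extract_iframe_attributes($my_html);
```

Returning an array of matches rather than overwriting one variable per pass means a string with several iframes doesn't silently keep only the last one.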
There. Beats regexes and patriarchy, doesn't it?