I’ve been building an Express server that filters out crawlers from normal users so it can serve them very specific meta information. I have a very large crawler list (JSON) that I pattern-match against to determine whether the user agent is a bot.
After testing, I’ve got nearly all platforms set up, but iMessage is throwing me off. When I paste a URL into iMessage and press send, I see two different user agents appear in my logs:
com.apple.WebKit.Networking/8615. CFNetwork/1410.0.3 Darwin/22.6.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0
How can I add this user agent to my master list while ensuring I don’t serve crawler info to normal users and vice versa?
Here’s how I’m detecting crawlers:
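const isCrawler = (userAgent) => {
  // Assumes `crawlers` is an array of pattern strings; if it holds the full entries, use crawler.pattern.
  return crawlers.some(crawler => new RegExp(crawler, 'i').test(userAgent));
};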
And here’s an example of one of the many entries in the list:
{
  "pattern": "Applebot",
  "url": "http://www.apple.com/go/applebot",
  "addition_date": "2015/04/15",
  "instances": [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)",
    "Mozilla/5.0 (compatible; Applebot/0.3; +http://www.apple.com/go/applebot)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Applebot/0.3; +http://www.apple.com/go/applebot)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B410 Safari/600.1.4 (Applebot/0.1; +http://www.apple.com/go/applebot)"
  ]
}
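For anyone who wants to reproduce this, here is a minimal sketch of the same check run against the two user agents from my logs. It assumes the list entries are objects with a `pattern` field, as in the example above; the three-entry `crawlers` array is a hypothetical subset, not my real list. Under those assumptions, the long Safari-style user agent is matched by its `facebookexternalhit` and `Twitterbot` tokens, while the `com.apple.WebKit.Networking` one matches nothing.

```javascript
// Hypothetical subset of the crawler list; only `pattern` is used here.
const crawlers = [
  { pattern: "Applebot" },
  { pattern: "facebookexternalhit" },
  { pattern: "Twitterbot" },
];

// Compile each pattern once instead of on every request.
const crawlerRegexes = crawlers.map(({ pattern }) => new RegExp(pattern, "i"));

// True when any known crawler pattern appears in the user agent string.
const isCrawler = (userAgent) =>
  typeof userAgent === "string" &&
  crawlerRegexes.some((regex) => regex.test(userAgent));

// The two user agents seen in the logs after pasting a link into iMessage.
const iMessagePreviewUA =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0";
const webkitNetworkingUA =
  "com.apple.WebKit.Networking/8615. CFNetwork/1410.0.3 Darwin/22.6.0";

console.log(isCrawler(iMessagePreviewUA)); // true
console.log(isCrawler(webkitNetworkingUA)); // false
```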
TL;DR: when I paste a link in iMessage I see a few requests to my Express server, with two different user agents. Neither triggers the correct meta information to be served.
It’s probably working when the first request comes in, but the second user agent is probably the issue. I guess my question is: could I also just look for something like com.apple.WebKit.Networking/8615. CFNetwork/1410.0.3 Darwin/22.6.0
and, if it’s found, assume the request comes from a crawler and not a human?
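What I have in mind is roughly the sketch below. It is only an illustration: the entry and the helper name `looksLikeWebKitNetworking` are hypothetical, and nothing in my logs confirms that every request with this prefix comes from a link preview. The pattern is anchored on the `com.apple.WebKit.Networking/` prefix with the dots escaped and without pinning version numbers. It deliberately does not match on `CFNetwork` or `Darwin`, because those tokens also appear in the default user agent of ordinary Apple apps.

```javascript
// Hypothetical extra entry; the field names mirror the list format above.
const webkitNetworkingEntry = {
  pattern: "^com\\.apple\\.WebKit\\.Networking/",
  instances: [
    "com.apple.WebKit.Networking/8615. CFNetwork/1410.0.3 Darwin/22.6.0",
  ],
};

const webkitNetworkingRegex = new RegExp(webkitNetworkingEntry.pattern, "i");

// True only when the user agent starts with the WebKit networking prefix.
const looksLikeWebKitNetworking = (userAgent) =>
  typeof userAgent === "string" && webkitNetworkingRegex.test(userAgent);

console.log(looksLikeWebKitNetworking(webkitNetworkingEntry.instances[0])); // true

// A generic CFNetwork user agent from some other app (made-up example) is not matched.
console.log(looksLikeWebKitNetworking("MyApp/1.0 CFNetwork/1410.0.3 Darwin/22.6.0")); // false
```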