Skip to content

Conversation

@shtse8
Copy link

@shtse8 shtse8 commented Feb 27, 2017

Due to the urgent need in my project, hope it can help. #18

@shtse8 shtse8 changed the title Try to fix UTF-8 problem. Fix utf-8 encoding problem Feb 28, 2017
@ventrec
Copy link

ventrec commented Oct 5, 2017

I've ran into the same issue myself now, and this fix would be highly apprectiated.

@wasinger Is it possible to get this merged?

@shtse8
Copy link
Author

shtse8 commented Oct 6, 2017

I finally gave up this project and wrote my own html dom parse. But I am new on starting up a open source project.

@kukungkung
Copy link

I think it can help you.

curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');

@shtse8
Copy link
Author

shtse8 commented Oct 24, 2017

@kukungkung setting curlopt is just try to request with encoding utf-8, you have to decode the utf8 yourself. and the response may not follow your encoding. Also, mostly site are sending utf8 to you. here, the main problem is, htmlpagedom parse cannot support utf8, but not the curl.

@glensc
Copy link
Contributor

glensc commented Jun 11, 2018

the problem is not that utf8 is not parsed, just that result is encoded with html entities.

i solved this in my code:

$html = html_entity_decode((string)$crawler, ENT_NOQUOTES, 'UTF-8');

@glensc glensc mentioned this pull request Jun 11, 2018
@glensc
Copy link
Contributor

glensc commented Jun 11, 2018

the PRs seems broken because created from @shtse8 master branch, thus changes from #19 and #20 mixed in both pull requests. and perhaps even changes not related to neither of the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants